404 Notification Package
The 404 Notification Package was originally created in-house to help monitor "file not found" errors caused by updates to the Internet Connection website.
As a website evolves, pages often get moved, removed, or renamed. When someone requests one of these pages, the server cannot find it, so it sends back the HTTP response "404 - file not found". This is frustrating for visitors and can also hurt your site's search engine rankings: if search engine spiders find that many pages in their databases no longer exist, those pages will be removed from the databases. The 404 Notification Package helps you avoid these problems by notifying you whenever a 404 error occurs.
By sending you an email with the URL of the document that was not found, as well as the URL of the referring document (if there was one), the 404 script enables you to take steps to remedy the problem. This helps ensure that your visitors, as well as search engine spiders, have a 404-free experience on your site. Please note that this script does not actually fix broken links; it simply notifies you when a request results in a 404 error.
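The notice needs only two pieces of information about each failed request. As an illustration, here is a minimal sketch of how such a message could be assembled in Perl, the language 404.pl is written in. It assumes the script reads Apache's REDIRECT_URL variable (the path originally requested, which Apache sets when it serves an error document) and the standard CGI variable HTTP_REFERER. The subroutine name build_notice is hypothetical, and this is not the code shipped in 404.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code in 404.pl): build the body of a 404
# notice from a hash of CGI environment variables such as %ENV.
# Assumes Apache supplies REDIRECT_URL (the path that was not found) and
# HTTP_REFERER (the page that linked to it, when the browser sends one).
sub build_notice {
    my ($env) = @_;
    my $missing  = $env->{REDIRECT_URL} // '(unknown)';
    my $host     = $env->{HTTP_HOST} // $env->{SERVER_NAME} // '';
    my $referrer = $env->{HTTP_REFERER};

    my $body = "404 - file not found: http://$host$missing\n";
    if (defined $referrer && length $referrer) {
        $body .= "Referred by: $referrer\n";
    }
    else {
        # Typed-in URLs and bookmarks usually arrive without a referrer.
        $body .= "Referred by: (no referring page)\n";
    }
    return $body;
}
```

Inside a CGI script, the helper would be called as build_notice(\%ENV).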
The 404 script is fully customizable and can be configured to send email to any address you like. It can also be configured to ignore user-specified documents (e.g. robots.txt, a text file that most search engines look for when they spider your site), which helps keep the script from sending you an excessive number of email messages.
Here's an example of an email sent by the 404 Notification Package:
Setting up the 404 Notification Package
When you install the 404 notification script, three files are created. First, the script itself (404.pl) is placed in your cgi-bin. Next, the Apache configuration file, named .htaccess, is deposited in your web root (webshare) directory. Last, notfound.shtml is also dropped into your web root.
Modifying/Customizing the Package and Related Files
Although you're free to modify the 404 Notification Package, most customers will not need to modify the script at all; it should work as soon as it's installed. In fact, you can avoid modifying the script simply by making sure that you have either a catchall address specified in your netConsole or an email account/alias set up for your webmaster@YOUR-DOMAIN address. However, if you would like the email notification sent directly to an address other than webmaster@YOUR-DOMAIN, you can easily modify the script using the directions below.
To modify the 404 Notification Package, begin by downloading 404.pl, which is in the cgi-bin of your account, to your local workstation using your FTP client's ASCII transfer mode. To edit the file, we recommend a simple text editor such as Windows Notepad or Mac OS SimpleText, because many word processors and HTML editors will alter the code in ways that stop the script from working. If you're familiar with vi, you can also edit the file directly through a shell account.
If you edit this file on your workstation, remember to have your FTP client upload it in ASCII mode as well, and make sure 404.pl keeps the correct permissions (0755). Here's the section of the script where all modifications are made:
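The ASCII-mode advice matters because a file saved with Windows (CRLF) line endings leaves an invisible carriage return at the end of the #!/bin/perl line, so the server looks for an interpreter whose name ends in that character and the script fails to run. If 404.pl was uploaded in binary mode by mistake, a small Perl sketch such as the following, run from a shell account, converts the line endings and restores the 0755 permissions. The subroutine name fix_cgi_script is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a script with Unix (LF) line endings and
# restore the 0755 permissions a CGI script needs.
sub fix_cgi_script {
    my ($path) = @_;

    # Slurp the file in raw mode so no I/O layer alters the bytes.
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    my $code = do { local $/; <$in> };
    close($in);

    # Convert CRLF (and any stray lone CR) line endings to LF.
    $code =~ s/\r\n?/\n/g;

    open(my $out, '>:raw', $path) or die "Cannot write $path: $!";
    print {$out} $code;
    close($out) or die "Cannot close $path: $!";

    # rwxr-xr-x: the owner can edit, everyone can execute.
    chmod(0755, $path) or die "Cannot chmod $path: $!";
    return 1;
}
```

It would be called as fix_cgi_script('cgi-bin/404.pl'), assuming that relative path matches your account's directory layout.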
01  #!/bin/perl
02  # set your email address (before the @)
03  my $email = 'webmaster';
04
05  # don't include these urls
06  my @exclude = qw(/wpad.dat /favicon.ico);
One situation that does require modifying the script is when you do not have a webmaster@YOUR-DOMAIN email address set up. By default, the script delivers all notices to webmaster@YOUR-DOMAIN, so that address must work. As mentioned above, you can avoid editing the script for this reason by making sure that you have either a catchall address specified in your netConsole or an email account/alias set up for webmaster@YOUR-DOMAIN.
If you prefer to have the 404 Notification Package deliver its notifications directly to an address other than webmaster@YOUR-DOMAIN, download the script and open it in your favorite text editor (or, if you're comfortable doing so, edit it with vi on the server itself). The line you need to change is line 3, shown below.
my $email = 'webmaster';
Simply replace the "webmaster" with the account name you'd like the notification delivered to. For example, if you want the message sent to kim@YOUR-DOMAIN, replace "webmaster" with "kim".
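The comment "set your email address (before the @)" indicates that the script supplies the domain part of the address itself. How 404.pl determines your domain is not shown here; as a hypothetical sketch, assuming the domain is taken from Apache's SERVER_NAME variable with any leading "www." removed, the full recipient could be formed like this:

```perl
use strict;
use warnings;

# Hypothetical: complete the account name set on line 3 of 404.pl with the
# site's domain. Assumes the domain comes from SERVER_NAME and that a
# leading "www." should be dropped; the real script may differ.
sub notification_address {
    my ($account, $server_name) = @_;
    (my $domain = lc $server_name) =~ s/^www\.//;
    return "$account\@$domain";
}
```

With this sketch, notification_address('kim', 'www.example.com') returns kim@example.com.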
Users can also modify the list of URLs/files that the script ignores. Line 6 contains this list. By default, the script ignores requests for two files: "wpad.dat" and "favicon.ico". The wpad.dat file is requested by visitors who use Internet Explorer 5 (or higher) with the browser's Web Proxy Auto-Discovery feature enabled. Like wpad.dat, favicon.ico is also requested by Internet Explorer. A favicon, or "favorite icon," is (as of this writing) supported only by IE; using a favicon.ico, you can customize your site's "Favorites" icon in IE.
To add entries to the list, follow the convention we've used: place each path inside the qw( ) parentheses, separated by spaces. For example, a list configured to ignore only a file called "robots.txt" would look like this:
my @exclude = qw(/robots.txt);
As mentioned above, adding unimportant files to this list can reduce the number of emails the script sends you. For example, because of machines infected with the Nimda virus, you will probably see 404 errors from requests for two files found on Windows servers: root.exe and cmd.exe. If you tell the 404 script to ignore requests for these files, you won't be bothered by emails about them. With only these two files in the list, line 6 would look like this:
my @exclude = qw(cmd.exe root.exe);
Notice that the leading slashes have been omitted for these two items. This is because these files may be requested from deep within a directory tree. Omitting the leading slash ensures that even if these files are requested from two or twenty directories down, you will not receive a 404 notice for them.
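The exact matching code inside 404.pl is not shown here. One plausible reading of the convention described above, written as a hypothetical Perl sketch: an entry that starts with a slash must match the requested path exactly, while an entry without a leading slash matches a file of that name in any directory. Any query string (the text after "?") is ignored.

```perl
use strict;
use warnings;

# Hypothetical matcher illustrating the exclusion convention; not the code in 404.pl.
#   "/robots.txt" -> matches only the path /robots.txt
#   "cmd.exe"     -> matches cmd.exe in any directory, however deep
sub is_excluded {
    my ($request, @exclude) = @_;
    (my $path = $request) =~ s/\?.*\z//s;    # drop any query string
    for my $entry (@exclude) {
        if ($entry =~ m{^/}) {
            return 1 if $path eq $entry;
        }
        else {
            # Match the last path component only, so "cmd.exe" does not
            # also match a file such as "mycmd.exe".
            return 1 if $path =~ m{/\Q$entry\E\z};
        }
    }
    return 0;
}
```

Anchoring slash-prefixed entries at the site root keeps an entry such as /robots.txt from silencing notices about same-named files elsewhere; whether 404.pl itself draws this distinction is an assumption.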
The only other file most users will want to modify is the notfound.shtml page. Again, this file is deposited in your webshare directory. As the code below shows, the page is pretty plain when it's first installed:
01  <html>
02  <head>
03  <title>404 - File Not Found</title>
04  </head>
05  <body bgcolor="#FFFFFF">
06  <!--#exec cgi="/cgi-bin/404.pl" -->
07  <h1>We're Sorry...</h1>
...
You can modify it as much as you'd like, adding images and making it match the rest of your site. As long as you keep line 6 of the HTML above intact, the 404 Notification Package will work correctly.
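Because line 6 runs 404.pl through the SSI exec cgi directive, the script behaves like any other CGI program: it has to print a response header, which the server consumes before inserting any remaining output into the page, and it can then mail its notice. The hypothetical Perl sketch below shows those steps; the sendmail location /usr/sbin/sendmail, the subroutine names, and the message layout are assumptions, so check with your host before relying on them.

```perl
use strict;
use warnings;

# Hypothetical sketch of the work a script like 404.pl does each time
# notfound.shtml is served. Not the shipped code.

# Write a complete mail message (headers, blank line, body) to a filehandle.
sub write_notice {
    my ($fh, $to, $subject, $body) = @_;
    print {$fh} "To: $to\n", "Subject: $subject\n", "\n", $body;
}

# Send the notice through sendmail; -t reads recipients from the To: header.
# The sendmail path is an assumption; ask your host for the correct one.
sub send_notice {
    my ($to, $subject, $body) = @_;
    open(my $mail, '|-', '/usr/sbin/sendmail', '-t')
        or die "Cannot start sendmail: $!";
    write_notice($mail, $to, $subject, $body);
    close($mail) or warn "sendmail exited with status $?";
}

# Under SSI "exec cgi" the script must still emit a CGI header. Printing
# only the header adds nothing visible to notfound.shtml.
sub print_ssi_header {
    print "Content-type: text/html\n\n";
}
```

Writing the message to a filehandle keeps the formatting testable without a mail server: any handle, including an in-memory one, can stand in for the sendmail pipe.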
Things to Remember
Should the 404 Notification Package fail to work properly, we recommend that you delete it and reinstall it from the control panel.
Note: the following usually only applies to advanced users. If you don't know what an Apache configuration file (.htaccess) is or don't have one, you can disregard this next section.
If you've already specified a custom 404 error page in your site's .htaccess file, you will need to modify that page: simply add the line that calls the 404 script (line 6 of notfound.shtml, shown above).
Also, if your page is not already parsed for SSI by the server, rename it with the .shtml extension (and update the page name in your .htaccess file to match). Note that if your page uses a language other than HTML (PHP, for instance), you'll need to use that language's command for executing an external script.
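For this advanced setup, two things are worth checking: which page your .htaccess file names for 404 errors, and whether that page both calls the 404 script and carries the .shtml extension. The helpers below are a hypothetical Perl sketch of those checks; they only recognize ErrorDocument lines that point at a local path (starting with /), and they look for the exact directive used in notfound.shtml.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are illustrative only.

# Return the local path that .htaccess text assigns to 404 errors, or undef.
# Only local targets such as "ErrorDocument 404 /missing.shtml" are recognized.
# If several lines match, the last one is kept, since a later directive
# generally overrides an earlier one in the same file.
sub error_document_404 {
    my ($htaccess) = @_;
    my $found;
    for my $line (split /\r?\n/, $htaccess) {
        next if $line =~ /^\s*#/;    # skip comment lines
        $found = $1 if $line =~ m{^\s*ErrorDocument\s+404\s+(/\S+)}i;
    }
    return $found;
}

# True if the page name ends in .shtml (so the server parses it for SSI)
# and its source contains the directive that runs the 404 script.
sub page_calls_404_script {
    my ($page_name, $page_source) = @_;
    return 0 unless $page_name =~ /\.shtml\z/i;
    return $page_source =~ m{<!--#exec\s+cgi="/cgi-bin/404\.pl"\s*-->} ? 1 : 0;
}
```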