by Leo A. Notenboom
If you use meaningful page names on your site, you'll often end up with URLs that are quite long. For example, on Ask Leo! you'll find an article entitled "What's the difference between Windows Live Messenger, Windows Messenger, MSN Messenger and Windows Messenger Service?" If you look at the resulting URL, it's very long. So long, in fact, that I need to chop it down to show it to you here:
http://ask-leo.com/whats_the_difference_between_windows ... vice.html
More commonly you'll recognize this situation in email programs when you try to email someone a long link:
http://ask-leo.com/whats_the_difference_between_windows_live_messenger_
windows_messenger_msn_messenger_and_windows_messenger_service.html
As you see above, email programs see the "http://" on the first line and automatically treat the rest of it as a link. But since the line is too long to fit on the screen, they break it into two, and then don't include the second half in that link. If you click on the highlighted link you get an error from the destination site, since you didn't provide the full page URL.
And yet, if you click on that partial link above for Ask Leo!, you'll get to the intended page anyway.
Ask Leo! tolerates broken URLs. Here's how I do it.
The steps are simple:
1. I have a custom 404 error handling page written in PHP called "404resolver.php".
2. When a page is about to be reported as "not found", the web server displays this page instead, passing in the name of the original page that was requested.
3. 404resolver.php examines the files in the directory of the web site, looking for ".html" files that begin with what was passed in as the original request:
   - If it finds none, it reports a "page not found" like any 404 handler would.
   - If it finds exactly one, it performs an immediate 301 redirect to that page.
   - If it finds more than one, it displays a list and asks the visitor to make a choice.
Let's look at those three scenarios in practice before we look at the code.
Page Not Found: http://ask-leo.com/whats_the_difference_between_windows_live_messenger_and_mttips: this page does not exist, and will get you a "page not found" error. Try it.
One match: http://ask-leo.com/whats_the_difference_between_windows_live_messenger_: this page doesn't exist, but there's only one page that begins with exactly that URL. Since there's only one possible match, you're redirected to it as if you had gone there to start with.
Many matches: http://ask-leo.com/whats_the_difference_between_: this page doesn't exist, and there are several pages that start with that URL fragment. Since we know there are options, but we can't know which one was intended, we present a list and let the visitor choose.
The Code
I'm only going to present snippets here. The full code is available below for download.
The real brains of the operation are in this code sequence:
$szRequestedPage = $_SERVER['REQUEST_URI'];
$cRequestedLength = strlen($szRequestedPage);
$rgMatchingPages = array(); // start empty so count() is valid even with no matches
$dir = $_SERVER['DOCUMENT_ROOT'];
if ($dh = opendir($dir))
{
    while (($file = readdir($dh)) !== false)
    {   // build array of possible matches
        $file = strtolower($file);
        if (0 == strcmp (".html", substr($file, -5)))
        {   // limit to .html files only
            if (0 == strncmp (strtolower ($szRequestedPage), "/" . $file, $cRequestedLength))
            {   // possible match
                $rgMatchingPages[] = $file;
            }
        }
    }
    closedir($dh);
}
This code takes the originally requested page, or "REQUEST_URI", and compares it against every file in the website's document root on the server. Every ".html" page whose name begins with the same string as was originally requested (ignoring case) is added to a list.
When that code is done, we have a list of pages which might be the desired page.
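One detail to watch: $_SERVER['REQUEST_URI'] also carries any query string, so a broken link with something like "?ref=mail" on the end would never match a file name. Here is a minimal sketch that wraps the same prefix scan in a function and strips the query string first. The name findMatchingPages and its parameters are made up for illustration, and keeping the on-disk spelling of each file name is my assumption; this is not the code running on Ask Leo!.

```php
<?php
// Illustrative sketch, not the production 404resolver.php.
// findMatchingPages() is a hypothetical helper name.
function findMatchingPages(string $requestUri, string $dir): array
{
    // Keep only the path part; REQUEST_URI may include "?query".
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    $prefix = strtolower(ltrim($path, '/'));

    $names = scandir($dir);
    if ($names === false) {
        return [];
    }

    $matches = [];
    foreach ($names as $name) {
        $lower = strtolower($name);
        // Limit to .html files whose names start with the requested text.
        if (substr($lower, -5) === '.html'
            && strncmp($lower, $prefix, strlen($prefix)) === 0) {
            $matches[] = $name; // assumption: keep the on-disk spelling
        }
    }
    sort($matches);
    return $matches;
}
```

Because the function takes the directory as a parameter, it can be tried against a scratch folder without touching the live site.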
Next we act based on how many potential matches were found.
if (0 >= count($rgMatchingPages))
{   // none found. Redirect to the "real" 404 page
    header ("HTTP/1.1 301 Moved Permanently");
    header ("Location: /notfound.html");
    exit(0);
}
if (1 == count($rgMatchingPages))
{   // only one match. Redirect to it, we're done.
    header ("HTTP/1.1 301 Moved Permanently");
    header ("Location: /" . $rgMatchingPages[0]);
    exit(0);
}
As you can see, if there's no match found, we redirect to the "real" page not found page. And if there's only one possible match found, we simply redirect to that page.
If there is more than one potential match, we're going to display something, so we start outputting actual HTML:
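The branching above can also be condensed into a small decision function, which makes it easy to try out without a web server. This is only a sketch: chooseRedirect is a hypothetical name, and "/notfound.html" is simply the path used in the snippet above.

```php
<?php
// Sketch: map the list of matches to a redirect target.
// Returns null when the visitor has to pick from a list.
function chooseRedirect(array $matchingPages): ?string
{
    if (count($matchingPages) === 0) {
        return '/notfound.html';          // no candidates at all
    }
    if (count($matchingPages) === 1) {
        return '/' . $matchingPages[0];   // unambiguous match
    }
    return null;                          // several candidates
}

// Hypothetical wiring inside a 404 handler:
//   $target = chooseRedirect($rgMatchingPages);
//   if ($target !== null) {
//       // the third argument to header() sets the response status code
//       header('Location: ' . $target, true, 301);
//       exit(0);
//   }
```

Keeping the decision separate from the header() calls means the three outcomes can be checked with plain arrays.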
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<link rel="stylesheet" href="/al.css" type="text/css" />
<meta http-equiv="Content-Language" content="en-us" />
...
The HTML being output is really whatever is necessary to display a page of your own design. Naturally mine outputs the "look and feel" of Ask Leo!.
However, within that page is an important sequence of code:
<ul>
<?php
// NOTE: this PHP loop outputs the possible matches. Replace (or not) the
// HTML code within the "echo" statement as you see fit. Take care to use
// single quotes, or to precede each embedded double quote with a backslash.
//
sort ($rgMatchingPages);
reset ($rgMatchingPages);
foreach ($rgMatchingPages as $page)
{
    echo "<li><a href=\"$page\">$page</a></li>\n";
}
?>
</ul>
This code outputs a bulleted list, in sorted order, of the possible matches for the page requested. Each is output as a link for the visitor to click on.
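File names are not guaranteed to be HTML-safe, so one cautious variation is to escape them before echoing. The sketch below builds the same list as a string. The name renderMatchList is hypothetical, and the root-relative "/" in front of each link is my assumption rather than what the snippet above emits.

```php
<?php
// Sketch: build the list of candidate pages as an HTML string.
function renderMatchList(array $matchingPages): string
{
    sort($matchingPages);
    $html = "<ul>\n";
    foreach ($matchingPages as $page) {
        $href = '/' . rawurlencode($page);                    // safe inside a URL
        $text = htmlspecialchars($page, ENT_QUOTES, 'UTF-8'); // safe inside HTML
        $html .= "<li><a href=\"$href\">$text</a></li>\n";
    }
    return $html . "</ul>\n";
}
```

Within the page template this could be used as: echo renderMatchList($rgMatchingPages);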
That's really all there is to it.
A couple of caveats, of course...
Even though this is MTTips.com, there's actually nothing MT-specific about this solution. Any site that uses static HTML pages and can run PHP can use it. The concepts can also be extended to cover other technologies. (I've heard of at least one WordPress-based solution as well.)
In that same vein, this solution assumes all pages appear in the root of your site. Additional logic can certainly be added to handle other site organizations. The basic idea here is simple: use the page name that the user tried to get to as some kind of clue for what they were actually looking for. If you can, give them that without delay.
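As one possible starting point for sites that keep pages in subdirectories, the request can be split into a directory part and a partial file name, so the prefix scan runs in that directory instead of the root. This is a sketch under assumptions, not part of 404resolver.php: splitRequest is a hypothetical helper, and a real handler would still need to reject ".." segments before touching the file system.

```php
<?php
// Sketch: separate "/archives/2007/whats_the" into the directory to scan
// and the partial file name to match against.
function splitRequest(string $requestPath): array
{
    $path  = '/' . ltrim($requestPath, '/'); // guarantee one leading slash
    $slash = strrpos($path, '/');            // position of the last slash
    return [
        'dir'    => substr($path, 0, $slash + 1), // includes trailing "/"
        'prefix' => substr($path, $slash + 1),    // text after the last "/"
    ];
}
```

The 'dir' value would be appended to the document root to pick the folder to scan, and 'prefix' compared against the file names found there.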
The full 404resolver.php as used on Ask Leo! can be downloaded here.
Posted May 18, 2007
Entire site Copyright © 2003-2010,
Puget Sound Software, LLC and Leo A. Notenboom