decrufting URLs

I've been thinking about this before:

http://diveintomark.org/archives/2003/08/15/slugs

The idea is to have URLs which make sense and do not contain file extensions. This is a nice, user-friendly way of doing things, and it also helps if you intend to change platforms or technologies at a later date, or maybe integrate more than one technology into a single site.
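For example (these URLs are made up for illustration), a "crufty" URL that exposes the underlying technology might look like the first line below, while its decrufted equivalent hides both the file extension and the query string:

http://example.com/articles/view.asp?id=42
http://example.com/articles/decrufting-urls/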

Some publishing systems, such as the excellent Plone (a portal and CMS built on the Zope application server and its content management framework), do this by default, as web pages are actually objects rather than files sitting on the file system (well, actually you can have both, but I won't go into that here).

Most of my commercial work is Microsoft ASP-based, and I don't have the luxury of using some of the methods described above, especially when I don't have direct access to the server configuration (e.g. if the site is hosted on a shared host), so I have to experiment to find the best way to do this.

On one ASP-based CMS I built, I went to the laborious lengths of giving it the ability to create a new folder for every page, to give the site the appearance of an extension-free structure. It works fine, but the code crunching needed to get the system to create and maintain a replica structure via FTP on the live site is by no means ideal: there are pages of code needed to track changes and make sure the system tidies up after itself when a page is moved, etc.
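The layout it produces looks something like this (hypothetical paths, not the real site): each page becomes a folder containing a default document, so the folder URL serves the content with no extension visible:

/about/default.asp -> http://example.com/about/
/products/widgets/default.asp -> http://example.com/products/widgets/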

One other method I have experimented with is the custom error page. Before I describe it, a warning: it is a fairly ugly "hack" and it makes server logs virtually useless, so custom logging and reporting is required if you go down this route. It also increases server load, and you need a host who will allow you to use a custom error page.

I won't go into detail here, but the basic principle is that all unknown URL requests are sent to a custom error page (the 404 error page), which is itself an ASP page. The page contains server-side code to read the query string (which will contain the requested URL) and redirect to the appropriate "real" page, either by looking it up in a database or by following some consistent naming convention. As I say, ugly. This method also makes it difficult to use query strings.
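Here is a minimal sketch of the technique, assuming IIS is configured to send 404s to a custom ASP page (404.asp here; the page name, database table and connection string are all hypothetical). IIS passes the originally requested URL to the error page as a query string in the form "404;http://host/path":

<%
' 404.asp - hypothetical sketch of the custom error page technique.
' IIS hands us the original request as e.g. "404;http://www.example.com/about/".
Dim qs, requestedUrl, path
qs = Request.ServerVariables("QUERY_STRING")
requestedUrl = Mid(qs, InStr(qs, ";") + 1)

' Strip the scheme and host to leave just the path, e.g. "/about/"
path = Mid(requestedUrl, InStr(requestedUrl, "://") + 3)
path = Mid(path, InStr(path, "/"))

' Look the path up in a (hypothetical) table of friendly URLs
Dim conn, rs
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open Application("ConnectionString") ' assumed to be set in global.asa
Set rs = conn.Execute("SELECT real_url FROM pages WHERE friendly_url = '" & Replace(path, "'", "''") & "'")

If Not rs.EOF Then
    ' Send the visitor on to the real page (a 302 redirect)
    Response.Redirect rs("real_url")
Else
    Response.Status = "404 Not Found"
    Response.Write "Page not found"
End If

rs.Close
conn.Close
%>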

A similar technique is to use a server extension such as ISAPI_Rewrite to intercept incoming URLs and rewrite or redirect them as appropriate, using regular expressions to extract data from the incoming URL.
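As a rough sketch (the paths are made up, and the exact escaping rules vary between ISAPI_Rewrite versions), a single rule in httpd.ini can map a friendly URL onto the real ASP page and its query string behind the scenes:

[ISAPI_Rewrite]
# Map /articles/some-slug/ onto the real page, passing the slug as a query string
RewriteRule /articles/([^/]+)/? /article\.asp\?slug=$1 [I,L]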