
Post Therapy: backwards-compatible URLs in Apache

August 24, 2010

This post is an exercise in writing therapy, meant to help me get my head around a specific problem that I’m encountering. I understand that this is a problem well-solved by existing tools – I just need to internalize the process of using them. If you are looking for a tutorial on how to solve this problem, please look elsewhere.

On the other hand, if you are curious about this topic and want to see what I’m thinking, read the details after the jump…

So – what I have is a website which is being moved from a custom (IIS/ASP-based) CMS into Drupal. We are almost ready to launch; the only remaining roadblock is to deal with inbound links to the existing URLs. Users following these soon-to-be-outdated URLs need to end up on the correct page, rather than the site home page or (worse) a 404 error. A specific example is that the URL

http://www.example.com/building.asp?building=587

needs to be sent instead to the new URL

http://www.example.com/node/832

Notes

A few things to note about this rewriting:

  • The domain name itself does not need to be altered
  • The ID value of each page is changing, as part of the migration between CMSs. The ID value does not, however, change by a constant offset – so I can’t just grab the value for “building”, add 438, and get the Drupal Node ID.
  • Drupal has clean URLs enabled, allowing URLs in the pattern of “/node/832” rather than “/?q=node/832” – although the less-clean URLs are also supported.

Options

There are three general options which I can see to accomplish this task:

  1. Apache httpd’s mod_rewrite module
  2. Custom PHP script
  3. Something in Drupal like the path_redirect module

I don’t believe that Apache’s mod_alias module is a viable option, for two reasons. First, I’m parsing inbound URLs that have a querystring, and mod_alias directives such as Redirect match only the URL path. Second, mod_rewrite rules are already in play here, and I expect those to interfere with anything that mod_alias tries to do.

Workflow

Whichever option I go with, these are the steps which need to happen:

  1. Apply the rewrite rules to requests for any URL whose path starts with “/building.asp”.
  2. Grab the querystring for these requests, identifying the value for the “building” variable.
  3. Look up in the Drupal database the new ID value for the requested record.
  4. Rewrite the URL request to be “/node/{new ID value}”

Failing this, an alternative might be to end this effort with a URL in the pattern “/?q=node/{new ID value}”. One potential advantage here is that the rewrite stays within the querystring, rather than being transferred to a requested file path. I’m not sure how useful that is, however.
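To make the workflow concrete, here is a minimal Python sketch of the four steps, plus the fallback URL pattern. It is an illustration of the logic only, not what would actually run inside Apache or Drupal. The function name `redirect_target` and the in-memory `ID_MAP` dictionary are hypothetical stand-ins for the real lookup against the Drupal database; only the 587 to 832 pair comes from the example above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the database lookup: old building ID -> Drupal node ID.
# Only this one pair is taken from the post; the real table has about 725 rows.
ID_MAP = {"587": "832"}


def redirect_target(url, id_map, clean_urls=True):
    """Return the new path for an old building URL, or None if it does not apply."""
    parts = urlsplit(url)

    # Step 1: only handle requests whose path starts with /building.asp
    if not parts.path.startswith("/building.asp"):
        return None

    # Step 2: pull the "building" value out of the querystring
    values = parse_qs(parts.query).get("building")
    if not values:
        return None

    # Step 3: look up the new ID for the requested record
    new_id = id_map.get(values[0])
    if new_id is None:
        return None

    # Step 4: build the new path, in either the clean or the ?q= form
    return f"/node/{new_id}" if clean_urls else f"/?q=node/{new_id}"
```

Calling `redirect_target("http://www.example.com/building.asp?building=587", ID_MAP)` gives `/node/832`, and passing `clean_urls=False` gives `/?q=node/832`. An unknown ID returns `None`, which is the case where the real site would have to decide between a 404 and the home page.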

A third option, I suppose, would be to set up a View in Drupal, that exists at a URL “/building.asp”. This view would have an exposed filter on the old ID value, which is part of the new building record content type. This approach has the advantage of being done entirely in Drupal, and a quick test I did earlier today indicates that I can get almost 100% of the way to a working solution – but I don’t know enough PHP scripting to automatically forward the user to the ultimate building record. Instead, I can have the view look up the new building record and put in a link, saying essentially “click this link to get where you want to go.”

Where I Am Now

At this stage, I’m poring over Apache’s documentation for mod_rewrite, trying to get a handle on its syntax. Assuming I get that down, I’ll then need to either figure out how to query the underlying MySQL database from within the httpd.conf file, or write a very ugly regex or switch construct listing all possible replacement values (about 725).
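One way to avoid hand-writing roughly 725 replacement values would be to export the old and new IDs once and generate a lookup file from them. The Python sketch below assumes the pairs have already been pulled out of the database into a dictionary. The function name `build_map_lines` and the second sample pair are made up for illustration. It emits one "old new" pair per line, which is the plain-text key/value layout that mod_rewrite's RewriteMap directive can read; whether a RewriteMap fits this particular server setup is an assumption, not something tested here.

```python
def build_map_lines(pairs):
    """Format {old_id: new_id} as one 'old new' line per pair, sorted by old ID."""
    lines = []
    for old_id, new_id in sorted(pairs.items()):
        lines.append(f"{old_id} {new_id}")
    # End with a newline so the result can be written straight to a file.
    return "\n".join(lines) + "\n"


# 587 -> 832 is the example from the post; 12 -> 901 is invented for illustration.
sample_pairs = {587: 832, 12: 901}
map_text = build_map_lines(sample_pairs)
```

The generated text could be written to a file and regenerated whenever the migration data changes, which keeps the ID list out of the rewrite rules themselves.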

Any suggestions here would be welcome – when I get a working solution, I’ll put up a separate post illustrating what I did and how.

UPDATE: I ended up going with the Drupal module path_redirect – more information here
