301 Redirects: The Horror That Cannot Be Uncached

If you’re not a web developer, please ignore this post, as it won’t make any sense to you. :)

It never occurred to me that on-domain 301 redirects might be a bad idea for sending browsers and search indexers to new content URLs, but here’s a case where they become a big problem: when the person in charge changes their mind and wants you to revert everything you moved with a 301. I felt like an idiot for getting stuck in this technical snafu, and the workaround feels horrible, so let me explain.

Let’s take a simple example. Say you’ve got a page with information about widgets, and your task is to redesign the widget page. You see that it’s called content_123.html. Obviously, this isn’t a good URL, so you would like to guide clients correctly to a new URL, like “/products/widgets”. You heard that 301 redirects pass along PageRank after a little delay, and it’s better than serving a 404, right?
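For the record, the redirect in question would look something like this. It’s only a sketch: it assumes mod_rewrite is enabled and that the rule lives in an .htaccess file in the site root, and it uses the example names from above.

```apache
# Hypothetical .htaccess in the site root; requires mod_rewrite.
RewriteEngine On

# Permanently (301) redirect the legacy page to the new URL.
# In .htaccess context the pattern is matched without the leading slash.
RewriteRule ^content_123\.html$ /products/widgets [R=301,L]
```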

Well, here’s where it gets hairy. A few weeks later, the person in charge changes their mind and instructs you to revert the site to its original form. Easy, right? Just remove the 301 redirect from .htaccess, restore the old files, and you should be good to go, right?

Wrong. 301 redirects are cached by some browsers, such as Firefox 3.5+ and Chrome. That means a large set of users who visited your site, along with search engines, have cached the mapping of content_123.html to “/products/widgets”. Even though your web server no longer sends a 301 for that URL, those browsers (and likely the search engines) have saved the redirect and will keep following it without asking the server again. You can’t create the reverse mapping either: a client holding the cached redirect gets bounced from “/products/widgets” back to content_123.html, which its cache immediately sends to “/products/widgets” again, and browsers and search engines are usually smart enough to bail out of an infinite loop rather than follow the new instructions. God forbid you redirected anything to “/”, too.
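To make the loop concrete, this is the tempting “undo” rule that does not help. Again a sketch under the same assumptions (mod_rewrite, .htaccess in the site root, the example names from this post):

```apache
# DON'T: reverse permanent redirect, shown only to illustrate the problem.
RewriteEngine On
RewriteRule ^products/widgets$ /content_123.html [R=301,L]

# What a browser with the old 301 still cached ends up doing:
#   /content_123.html  -> /products/widgets   (answered from its cache)
#   /products/widgets  -> /content_123.html   (answered by this rule)
#   /content_123.html  -> /products/widgets   (cache again) ... and so on
```

Worse, the reverse rule is itself a 301, so it can get cached too.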

If you’re lucky, this is an intranet app and you can instruct all users to manually clear their browser caches. If it’s public and this was a big redesign, your users may never even be able to get to a valid page if you just dump the old files on the webserver and they’re all redirected via 301s.

The big issue is this: there is no way to tell a browser to clear out or undo a 301 redirect it has already cached. You have to wait for the user to clear the browser cache, or for the cached entry to expire. This is totally unacceptable from a user’s point of view, so here’s a super-ugly way to work around the problem.

  1. Put the legacy content back.
  2. Remove all 301 redirects from your .htaccess / mod_rewrite config. Might as well stop causing damage first.
  3. Rename each legacy file (perhaps by appending something standard), so that content_123.html becomes something like content_123-orig.html. The point is to serve the legacy content from a URL that no browser holds a cached redirect for.
  4. Create new mod_rewrite rules that 302-redirect each original legacy URL to its renamed file. This keeps existing links to the legacy URLs working for any browser without the cached 301, such as new visitors or users who have cleared their caches.
  5. Create more mod_rewrite rules that 302-redirect each 301 target (the “new” URLs that you are moving away from) to the renamed legacy file. This catches clients that were using the new site, and also serves the correct page to clients with a cached 301. For example, browser A cached the 301, so when you type /content_123.html into its address bar, it tries to load “/products/widgets” instead. Because of the new 302 rule, the server reports that “/products/widgets” has moved temporarily to “/content_123-orig.html”, and the user gets the legacy page contents.
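Steps 4 and 5 might look roughly like this. It’s a minimal sketch, assuming mod_rewrite in an .htaccess file in the site root, the content_123.html and “/products/widgets” names from the example earlier in the post, and a renamed file called content_123-orig.html. Adjust the patterns to whatever your real URLs are.

```apache
# Hypothetical .htaccess in the site root; requires mod_rewrite.
RewriteEngine On

# Step 4: original legacy URL -> renamed legacy file, temporary (302).
# Serves visitors who never cached the old 301.
RewriteRule ^content_123\.html$ /content_123-orig.html [R=302,L]

# Step 5: abandoned "new" URL -> renamed legacy file, temporary (302).
# Serves visitors whose browsers still jump here via the cached 301.
RewriteRule ^products/widgets$ /content_123-orig.html [R=302,L]
```

Neither rule matches content_123-orig.html itself, so the renamed file is served directly and nothing loops.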

This is obviously a really horrible workaround, but changing your mind is something that normal humans do in the real world, and irreversible changes deserve more attention. It’s embarrassing to post a “solution” like this, but if you run into this problem I’d rather save you some time than save me some face.

If you’re considering going with 301 redirects in a move to a new URL structure, be aware that you’re moving down a one-way street. That irreversibility makes HTTP 301 seem awfully out of place in a spec that is otherwise quite comprehensive about downstream caching of content. Caching redirects seems like logical behavior when the spec says that the change is permanent, and I suspect the main reason this hasn’t caused more visible problems yet is that the “permanent” part was ignored by browsers until recently.