301 Redirects: The Horror That Cannot Be Uncached

If you’re not a web developer, please ignore this post, as it won’t make any sense to you. :)

It never occurred to me that 301 on-domain redirects might be a bad idea for passing along browsers/search indexers to new content URLs, but here’s a use case where it becomes a big problem: when the person in charge changes their mind and wants to revert anything that you modified with a 301. I felt like an idiot for getting stuck in this technical snafu, and the workaround feels horrible, so let me explain.

Let’s take a simple example. Say you’ve got a page with information about widgets, and your task is to redesign the widget page. You see that it’s called content_123.html. Obviously, this isn’t a good URL, so you would like to guide clients correctly to a new URL, like “/products/widgets”. You heard that 301 redirects pass along PageRank after a little delay, and it’s better than serving a 404, right?

Well, here’s where it gets hairy. A few weeks later, the person in charge changes their mind and instructs you to revert the site to its original form. Easy, right? Just remove the 301 redirect from .htaccess, restore the old files, and you should be good to go, right?

Wrong. 301 redirects are cached by certain browsers, such as Firefox 3.5+ and Chrome. That means a large set of users who visited your site, along with search engines, cached the mapping of content_123.html to “/products/widgets”. Even though your webserver no longer sends any 301 redirect, those browsers (and likely all search engines) have already saved it. You can’t create the reverse mapping either, because browsers and search engines are usually smart enough to avoid infinite loops, so they’ll just ignore the new instructions. God forbid you redirected anything to “/”, too.
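To make the failure concrete, here is a toy model of it in Python. It is only a sketch: the `Server` and `Browser` classes are invented for illustration, the "browser" simply remembers every 301 forever, and real browsers differ in how long they keep redirects and how they report loops.

```python
class Server:
    """Toy origin server: path -> (status, redirect target or page body)."""

    def __init__(self, routes):
        self.routes = dict(routes)

    def get(self, path):
        return self.routes.get(path, (404, None))


class Browser:
    """Toy client that stores every 301 it sees and replays it without asking again."""

    def __init__(self, server):
        self.server = server
        self.cached_301s = {}

    def visit(self, path, max_hops=5):
        for _ in range(max_hops):
            if path in self.cached_301s:
                # Answered from the local cache; the server is never consulted.
                path = self.cached_301s[path]
                continue
            status, value = self.server.get(path)
            if status == 301:
                self.cached_301s[path] = value
                path = value
                continue
            return status, path
        return "too many redirects", path


server = Server({
    "/content_123.html": (301, "/products/widgets"),
    "/products/widgets": (200, "new widget page"),
})
returning = Browser(server)
print(returning.visit("/content_123.html"))  # (200, '/products/widgets')

# The revert: redirect removed, legacy file restored, new page deleted.
server.routes = {"/content_123.html": (200, "legacy widget page")}
print(returning.visit("/content_123.html"))        # (404, '/products/widgets')
print(Browser(server).visit("/content_123.html"))  # (200, '/content_123.html')

# A reverse 301 just bounces the returning visitor between the two URLs.
server.routes["/products/widgets"] = (301, "/content_123.html")
print(returning.visit("/content_123.html")[0])  # too many redirects
```

The new visitor is fine after the revert; only the returning visitor, whose redirect lives in a cache the server cannot reach, ends up on a dead URL.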

If you’re lucky, this is an intranet app and you can instruct all users to manually clear their browser caches. If it’s public and this was a big redesign, your users may never even be able to get to a valid page if you just dump the old files on the webserver and they’re all redirected via 301s.

The big issue is this: there is no way to tell a browser to clear out or undo a 301 redirect. You have to wait for the user to clear the cache, or wait for the cached redirect to expire. This is totally unacceptable from a user’s point of view, so here’s a super-ugly way to work around the problem.

  1. Put legacy content back.
  2. Eliminate all 301 redirects from your .htaccess / mod_rewrite config. Might as well stop causing damage first.
  3. Rename each legacy file (perhaps by appending something standard), so that content_123.html becomes, say, content_123_orig.html.
  4. Create new mod_rewrite rules that do 302 redirects from the original legacy URL to the renamed URL. This sends every existing link to the legacy site on to the renamed legacy page, for any browser without the cached 301 redirect, such as new visitors or users who cleared their caches.
  5. Create more mod_rewrite rules that do 302 redirects from the 301 redirect targets (the “new” URLs that are being moved away from) to the renamed legacy URLs. This redirects clients that were using the new site, and also serves the correct page to clients with a cached 301 redirect. For example, browser A cached the 301 redirect, so when you type /content_123.html in its address bar, it instead tries to load “/products/widgets”. Because of the new 302 rule, the server reports that “/products/widgets” has moved temporarily to “/content_123_orig.html”, and the user gets the legacy page contents.
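Steps 4 and 5 boil down to two rules per page, both pointing at the renamed legacy file. Below is a small Python sketch that writes those rules out. The helper name `rollback_rules` is made up, the paths are the widget example from above, and it assumes the rules go in the document-root .htaccess, where mod_rewrite matches the path without its leading slash. Treat the output as a starting point to test, not a drop-in config.

```python
def rollback_rules(legacy, new, renamed):
    """Return .htaccess lines that 302 both the legacy URL and the
    abandoned new URL to the renamed legacy file."""
    lines = ["RewriteEngine On"]
    for source in (legacy, new):
        # Only dots are escaped here, which is enough for these sample paths.
        pattern = source.lstrip("/").replace(".", r"\.")
        lines.append(f"RewriteRule ^{pattern}$ {renamed} [R=302,L]")
    return lines


for line in rollback_rules("/content_123.html",
                           "/products/widgets",
                           "/content_123_orig.html"):
    print(line)
# RewriteEngine On
# RewriteRule ^content_123\.html$ /content_123_orig.html [R=302,L]
# RewriteRule ^products/widgets$ /content_123_orig.html [R=302,L]
```

The first rule catches new visitors and old links; the second catches anyone whose browser still replays the cached 301. Neither rule matches the renamed file, so there is no loop.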

This is obviously a really horrible workaround, but changing your mind is something that normal humans do in the real world, and irreversible changes deserve more attention. It’s embarrassing to post a “solution” like this, but if you run into this problem I’d rather save you some time than save me some face.

If you’re considering going with 301 redirects in a move to a new URL structure, be aware that you’re moving down a one-way street. It makes HTTP 301 seem awfully out of place in a spec that is otherwise quite comprehensive about downstream caching of content. Caching redirects seems like a logical behavior when the spec says that the change is permanent, and I suspect the main reason that this hasn’t caused more visible problems yet is that the “permanent” part has been ignored by browsers until recently.

12 thoughts on “301 Redirects: The Horror That Cannot Be Uncached”

  1. I’m puzzled as to why you think caching of a permanent redirect is such a problem, surely this is no different to ANY content which has a TTL set?

    So if you change some HTML code (wrongly) and someone accessed that page, then surely you get the same issue?

    At the end of the day, if you don’t want browsers/caches to cache the permanent redirect, then surely you should just issue it with a ‘no-cache’ header? What bugs me is the fact that we potentially want to cache redirects (by issuing the appropriate cache-control header), but most browsers just ignore it!

  2. I’m curious whether the solution mentioned by Francis will do the trick. I’m not sure where one would configure the expiration of the redirects, though. For IIS 7.5 (in my case), I’ll need to poke around and see if there is a way.

  3. I am also considering doing a 301 with no-cache headers. Have you tested that with different browsers yet?

  4. The problem here is that browsers strongly cache 301 redirects. A simple Ctrl F5 doesn’t clear it. The only solution I’ve found is to delete your local cache folders for the browser that has cached it.

  5. Francis, there should be NO mechanism, purposeful or accidental that has the ability to remove control of an asset for all future developers of said asset.

    Interpreting “permanent” as “eternal” amongst man, in a temporary world, job, company, mindset, etc, is plain ignorant.

    Why would there be a mechanism to basically “give up all future control”?

    Makes no sense. 301 is now basically a liability. Maybe a disgruntled employee wants to subtly undermine you. What is horrible is that after 14 years of interpreting it one way, they change to a way that has clear detrimental effects, and there was no obvious warning put out? It was incorrect to just change their implementation after so long and act as if it’s nothing.

  6. I agree with you 100% Dominick. It’s just madness. Is there some w3c forum where enough webmasters could vote for something so that the steering committee could change the wording for the standard so that a more sane interpretation is made by the browsers?

  7. Cached 301 redirects are different from normal cached content. The user cannot force a reload of the redirect URL, it redirects away from that location immediately.

  8. mmmh… there is a very simple solution.

    Just do a 301 redirect for Googlebot plus a few more search engines, and do a 302 redirect for visitors. It’s easy to check the HTTP User-Agent, and in case it is Google -> do a 301.

    example code:

    // redirect function
    function redirect($url) {
        // for Google do 301 redirect
        $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        if (strstr(strtolower($agent), "googlebot")) {
            header("HTTP/1.1 301 Moved Permanently");
        }
        // complete the redirect; if not Google, PHP sends a 302 by default
        header("Location: $url");
        header("Connection: close");
    }

  9. Requiring client intervention is very costly, in large-traffic environments.

    Giving any client a 301 redirect is unacceptable for the reasons outlined above – any responsible engineer must prepare for the eventuality that their work may be reverted, and a 301 cannot be reverted without client intervention. That includes Google, and who knows what their cache policy is! Good luck with that one.

    Clearing “your” browser cache will indeed delete the cached 301 redirect. But think twice before you rely on this: how many visitors will give up before finding that solution? How many support requests will you need to deal with before this happens?

    Here’s my official suggestion to close out the comments – don’t use 301 redirects until your 302’s have been in place for a period of time that ensures the powers that be are happy with the URL structure changes. Or else you’re going to find yourself making some really nasty hacks.
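For what it’s worth, the “302 first, 301 later” policy can be sketched as a single decision function. This is Python and purely illustrative: `redirect_response`, the `approved` flag, and the one-hour `max_age` are all made up for the example. The Cache-Control values follow the no-cache idea raised in the comments above, and whether a given browser actually honors them on a redirect is something to test rather than assume.

```python
def redirect_response(target, approved=False, max_age=3600):
    """Pick the status code and headers for a redirect to `target`.

    approved=False: the URL change is still on trial, so send a 302
                    and ask caches not to store it.
    approved=True:  the change is signed off, so send a 301, but with
                    an explicit lifetime instead of "forever".
    """
    if not approved:
        return 302, {"Location": target, "Cache-Control": "no-store"}
    return 301, {"Location": target, "Cache-Control": f"max-age={max_age}"}


print(redirect_response("/products/widgets"))
# (302, {'Location': '/products/widgets', 'Cache-Control': 'no-store'})
print(redirect_response("/products/widgets", approved=True))
# (301, {'Location': '/products/widgets', 'Cache-Control': 'max-age=3600'})
```

Flipping `approved` is then the only change needed once everyone is happy with the new URLs, and until that point nothing permanent has been handed to any client.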
