Daily Archive for January 14th, 2009

Expected Behavior

HTTP is an old protocol: RFC 2616 dates from 1999, the earlier RFC 2068 from 1997, and HTTP itself has been in use since the early 1990s. It’s complex and subtle, so most web frameworks build layers of abstraction on top of it, and many browsers and servers provide compatibility layers that can recover from common errors. The end result is that many inexperienced web developers are not familiar enough with it to use it correctly, and most mistakes are not obvious when they appear only in web browsers the developer doesn’t use.

What is the intent of the POST message, for instance? The basic idea is that the user posts some data to a web site (for instance, a message on a forum), so the expected response is a description of the data that has been sent and the result of sending it. Refreshing the page will therefore perform the POST again with the same data to get a new response.

Web sites moved away from this system and caused POST messages to respond with data that was not related to the posting. For instance, posting login/password information responds with the home page of the logged in user, and posting a message on a forum responds with the new state of the forum thread. The result is that the user mistakenly believes that the current page is the result of a GET query, and that refreshing the page will simply perform a no-op query again (this is incorrect: since the page is the result of a POST, the POST will be performed again). Web browsers introduced a warning message to compensate for the web developer’s misuse of POST messages.

Another common and useful abuse of the protocol is redirecting. The HTTP 1.0 protocol defines two main redirect modes: permanent (301) and temporary (302). The intent of a 301 redirect is to indicate that the resource has been permanently moved to another location. For instance, when I moved this blog from Joomla! to WordPress, I set up a 301 redirect from every URL on the old blog to the corresponding article on the new one (using appropriate URL rewriting rules in my Apache .htaccess). This tends to be ignored by browsers, but online cache systems and search engines will use it to bind the old and new URLs together. On the other hand, if there’s some administration work underway on a web site, but some important documents must remain visible, a 302 redirect can be used to point to a temporary website on another server with the required documents.
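As a rough illustration of the two modes, here is a minimal Python sketch. It is not the Apache rewriting setup used for the actual migration. The `MOVED` and `MAINTENANCE` tables, every path in them, and the `backup.example.org` host are hypothetical.

```python
# Hypothetical lookup tables; none of these paths come from the real blog.
MOVED = {"/old-blog/article-42": "/new/article-42"}
MAINTENANCE = {"/docs/manual.pdf": "http://backup.example.org/docs/manual.pdf"}


def redirect_for(path):
    """Return (status, location) for a path, or None if no redirect applies."""
    if path in MOVED:
        # The resource has a new permanent home: caches and search
        # engines may replace the old URL with the new one.
        return ("301 Moved Permanently", MOVED[path])
    if path in MAINTENANCE:
        # The resource is only elsewhere for now: clients should keep
        # using the original URL for future requests.
        return ("302 Found", MAINTENANCE[path])
    return None
```

The only difference between the two branches is the status code, which is exactly the point: the `Location` header says where to go, and the code says whether the move is meant to last.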

Then, to avoid the “post the same data twice” problem, web developers started using redirects. The POST message is received by the web server, which performs the modification (log in, add message to thread) and then sends a 302 temporary redirect to the expected URL (the user home page, the thread page). For the user, the behavior on most browsers is that refreshing the destination page only re-requests the redirect target, not the original POST. Everyone wins.

Except that HTTP specifies that, when a POST request is answered with a 302, the redirection should not be followed automatically (that is, the expected behavior would be for the browser to say “Hey, you just sent a POST message, and you’re being redirected to another URL, do you want to follow it?”). Oops.

The good news is that HTTP 1.1 added a new status code for redirects: 303 is a “See Other” redirect which means, mostly, that the request was successful, but there’s no relevant data to be displayed here so the browser should instead send a GET request to another URL. This is what a website should use to redirect a user to a display page after a POST request is received.

The bad news is that PHP developers tend to redirect using the header("Location: ".NEW_URL) trick, which sets the status code to 302 by default. So, when you’re redirecting after an HTTP POST was received, don’t forget to also send header("HTTP/1.1 303 See Other"), after you’ve checked, of course, that the request allows the use of the HTTP/1.1 version instead of just HTTP/1.0!
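The version check can be sketched independently of PHP. Below is a minimal Python version of the decision, assuming the protocol token from the request line is available as a string (in PHP it can be read from `$_SERVER['SERVER_PROTOCOL']`). The function name `redirect_after_post` and the `/thread/12` URL are made up for the example.

```python
def redirect_after_post(request_version, location):
    """Build the status line and headers for a redirect sent in reply to a POST.

    request_version is the protocol token from the request line,
    for example "HTTP/1.1" or "HTTP/1.0".
    """
    if request_version == "HTTP/1.1":
        # 303 tells the client to fetch `location` with a GET request.
        status_line = "HTTP/1.1 303 See Other"
    else:
        # 303 is not defined in HTTP/1.0, so fall back to a 302.
        status_line = "HTTP/1.0 302 Moved Temporarily"
    return status_line, [("Location", location)]


status_line, headers = redirect_after_post("HTTP/1.1", "/thread/12")
print(status_line)  # HTTP/1.1 303 See Other
```

An HTTP/1.0 client gets the old 302 behavior, which is no worse than what the plain Location trick already does.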


