Content negotiation

©2003-2005 by Michael Knudsen.
$Date: 2011/07/02 11:48:59 $

This document, or extracts thereof, may under no circumstances be distributed in any form. This might change once the author is satisfied with content, layout, structure etc.

Intro

This is a short introduction to setting up content negotiation in Apache.

Problems with publishing web content

When publishing content on the web, it is important to keep all your pages up to date. At least two issues make pages troublesome to maintain:

Change of publishing methods

File names often reveal the language and publishing method of a page. A change of publishing method therefore requires you to change every link to suit the new structure. Though this can be scripted, it is a tiresome and error-prone process.

Pages in several languages

If your website has pages in several languages, you can for example use the directory structure to separate documents by language. However, users have seen so many variations of this scheme that they cannot remember which one you use. Besides, it is annoying to traverse the entire filesystem to make changes.

The Remedy

Content negotiation solves the structural side of these problems. In few words: content negotiation makes it possible to have one URI for many documents while hiding the publishing method from the user.

Content negotiation works by letting the user's browser specify criteria for what it will accept. One such criterion is content language. The result is that a URI of one of the following forms:

http://www.someobscurehostname.com/dk/article.html
http://www.someobscurehostname.com/dkarticle.html
http://www.someobscurehostname.dk/article.html
http://dk.someobscurehostname.com/article.html

can be replaced by one of this form:

http://www.someobscurehostname.com/article

There are several advantages to this. The major one is that the URI is shorter and thus easier for users to remember. Another is that you do not need to worry whether you are linking to the English or the Zimbabwean page. The user's browser tells the web server which it prefers. Besides, a given piece of information is the same regardless of language and publication method and thus should be placed in one location only.

The actual setup

Some of the work is already done in the stock httpd.conf that comes with OpenBSD. Apache loads mod_negotiation by default, so you only need to configure it for your particular situation:

Add MultiViews to the options

This tells Apache to use content negotiation. It can be enabled on a per-directory basis, but to use it on all your pages, simply use:

<Directory "/var/www/htdocs">
	Options Indexes MultiViews
	[...]
</Directory>

Add any missing language maps using AddLanguage

Only a few languages are added in the stock configuration. Add more like this:

AddLanguage pl .po
AddLanguage he .he
AddLanguage sv .sv

Note that the extension can be anything you like. It makes perfect sense to give pages in Polish an extension other than .pl, to keep them apart from Perl scripts.

Rename your files to match the language maps

This maps your files to languages. It basically means that files containing the same page in different languages are put in the same directory, each with an extra extension that maps it to a language:

$ cd /var/www/htdocs
$ mkdir articles
$ cp pl/article1.html articles/article1.html.po
$ cp dk/article1.html articles/article1.html.dk
$ cp sv/article1.html articles/article1.html.sv
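
With more than a handful of pages, the copying above could be scripted. The following is only a minimal sketch: it assumes the pl/, dk/ and sv/ directories from the example, and it works in a scratch directory (a hypothetical stand-in for /var/www/htdocs) with generated sample files, so that it can be tried without touching a real site.

```shell
#!/bin/sh
# Sketch: gather per-language copies of each page into one directory,
# appending the extension that AddLanguage maps to the language.

docroot=$(mktemp -d)              # hypothetical stand-in for /var/www/htdocs
cd "$docroot" || exit 1

# Create sample input so that the sketch is self-contained.
mkdir pl dk sv articles
for d in pl dk sv; do
    echo "article1 from $d" > "$d/article1.html"
done

# Each pair is "source directory:language extension".
for pair in pl:po dk:dk sv:sv; do
    dir=${pair%%:*}
    ext=${pair##*:}
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        cp "$f" "articles/$(basename "$f").$ext"
    done
done

ls articles
```

The originals are copied rather than moved, so the old URIs keep working until redirection is in place.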

In order to avoid dead links you can leave the old files in place, or you can set up redirection. I recommend using redirection for some time, but it can be a lot of work to set up. This task will probably be much less laborious if you do not change the directory structure, but changing it can be necessary.

If for a given page you only have, say, an English version, and a user does not list English among the preferred languages, an error code (406) will be returned together with a list of the available variants. If you do not want this behaviour, you must avoid giving the document a language extension, thus naming it file.htm or file.html.
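
One way to set up redirection is the Redirect directive of mod_alias. The sketch below only prints candidate lines for httpd.conf and changes nothing on the server. It assumes that mod_alias is loaded, that the new pages live under articles/ as in the example above, and that the host name is the example one; the function name make_redirects is made up for illustration.

```shell
#!/bin/sh
# Sketch: print one Redirect line per old page.
# base_url is the example host name; replace it with your own.
base_url=http://www.someobscurehostname.com

make_redirects() {
    for old in "$@"; do
        name=$(basename "$old" .html)     # strip directory and .html
        printf 'Redirect permanent /%s %s/articles/%s\n' "$old" "$base_url" "$name"
    done
}

make_redirects pl/article1.html dk/article1.html sv/article1.html
```

Review the printed lines before pasting them into the configuration, and remember that Apache must be restarted to pick them up.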

Change your links

This is probably the most laborious task if you have many pages, but it will pay off in the long run. The most important thing in this step is to remove all extensions in links. Not doing this can prevent the use of content negotiation. You must also remember to change the links to fit any changes in the filesystem structure (new or removed directories etc.).

These are examples of how to change your link references:

http://www.someobscurehostname.com/pl/article.html
dk/article.html
sv/articles/article1.html

Change them to (respectively):

http://www.someobscurehostname.com/article
article
articles/article1
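
The rewriting shown above can be sketched with sed. This is an illustration, not a complete tool: it assumes that links appear as href="..." attributes in double quotes, that the language directories are pl, dk and sv, and that only .html links need changing. The sample file and the function name rewrite_links are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample page; in practice this would be one of your own files.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://www.someobscurehostname.com/pl/article.html">one</a>
<a href="dk/article.html">two</a>
<a href="sv/articles/article1.html">three</a>
EOF

# Drop the pl/, dk/ or sv/ path component and the trailing .html
# from every href. \1 is whatever precedes the language directory
# (possibly nothing), \4 is the rest of the path without .html.
rewrite_links() {
    sed -E 's#href="(([^"]*/)?)(pl|dk|sv)/([^"]*)\.html"#href="\1\4"#g' "$1"
}

rewrite_links "$sample"
```

The function writes the result to standard output and leaves the input untouched, so the changes can be inspected with diff before any file is replaced.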

Done

This ought to do the trick. To test it, make some files with different content and language extensions, one for each language you want to try.

Try setting the language priority in your browser to each of English, Hebrew and French in turn and check that you get the right page. Apache needs execute and read permission on a directory in order to make content negotiation work, because it needs to search the directory for files.
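
The test can also be run from the command line. The sketch below creates one small page per language and, if asked to, requests the page with curl while sending a different Accept-Language header each time. The file name test and the variables DOCDIR and BASE_URL are assumptions made for the example; the .en and .fr extensions are assumed to be mapped in your configuration, and .he is the one added with AddLanguage earlier.

```shell
#!/bin/sh
# Directory to populate. Defaults to a scratch directory so that the
# sketch is harmless; set DOCDIR to your document root to try it for real.
docdir=${DOCDIR:-$(mktemp -d)}

# One variant per language, each with recognisable content.
for lang in en he fr; do
    echo "<html><body>This is the $lang version.</body></html>" \
        > "$docdir/test.html.$lang"
done

# Request the page once per language. The requests are skipped unless
# BASE_URL is set, e.g. BASE_URL=http://localhost
if [ -n "${BASE_URL:-}" ]; then
    for lang in en he fr; do
        printf '%s: ' "$lang"
        curl -s -H "Accept-Language: $lang" "$BASE_URL/test"
    done
fi
```

Each response should contain the version named in the header that was sent. A header naming a language for which no file exists is a quick way to see the 406 response.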