James Gardner: Home > Blog > 2007 > Best Practice for Good URL Structures

Best Practice for Good URL Structures

Posted:2007-11-11 13:20
Tags:Web

I've had an instinctive feel for a long time that some URL structures are better than others. URLs which map directly to the structure of the code on the filesystem are clearly bad, but I thought it would be interesting to think about exactly what makes a good URL structure and what doesn't, from both a usability and a technical point of view. My motivation for doing this today rather than at any other time is that I'm writing a chapter on Routes for the Pylons Book, so I thought it would make useful background reading.

First a definition of the parts of a URL:

http://jimmyg.org:80/some/url?foo=bar#fragment
|--|   |---------|--|--------|-------|-------|
 |          |     |     |        |       |
protocol    |    port   |   query string |
       domain name  path info        fragment
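
The same breakdown can be reproduced with Python's standard library. This is only a minimal sketch using urllib.parse.urlsplit on the example address above, written with the query string before the fragment as URL syntax requires; the variable name parts is just for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://jimmyg.org:80/some/url?foo=bar#fragment")

print(parts.scheme)    # protocol
print(parts.hostname)  # domain name
print(parts.port)      # port (an int, or None when the URL has no port)
print(parts.path)      # path info
print(parts.query)     # query string, without the leading "?"
print(parts.fragment)  # fragment, without the leading "#"
```

The fragment is never sent to the server by a browser, so server-side code normally only sees everything up to and including the query string.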

  1. Describe the content

An obvious URL is a great URL. If a user can glance at a link in an email and know what it contains, you have done your job. This means choosing URL parts which accurately describe what is contained in each folder and always using a descriptive word rather than an ID in the URL. For example, if you were designing a blog you should try to use apr instead of 04 to represent April, and you should use the name of a category rather than its ID. This makes your URLs more intuitive to your users and gives search engines a better chance of understanding what the page is about.

You might think that direct use of URLs is likely to decrease as people use search engines and social bookmarking sites more frequently, but an eye-tracking study of search engine use published this year by Edward Cutrell and Zhiwei Guan of Microsoft Research found that people spend 24% of their gaze time looking at the URLs in the search results. If your URLs describe their content, users can make a better guess about whether or not your content is what they are after.
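
As a rough illustration of using words rather than IDs, the sketch below builds a blog archive path from a date and a category name. The function name archive_path and the /category/year/month/ layout are assumptions made for this example, not a recommendation of one particular layout.

```python
import datetime

# English month abbreviations, spelled out so the result does not depend on locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

def archive_path(date, category_name):
    """Build a descriptive archive path from a date and a category name."""
    return "/%s/%d/%s/" % (category_name, date.year, MONTHS[date.month - 1])

print(archive_path(datetime.date(2007, 4, 1), "web"))  # /web/2007/apr/
```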

  2. Keep it short

Try to keep your URLs as short as possible without breaking any of the other tips here. Short URLs are easier to type in or to copy and paste into documents and emails. If possible, keep URLs to fewer than 80 characters so that users can paste them into email without having to use URL shortening tools like qurl.com or tinyurl.com.

  3. Hyphens separate best

It is best to use single words in each part of a URL, but if you have to use multiple words, for example for the title of a blog post, then hyphens are the best characters to use to separate the words, e.g. /2007/nov/my-blog-post-title/. Unfortunately the - character cannot be used in Python identifiers, so if you intend to use parts of the URL as Python controller or action names you might want to convert the hyphens to _ characters first. Incidentally, using hyphens to separate words is also the most readable way of separating terms in CSS styles.
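
A minimal sketch of both conversions follows. The helper names slugify and to_python_name are hypothetical, and the slug rule shown here (keep ASCII letters and digits, drop everything else) is a simplifying assumption; real titles with accented characters would need more care.

```python
import re

def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

def to_python_name(slug):
    """Swap hyphens for underscores so the part can be used in a Python name."""
    return slug.replace("-", "_")

slug = slugify("My Blog Post Title")
print(slug)                  # my-blog-post-title
print(to_python_name(slug))  # my_blog_post_title
```

Note that a part starting with a digit would still not be a valid Python name after this conversion.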

  4. Static-looking URLs are best

Regardless of how your content is actually generated, it is worth structuring URLs so that they don't contain lots of &, = and ? characters, which most visitors won't properly understand. If you can write a URL like ?type=food&category=apple as /food/apple then users can see much more quickly what it is about.
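
Behind the scenes the application can still work with named parameters. The following is only a sketch of the idea, assuming a hypothetical parse_path helper that maps path segments onto names by position; it is not how any particular routing library does it.

```python
def parse_path(path, names=("type", "category")):
    """Map positional path segments onto named parameters."""
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) != len(names):
        return None  # the path does not match this pattern
    return dict(zip(names, segments))

print(parse_path("/food/apple"))  # {'type': 'food', 'category': 'apple'}
```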

  5. Keeping URLs lowercase makes your life easier

The protocol and domain name parts of a URL can technically be entered in any case, but the part after the # is case sensitive. How a particular server treats anything between the two depends on the server, the operating system and what the URL resolves to. UNIX filesystems are case sensitive while Windows ones aren't, so if the URL resolves to a file, Windows servers will generally allow any case whilst UNIX ones won't. Query string parameters are also case sensitive. You can generally save yourself a headache by keeping everything lowercase and issuing a 404 for anything which isn't. Of course, if you are writing a wiki where the page names depend on capitalisation, you'll need to make the URLs case sensitive.
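
One way to enforce the lowercase rule is a small piece of WSGI middleware. This is a sketch under the assumption that your application is a WSGI app; the names lowercase_only and hello_app are made up for the example.

```python
def lowercase_only(app):
    """WSGI middleware that answers 404 for any path containing uppercase letters."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path != path.lower():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

application = lowercase_only(hello_app)
```

A wiki with case-sensitive page names would simply not be wrapped in this middleware.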

  6. Keep the underlying technology out of the URL

Your users don't care which specific technology you are using to generate your pages, or whether a page is .html or .xhtml, so the basic rule is: don't use a file extension for dynamically generated pages unless your application is doing something clever internally, such as choosing the format in which to represent the content based on the extension. It is also generally best to choose names which represent what the URL is for rather than its technology, so you might consider style and script to be better choices than css and js for your CSS and JavaScript files.
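
If you do decide to let the extension pick the representation, the logic might look something like the sketch below. The FORMATS table and the split_format helper are hypothetical, and the particular extensions listed are just examples.

```python
import posixpath

# Hypothetical mapping from URL extension to response content type.
FORMATS = {
    "": "text/html",
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
}

def split_format(path):
    """Split a path into the resource path and the content type its extension implies."""
    resource, extension = posixpath.splitext(path)
    if extension not in FORMATS:
        return path, None  # unknown extension: treat it as part of the name
    return resource, FORMATS[extension]

print(split_format("/person/james.json"))  # ('/person/james', 'application/json')
print(split_format("/person/james"))       # ('/person/james', 'text/html')
```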

  7. Use singular terms rather than plural

This is a matter of personal preference, but rather than having a URL like /people/james use /person/james. It is likely that the last part of a URL will describe one thing, so the previous parts of the URL should describe that thing too. In this case james is a person, not a people, so /person/james is more appropriate. You can use this convention throughout your application when naming controllers, database tables and so on.

  8. Only use disambiguated URLs

Any piece of content should have one and only one definitive URL, with any alternatives acting as permanent redirects to it. In the past, features like Apache's DirectoryIndex have meant that if you entered a URL which resolved to a folder, the default document for that folder would be served. This means that two URLs exist for one resource (discussed by J Tauber). To make matters worse, servers are often configured so that http://www.example.com/someresource and http://example.com/someresource both point to the same resource. This means there can easily be four URLs for the same resource.
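
A sketch of how the alternatives could be collapsed into one definitive URL is shown below. The choice of the non-www host as canonical, the index.html default document name and the helper names are all assumptions for the example.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "example.com"   # assumed choice: the non-www form
INDEX_NAMES = ("index.html",)    # assumed default document name

def canonical_url(url):
    """Return the single definitive form of a URL."""
    scheme, host, path, query, fragment = urlsplit(url)
    host = host.lower()
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[:-len(name)]  # keep the trailing slash of the folder
    return urlunsplit((scheme, host, path, query, fragment))

def redirect_for(url):
    """Return a (status, location) pair if a permanent redirect is needed, else None."""
    target = canonical_url(url)
    if target != url:
        return "301 Moved Permanently", target
    return None

print(redirect_for("http://www.example.com/someresource/index.html"))
```

With these assumptions, the www and index.html variants all redirect to http://example.com/someresource/, and a request for that URL itself needs no redirect.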

There are three good reasons why this is bad:

  9. Never change a URL

Otherwise your users won't be able to find the pages they bookmarked, and any page rank you built up in social bookmarking sites or search engines will be lost. If you absolutely have to change a URL, set up a permanent (301) redirect to the new one so that your users don't get 404 errors. The W3C put it best: Cool URIs don't change.
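
If a URL really does have to move, the redirect can be as simple as a lookup table consulted before normal dispatch. This is a sketch only; the MOVED table, the paths in it and the moved_response helper are invented for illustration.

```python
# Hypothetical table of old paths and the paths that replaced them.
MOVED = {
    "/blog/url-structures": "/2007/nov/good-url-structures/",
}

def moved_response(path):
    """Return a (status, headers) pair for a moved path, or None if it has not moved."""
    if path in MOVED:
        return "301 Moved Permanently", [("Location", MOVED[path])]
    return None

print(moved_response("/blog/url-structures"))
```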

  10. Treat the URL as UI

Navigation links, sidebars and tabs are all well and good but if you have a good URL structure your users should be able to navigate your site by changing parts of the URL. There are a few rules about how best to do this:

I hope that's useful. If you have any extra tips, feel free to leave them in the comments, and if you have any extra evidence to support any of these tips I'd be interested to hear it. I think the most important tip, though, is that you should use common sense when designing a URL structure and not apply any of the tips too rigidly. After all, you know your application's and users' requirements better than I do, so you are better placed to make a judgment about what will work best for you.

Comments

What is a URL?

Posted:2007-11-24 00:56

Aren't there types of redirects if a URL has to change? So why do people always say that URLs should not change? Just use a redirect. :URL: http://what-is-what.com/what_is/url.html

thejimmyg

Posted:2008-01-11 12:21

Keeping the page at the same URL is best, and using an HTTP redirect is fine (although not all search engines follow them). It is just that removing the page and leaving a 404 error page is not very good, even if that page links to a search engine or similar. :URL: http://jimmyg.org
