The importance of a semantic URL
I’m constantly baffled that most companies and web developers don’t understand, or care about, the importance of using good semantic URLs. Therefore, I thought I’d outline some reasons to help you understand why you really should care.
What is a semantic URL?
Semantic URLs, also known as friendly URLs, are made up of logical parts: they show the actual name of the specific web page you’re viewing, while at the same time displaying where it belongs in the web site hierarchy. Let me give you some examples:
Bad URLs
- http://www.example.com/?id=547
- http://www.example.com/aspx?id=547&product=785
- http://www.travel-example.com/?continent=3&country=15&city=54
Good URLs, made from the samples above
The bad examples above can actually mean something like this in reality:
- http://www.example.com/contact
- http://www.example.com/products/screwdriver
- http://www.travel-example.com/europe/sweden/stockholm
See just how much better it gets? Another thing I dislike is when file extensions, like .php or .aspx, are part of the URL. What’s ridiculous about that, too, is that it isn’t the content of the PHP or ASP.NET file that is presented; it’s the content it generated that is shown to the end user. I’m against the usage of any file extension in the URL, but the only one that would make at least some sense is .html, because that is what is served to the web browser.
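The id-based travel URL from the bad examples can be translated into its readable counterpart with a few lookups. The following is only a minimal sketch: the three lookup tables and the function name `semantic_path` are hypothetical, and a real site would read the names from its own database and answer the old URL with a redirect to the new one.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup tables; a real site would read these from its database.
CONTINENTS = {"3": "europe"}
COUNTRIES = {"15": "sweden"}
CITIES = {"54": "stockholm"}


def semantic_path(legacy_url):
    """Translate an id-based travel URL into a readable path, or None if unknown."""
    query = parse_qs(urlsplit(legacy_url).query)
    parts = []
    levels = (("continent", CONTINENTS), ("country", COUNTRIES), ("city", CITIES))
    for key, table in levels:
        values = query.get(key)
        if not values:
            break  # stop at the first missing level of the hierarchy
        name = table.get(values[0])
        if name is None:
            return None  # unknown id: let the caller answer with a 404
        parts.append(name)
    return "/" + "/".join(parts) if parts else None
```

Called with `http://www.travel-example.com/?continent=3&country=15&city=54`, the sketch yields `/europe/sweden/stockholm`.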
Why is it important?
As you can see in the bad vs. good examples above, the proper ones convey meaning and structure, while the bad ones aren’t really useful to anyone (except for some web developers, but their target should really be the end user, not a number that helps them find something in, say, the database).
Usability
I think that with a web site with a good structure and semantic URLs, you can guess the URL. Simple things like /contact and /about, but also more complex ones like http://www.tv-example.com/programs/lost. As long as you know the domain name, it should pretty much be possible to navigate around solely through the web browser location bar.
Just take the travel web site URL as an example; companies that deal with lots of destinations but don’t have semantic URLs are beyond me. Doesn’t the technical solution support it? Tough! Get a better one.
SEO
From a search engine perspective, naturally http://www.example.com/products/screwdriver makes more sense than http://www.example.com/?id=547, and search engines can index the web pages in conjunction with their content, to verify what each page is really about and just how important it is in the web site’s context. For instance, http://www.example.com/products/hammer will most likely be more important than http://www.example.com/products/expired-products/1975/disco-belt.
If the URLs had only been id-based, there would be no way of telling the difference between the top-seller and the tiniest, most unimportant product.
For the web developers
To go with the structure of the web site, the web development environment should mirror all its parts. That way, the web developer can easily find the exact section that needs to be altered, leading to easier maintenance and logical implementations. Also, in the long run, you can easily change IDs and the structure of the database without needing to worry about users who have bookmarked certain web pages. As long as the semantic structure you front with is intact, you can have full control behind the scenes.
Last year I wrote an article "URL beauty with mod_rewrite". Guess someone will find it useful:
http://rmc.net.ru/article/eng/goodurl/
Great summary of the issue. It baffles me too. For years, I've often navigated (whenever possible) websites through the location bar alone…I feel very comfortable doing that in some cases and I've even known some non-techy users to do the same thing because they expect certain pages to be in certain places.
Some nice, useful tips, cheers 🙂 Will put them into practice.
Another thing that is all too frequently absent from sites who *do* implement friendly URL:s is the implementation of something other than an error message for the "intermediary" levels…
Like in the example:
http://www.travel-example.com/europe/sweden/stock… http://www.travel-example.com/europe/sweden should IMHO *never* return an "unauthorized" or "listing not allowed", but instead present the visitor with some content relevant for the context implied by the URL…
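One way to check for the problem described here is to list every intermediate level of a URL and confirm that each one serves a real page. This is only a sketch: the helper names `ancestor_paths` and `missing_levels` are made up for illustration, and `known_pages` stands in for whatever the site uses to know which paths it can serve.

```python
def ancestor_paths(path):
    """Return every level of a path, from the top section down to the page itself."""
    segments = [s for s in path.split("/") if s]
    return ["/" + "/".join(segments[: i + 1]) for i in range(len(segments))]


def missing_levels(path, known_pages):
    """Return the levels of a path that have no page behind them."""
    return [level for level in ancestor_paths(path) if level not in known_pages]
```

For `/europe/sweden/stockholm`, `ancestor_paths` returns `/europe`, `/europe/sweden` and the full path, so a site that only knows the city page would get the two parent levels back from `missing_levels`.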
I'd just like to ask you, Robert, and your readers, as experienced developers, for an opinion.
What do you think of the inclusion of encoded chars in URLs? My native language has characters that have to be encoded (ç = %E7), which destroys the whole point of clean URLs.
Also, can't that approach cause you some trouble when trying to identify the object? the name of a product usually has spaces, hyphens, special chars… Are you willing to ask the user to insert another field (like "call_name" and have it clean for url usage?) or append some sort of id to the URL?
What I've been doing in my projects is sticking an ID with a cleaned "product" name. Like:
http://host.com/product/123,the_product_n…
Is this bad? I'm converting special chars, like: ç to c, á to a, etc.
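The conversion described here (ç to c, á to a, and so on) can be sketched with Unicode normalization instead of a hand-written replacement table. This is an illustration only: the function names and the sample product name are invented, and letters that have no decomposition (such as ø or ß) are simply dropped by this approach, so they would still need an explicit mapping.

```python
import re
import unicodedata


def clean_name(name):
    """Reduce a product name to lowercase a-z, digits and underscores."""
    # NFKD splits "ç" into "c" plus a combining cedilla; the ASCII encode drops the mark.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters (spaces, "&", hyphens) into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def product_url(product_id, name):
    """Build an id-plus-name path in the style used above."""
    return f"/product/{product_id},{clean_name(name)}"
```

With a made-up product called "Maçã Açúcar & Cia" and id 123, this produces `/product/123,maca_acucar_cia`.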
Another usability issue I'd like to point out… We're very used to thinking of the folders in our websites for our own good, so we usually type "/users" or "/products" to stick all info on those items in those folders. But since what we want is for the user to be able to "guess" the URL, it should be centered on him, the final user. So instead of users or products, we should use the singular. As if the URL is a command.
You can see this in action in last.fm
Try to guess your url there 😉
Sam,
That is a very useful article. For anyone who has been bewildered by Apache's URL Rewriting Guide, and just wanted straight-forward instructions on using mod-rewrite and .htaccess to clean-up file extensions from URLs–your article has the answer. Nice work.
There are a lot of really good articles and discussions on proper URL design over at Well Designed URLs in case anyone's interested.
@Andre Luis:
I use the exact same method: http://www.example.com/item/12/cleanname, because I'm not all that comfy with relying on a string, since a string is so easily changed (by character encoding and numerous escape functions in your application).
This article on Sitepoint however, shows how you can link ids to strings in an external text file. That could be a solution to the both of us.
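A rough sketch of the id-plus-name approach: the numeric id alone decides which item is shown, and the name part is treated as decoration that can be corrected. The pattern, the function name and the `names_by_id` mapping are assumptions made for this example, not taken from any of the linked articles.

```python
import re

# /item/<id>, optionally followed by /<name> and a trailing slash.
ITEM_URL = re.compile(r"^/item/(\d+)(?:/([^/]*))?/?$")


def resolve_item(path, names_by_id):
    """Return (item_id, canonical_path), or None when the path or id is unknown."""
    match = ITEM_URL.match(path)
    if match is None:
        return None
    item_id = int(match.group(1))
    name = names_by_id.get(item_id)
    if name is None:
        return None
    # Only the id is trusted; the name in the request may be stale or mangled.
    return item_id, f"/item/{item_id}/{name}"
```

If the canonical path differs from the requested one, the application could redirect to it, so a renamed item keeps working under its old links.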
Thanks for your comments!
Sam,
Great article, thanks for sharing!
Jarvklo,
Definitely, I couldn't agree more!
André,
Very interesting input! It is a very tricky question when it comes to non-English languages and URL encoding. Personally, I think I'd go for the clean, English-only, characters in the URL, because otherwise I don't think it's usable.
However, at the same time you kind of discard logic when the spelling is mutated into some non-existent language… Tough call.
Regarding spaces and other characters: to me, the best way is to replace them with hyphens, because the URL becomes more readable. Then again, expecting the end user to understand that might be too much. But either hyphens, or no spaces or special characters at all, just a-z and digits.
When it comes to singular vs. plural I think plural is just fine. I mean, having something like <code>/products/product-name</code> shows that the product in question is part of the products branch. Imagine a company with outdoor products and indoor products, it would help to distinguish where they belong in the greater scheme of things.
Dan,
Thanks for the link!
Harmen,
Thanks, good tip!
As I understand your logic behind this, it's important to note that not everybody uses Apache, so they might not have the luxury of being able to rewrite the URL. Sure, that'd be no problem with static pages (index.html in a directory) but for dynamic sites, this might be a problem.
Then we have further problems, such as what if there are several pages with the same title? And of course Unicode, but that's already been mentioned.
Furthermore, I wouldn't reference query URLs as "bad" URLs.
i had the same thought as you andy and gave it some more time. for urls pointing to items in a very large database, like amazon, i don't think there is a good way around it, but often web apps can be written to write .html files to a semantic uri. for example, a news site could write a political article with the headline "something political" as http://yourdomain.com/articles/politics/something….
granted this requires more coding to pull off a seamless site, but it may be worth it. i don't really know…
Very nice to see that more and more people are becoming aware of the bad URIs out there and want to do something about it.
Wrote something short, but similar last year on my site. You can check it out if you want: mod_rewrite made easy.
Keep up the good work 😉
Nice discussion.
I have read somewhere that some tests around the use of IRIs have been successfully completed by ICANN. Perhaps we might not have to wait too long for this to be possible and we can all start using IDNs.
I would research my post a bit more to give more accurate info with reference to URL semantics, but I have a day job and should really start!
Here are the wikipedia links to get anyone interested started:
http://en.wikipedia.org/wiki/Internationalized_Re… http://en.wikipedia.org/wiki/Internationalized_do…
Btw, notice how wikipedia follows good practices 🙂
Regards,
Stijn.
Andy,
When it comes to web servers, I've also included a link above how to do it for .NET with an IIS server. My belief is that it is seldom a shortcoming of the web server, but rather some of the other tools that cause the problems of not being able to create a semantic URL.
jnoody,
Good alternative.
Alexander,
Thanks for the link!
Stijn,
Thanks, good input!
Nice article, I have some questions, the .NET (IIS) workaround for this really seems like a hack, my question is what is in the log files a lot of 404s or does it handle it ok?
You can put semi Mod_Rewrite like statements in your web.config but it doesn't support regular expressions.
Is there a good free Mod_Rewrite equivalent available for IIS. I've heard Longhorn server will have this feature, but it's crazy to wait for a whole new OS just for a feature that Apache's had for years.
Dan,
I'm sorry, but I don't know more about how to accomplish it on IIS; not my core competence.
Google is not a standard maker. Why not use W3C Recommendations like RDFa??
While I agree in principle, I'd love to make it work. The truth is, it requires a lot of background to work. I still can't get rewrites and friendly links to work in my blog.
So, in view of constancy, what should I do?
Mateusz,
Interesting question. I guess one has to make a choice between Google, who is a de facto standard, and the W3C recommendation.
Will It Work,
I see that you use WordPress in your blog. Then you can go to Permalinks in the admin interface to choose what type of semantic link structure you want.
If http://www.example.com/?id=547 points to a page called "About Us" where 547 is the page_id, then how does the apache_mod_rewrite know that the url should be http://www.example.com/about-us/ ? It seems like you would need a database lookup to do the rewrite.
@Jason Martino
I have an extra column in (for example) my articles-table called url_str.
So instead of saying ?page=article&id=12 i go ?page=article&url_str=some-url-str and then rewrite /articles/some-url-str/ to ?page=article&url_str=some-url-str
Is there a better way to do it?
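To illustrate the url_str column: the rewrite rule only passes the last path segment on to the script, and the database lookup happens in the application. The sketch below uses an in-memory SQLite database; the table layout, the sample row and the function name are assumptions for the example. Declaring the column UNIQUE is one way to keep two articles from claiming the same URL.

```python
import sqlite3


def find_article(conn, url_str):
    """Look up an article by its URL string; returns (id, title) or None."""
    cursor = conn.execute(
        "SELECT id, title FROM articles WHERE url_str = ?", (url_str,)
    )
    return cursor.fetchone()


# Hypothetical schema and sample data, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, url_str TEXT UNIQUE)"
)
conn.execute(
    "INSERT INTO articles (id, title, url_str) VALUES (?, ?, ?)",
    (12, "Some URL str", "some-url-str"),
)
```

A request for /articles/some-url-str/ would then end up as `find_article(conn, "some-url-str")`, and an unknown string returns `None`, which the page can turn into a 404.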
Mateusz and Robert,
RDFa is not (yet) a W3C recommendation, but is completely orthogonal to this discussion in any case.
Pretty URIs are nice, so is the ability to embed data in web pages; choose one, or both.
I totally agree with what you wrote. A question though in looking at your own link for this article. <code>http://www.robertnyman.com/2007/03/16/the-importance-of-a-semantic-url/</code> is this a semantic URL? I do not want to seem like a wise bum but it would almost seem simpler to have it without the date…
btw just for the record I have the set up with the date with my blog ;P
But the date is semantic in a way. If you want to see which post is the most recent, then it is the best way. You get both a date stamp and the heading of the article. Not too bad, or?
I've skipped day on my blog but still use year and month.
Jason, Andreas,
Personally, I'm no expert in apache_mod_rewrite, but I hope the articles linked to in the blog or the comments can help you out, and show options.
Mike,
Thanks for clarifying that!
Jermayn,
No, no, that's good input. I think it makes sense for a blog (just as Kiper says) to immediately be able to see how old the post is. What's good about it, too, is that you can, for instance, take away the name of the post to get a listing of all posts that day, remove 16 to get all posts in March, etc.
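The way a date-based URL can be shortened level by level can be sketched as a single pattern in which every part after the year is optional. The pattern and the function name are assumptions for illustration and do not describe how any particular blog engine handles its permalinks.

```python
import re

# /YYYY, /YYYY/MM, /YYYY/MM/DD or /YYYY/MM/DD/post-name, with an optional trailing slash.
BLOG_URL = re.compile(r"^/(\d{4})(?:/(\d{2})(?:/(\d{2})(?:/([^/]+))?)?)?/?$")


def parse_blog_path(path):
    """Return (year, month, day, name); parts missing from the URL are None."""
    match = BLOG_URL.match(path)
    if match is None:
        return None
    year, month, day, name = match.groups()
    return (
        int(year),
        int(month) if month else None,
        int(day) if day else None,
        name,
    )
```

A result with no name means "list the posts for that day", one with no day means "list the month", and so on up to the year.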
Hi, great article and great links in the comments. But why is it called a semantic URL?
For me, semantics is not for URLs; for me, semantics is the relation that signs or elements have to an object.
Say web semantics, p for paragraphs or h for headers.
URLs? It's just like stating that this clause is semantic: "I'm a citizen in world, europe, sweden."
Am I all wrong here?
That's why I suggest we skip using the term "semantic url" and instead use: "usable url", "simple url", "friendly url" or maybe "proper url"
Or?…
Pär,
Thanks!
The term semantic has a far broader meaning than in just web development. It is about conveying meaning. Please read more in Wikipedia's Semantics.
Another thing to keep in mind if you have a lot of languages on your site and don't use a different domain for each language: don't rely on cookie-based (or parameter-based) pages for multi-language sites; each language needs to have a unique URL.
For example (bad):
http://www.example.com/products/?lang=en http://www.example.com/products/?lang=sv
Instead you can do like this:
http://www.example.com/en/products/ http://www.example.com/sv/products/
A little SEO tip 😉
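With the language in the path, the site has to peel that first segment off before routing the rest. A minimal sketch, where the list of supported languages, the default and the function name are all assumptions:

```python
SUPPORTED_LANGUAGES = ("en", "sv")  # hypothetical; taken from the two examples above


def split_language(path, default="en"):
    """Split '/sv/products/' into ('sv', '/products/'); fall back to the default language."""
    head, _, rest = path.lstrip("/").partition("/")
    if head in SUPPORTED_LANGUAGES:
        return head, "/" + rest
    # No language prefix: keep the path untouched and use the default language.
    return default, path
```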
Joakim,
Absolutely, good point!
Another great reason for using "plain English" URLs is when you speak a domain. I've worked with large companies that have large call centres. Try speaking a "bad" domain. It's even worse when people have trouble: "I'll put down the phone as I need two hands to do a question mark" *phone back to ear* some more words, *phone down* two hands for an equals sign, then an ampersand, "what's that?". Bahhh. You've just wasted 10 minutes on the phone to convey a URL. Up go the call queues, you need more staff, it costs more money. All on something basic that a free webserver can do out of the box.
I guess if you're using IIS then you can afford to waste people's time with a complex URL.
Mountain/Ash,
Absolutely! 🙂
[…] a little sidetrack (please forgive me for this), but with my post about semantic URLs the other day close in mind, compare the URLs for IE’s add-ons to the Firefox and Opera […]
Going back to Stijn remark on wikipedia following this rule, what I love about wikipedia is if you type in: http://www.wikipedia.org/urls it will ask you do you mean http://www.wikipedia.org/wiki/urls, how would one go about implementing this into their website?
William,
I guess the way to go is defining some major keywords, and also misspellings, to redirect them to a helpful page.
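Besides a hand-made list of keywords and misspellings, a 404 page could suggest the closest known paths automatically. The sketch below uses fuzzy string matching from Python's standard library; this is a swapped-in technique for illustration, not a description of what Wikipedia actually does, and the function name and the sample list of paths are made up.

```python
from difflib import get_close_matches


def suggest_paths(requested, known_paths, limit=3):
    """Return up to `limit` known paths that look similar to the requested one."""
    return get_close_matches(requested.lower(), known_paths, n=limit, cutoff=0.6)
```

On a site that knows `/wiki/urls`, `/contact` and `/about`, a request for `/urls` would get `/wiki/urls` back as a suggestion, which the error page could show as "did you mean…?".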
Nate,
My personal take is that a hyphen (-) is the best way to replace spaces. Without a separator, as you say, long names are bound to become illegible.
We recently created a wiki-like application for posting articles and used the title of the post as the URL. It works technically, but gets a bit strange when you have long titles. Another issue we've struggled with is spaces. Do we leave the spaces in the title, or should we replace them with _ or something? I can't give out the URL as it's not released yet, but I'd love to hear any thoughts you have.
An alternative to using mod_rewrite: Say you're coding in PHP, first use ForceType on a script with no extension, usually in a Directory context in your .conf file (I personally loathe .htaccess files for performance reasons) so mod_php picks it up. Next, parse everything to the right of the script, like so:
$args = explode('/',trim($_SERVER['PATH_INFO'],'/'));
Say products is your script and the URL looks like:
products/handtools/screwdrivers/phillips
The $args array will contain:
$args[0] = handtools
$args[1] = screwdrivers
$args[2] = phillips
Then you can build an SQL query (or whatever) to fetch the matching products.
Incidentally, the trim() in the parse line is there to remove any leading or trailing slashes, otherwise you'll end up with empty array items in your list.
[…] the GoogleBot crawling our site just fine, so we ignored it. After reading articles like “The Importance of a Semantic URL” we’ve decided to start the process of cleaning up our sites URLs. Instead of using […]
I don't think that people even look at URLs anymore. It's information that is only for the computer. So it is not so important to be semantic.
Apart from it being a bad idea to assume how every person in the world behaves, search engines do very much look at URL:s, so it is very important that they're semantic.
<blockquote cite="http://www.robertnyman.com/2007/03/16/the-importance-of-a-semantic-url/#comment-148565">I don’t think that people even look at URLs anymore. It’s information that is only for the computer. So it is not so important to be semantic.</blockquote>
What I find useful/frustrating is going back to a website that I visit from time to time – if the site has human-friendly URLs and I've been to the page before, I can pick it out of the auto-completing list – but if it's got random IDs, I have to go to the homepage and navigate through the site, which is annoying.
It just makes sense. We are implementing this into all Websites we create moving forward!