Dealing with Dynamic URLs - SEO
Note that this is a fairly advanced topic in Search Engine Optimization (SEO).
What are Dynamic URLs?
If you look carefully, a lot of sites are now dynamic: the site draws information from somewhere (usually a database) and outputs that data to your browser. You can think of these as webpages created "on the fly", unlike static pages, where the information is hand-coded directly into the HTML. The benefit is that you can create a lot of pages easily and consistently without tedious coding page after page. You just use a template and output different information into it when the page is called. Take this site as an example: the pages are displayed with the same layout (the template) but with different information (the data). Dynamic pages can save webmasters a lot of time and effort, and prevent mistakes as well.
So a site with a lot of pages and content will usually be dynamic, because it would be too tedious for the webmaster to manually edit and recreate the pages and the internal links. Dynamic sites are usually built with server-side technologies like PHP or ASP, with a database like MSSQL or MySQL powering them.
Now we know what a dynamic site is. But what about dynamic URLs? Dynamic sites usually have dynamic URLs. Here is an example: http://thisdomain.com/index.php?area=product&browse=1&listing=2& . From this URL you can more or less see that it points to a product area in some listing on a dynamic site. The special characters (e.g. "?", "=", "&" and so on) are sometimes called STOP characters, because some search engine crawlers used to stop reading a URL when they reached one. These are the characters you should be worried about.
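To make the anatomy of such a URL concrete, here is a minimal Python sketch that takes the example URL apart with the standard library. The helper name `describe_dynamic_url` and the choice to count only "?", "=" and "&" are assumptions made for illustration, not part of any SEO tool.

```python
from urllib.parse import urlsplit, parse_qsl

# Characters treated as STOP characters in this sketch (an assumption:
# only the three named in the article are counted).
STOP_CHARACTERS = "?=&"

def describe_dynamic_url(url):
    """Split a URL into its script path, its variables, and a STOP character count."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        # parse_qsl skips the empty piece left by a trailing "&"
        "params": parse_qsl(parts.query),
        "stop_characters": sum(url.count(c) for c in STOP_CHARACTERS),
    }

summary = describe_dynamic_url(
    "http://thisdomain.com/index.php?area=product&browse=1&listing=2&"
)
print(summary["path"])             # /index.php
print(summary["params"])           # [('area', 'product'), ('browse', '1'), ('listing', '2')]
print(summary["stop_characters"])  # 7
```

Everything after "/index.php" is variable data handed to one script, which is why a single script can produce so many different URLs.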
Why don't Search Engines like dynamic URLs?
Dynamic URLs can cause problems for a Search Engine's indexing, mainly because of duplicate content. This is why Search Engine spiders are not eager to index dynamic URLs:
- A dynamic page may generate more pages, sometimes trapping the search engine bots in an endless loop or, worse, creating tons and tons of pages to index that are practically useless to Search Engine users.
- Search Engines like unique content, not duplicate or near-duplicate content. Dynamic sites may create pages that are similar (or very similar) in content with just different URLs. It's like having one page with a lot of different URLs pointing to it. Think of the poor person who is searching for some information, only to find a lot of links that all point to the same page and hence the same content!
- Session ids ("sid=" or "sessionids=") can cause duplicate content problems as well. Session ids are usually placed in the URL to tell the webserver who the user is. Think of them as cookies in your URL; a lot of websites use them. You must have been to sites that require you to log in, and some of those use session ids. A different session id each time (and thus a different URL) may point to the same page, and this creates duplicate content for the Search Engine to deal with. Also, some URLs with session ids expire, and if the server is strict about it, that leaves a dead link. If a Search Engine has indexed this dead link, the next person who clicks on it in the search results lands on an error page, which everyone (you, me and the Search Engines) hates. Although Google does index URLs with session ids, they are still best avoided.
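The session id problem is easy to see in a few lines of Python. The sketch below is only an illustration: the parameter names in `SESSION_PARAMS` and the sample session values are assumptions, and real sites may use other names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; adjust to whatever your site really uses.
SESSION_PARAMS = {"sid", "sessionid", "sessionids"}

def strip_session_id(url):
    """Return the URL with any session id variables removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Two visits, two session ids, two different URLs...
urls = [
    "http://thisdomain.com/index.php?area=product&sid=abc123",
    "http://thisdomain.com/index.php?area=product&sid=zzz999",
]
# ...but only one real page once the session id is ignored.
unique = {strip_session_id(u) for u in urls}
print(unique)  # {'http://thisdomain.com/index.php?area=product'}
```

A crawler that does not know which variable is the session id sees two URLs with identical content, which is exactly the duplicate content problem described above.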
Although the major Search Engines do read and index dynamic URLs (even those with session ids in them), they are best avoided. Your job as a webmaster is to HELP Search Engines index your site, so don't make it difficult for them by having dynamic URLs. In the past, a lot of Search Engines disregarded everything in the URL after the first STOP character, which resulted in truncated URLs, so many different pages ended up with the same URL. For example, http://thisdomain.com/index.php?area=product&browse=1&listing=2& might be truncated to http://thisdomain.com/index.php. Think of the problem this created for the Search Engines and for you as a webmaster (the Search Engine would index only one page!). Of course, that's in the past, but it doesn't mean dynamic URLs no longer cause any problems for Search Engines and your rankings.
What can I do?
Basically, you need to change your dynamic URLs to static URLs, or at least reduce the number of STOP characters in them. A static URL looks like http://thisdomain.com/hardware/tools/index.html : it has no STOP characters in it, and it looks very attractive to Search Engine crawlers.
- If your site is small, do you really need a dynamic site? If you only have 10 or 20 pages, a static HTML site may take a little more effort, but it can help Search Engine bots index your site faster and more efficiently. Best of all, you have full control over your layout and scripts, and a static site is probably much easier to modify and loads faster as well.
- Use a URL rewrite script or feature. If your webserver runs Apache, it has a feature called mod_rewrite, which presents dynamic URLs as static-looking ones and can be configured in the .htaccess file. However, this requires a good deal of experience with Apache and mod_rewrite. Those on an MS IIS server also have a choice: there is a module for IIS that can rewrite dynamic URLs to static ones. This is the best and most effective way to get rid of dynamic URLs.
- Use robots.txt to block Search Engine bots from certain actions, like search and so on. These actions usually involve a dynamic URL and can create duplicate content, because they may point to a page that can also be reached through another URL. For example, you may have a link that shows your visitors a random page, but that page can also be reached through a different link. Two different URLs for the same page equals duplicate content, so just block Search Engine spiders from following the random link. However, be careful not to block the bots from indexing important parts of your site when you do this. This method will not remove the dynamic URL problem, but it will reduce the duplicate content that bots run into.
- Minimize the number of STOP characters and variables in your URLs if possible. Ask your programmer to use as few of these characters as possible when coding your site.
- Use cookies instead of session ids. Although some users may disable cookies, most won't. However, you have to make sure that your webpages still work for requests made without cookies, because search engines don't accept cookies. If your site requires the user to have a cookie, its pages may not get indexed, as search engine crawlers can't access them.
- Use sitemaps. Placing an HTML or XML sitemap on your site helps Search Engines index it more efficiently than they could by following the other (dynamic) internal links. Oh, and make sure the link to your sitemap is static, by the way! You can also submit "Sitemaps" to Yahoo! and Google.
- Cloaking. Ask your programmer to write code especially for Search Engines, so that when a search engine bot rather than a human visitor requests a page, it is directed to another page with a static URL. This way, you won't have to worry about session ids causing problems for you and the Search Engines. Note that Search Engines do not like cloaking. But if your purpose is to prevent session id problems and the cloaked page is exactly the same as what your visitors see (just without the session ids in the URL), I believe Search Engines would not mind.
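Since the robots.txt tip comes with a warning about blocking too much, it can help to test your rules before you upload them. Below is a minimal Python sketch using the standard library's urllib.robotparser. The rules and the script names /random.php and /search.php are made up for illustration; substitute the paths your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep bots away from the "random page" and
# search scripts, which only produce duplicate content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /random.php
Disallow: /search.php
"""

def build_parser(robots_txt):
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

urls_to_check = [
    "http://thisdomain.com/random.php?x=1",             # should be blocked
    "http://thisdomain.com/hardware/tools/index.html",  # must stay crawlable
]
for url in urls_to_check:
    status = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(status, url)
```

If an important page shows up as blocked, the rule is too broad and should be narrowed before it goes live.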
Another advantage of changing to static URLs is that you can add keywords to them, as in http://thisdomain.com/hardware/tools/hammers.html. This URL is full of keywords: "hardware", "tools" and "hammers". Some major Search Engines place relevance on keywords in URLs, which works to your advantage.
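Behind a keyword-rich URL like this, the site can still be dynamic: a rewrite rule simply translates the static-looking path back into the variables the script expects. The Python sketch below only imitates that idea and is not Apache or IIS configuration. The path layout and the variable names `area`, `category` and `item` are assumptions for illustration.

```python
import re
from urllib.parse import urlencode

# Assumed layout: /<area>/<category>/<item>.html
STATIC_PATTERN = re.compile(
    r"^/(?P<area>[a-z0-9-]+)/(?P<category>[a-z0-9-]+)/(?P<item>[a-z0-9-]+)\.html$"
)

def rewrite_to_dynamic(path):
    """Map a static-looking path to the internal dynamic URL, or None if it doesn't match."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    # The keywords in the path become the script's variables.
    return "/index.php?" + urlencode(match.groupdict())

print(rewrite_to_dynamic("/hardware/tools/hammers.html"))
# /index.php?area=hardware&category=tools&item=hammers
print(rewrite_to_dynamic("/about.html"))
# None
```

Visitors and search engines only ever see the keyword-rich static URL, while the server keeps using one dynamic script internally.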
If you would like to learn more on Search Engine Optimization, please visit Darksat's IT Security Forum.