Prevent Spider Sessions = TRUE
Continuing our discussion of SEF URLs, another common misconception is that URLs with SIDs (session IDs) are not SEF. I often hear people ask how to get rid of the SID in their URLs so that search engine spiders will index them or rank them higher.
The osCommerce application uses cookies to keep track of what customers have placed in their shopping cart. But some customers have their browsers configured for higher security and, by default, do not allow websites to place cookies on their PC. It is for these customers that the SID in the URL is created. For them, the SID is passed from URL to URL, rather than being stored in a cookie, to keep their ‘session’ intact so that osC can remember what the customer has in their shopping cart.
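To make the fallback concrete, here is a minimal sketch in Python (osC itself is written in PHP, so this is not its actual code). The function name `href_link`, the parameter name `osCsid`, and the example URL are assumptions for illustration. The idea is simply that a link carries the SID only when the visitor has not sent back the session cookie.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

SESSION_PARAM = "osCsid"  # assumed parameter name, for illustration only


def href_link(url, session_id, has_cookie):
    """Return url unchanged when the browser holds the session cookie;
    otherwise carry the session ID in the query string."""
    if has_cookie or not session_id:
        return url
    parts = urlparse(url)
    # Keep any existing query parameters and append the SID last.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((SESSION_PARAM, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))
```

A visitor with cookies enabled gets clean links; a visitor without them gets the same links with the SID appended, which is what keeps the cart intact from page to page.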
Passing the SID in the URL, however, is not a method without flaws. As you may know, spiders do not accept cookies. Therefore, when they visit your site, they are assigned SIDs through the URL. One of the biggest issues with SIDs in the URL is that they cause search engine spiders to go into an ‘Infinite Loop’.
Think of a search engine spider program as an iteration through an array. First, the spider crawls through the webpage, looking for any URLs it can find. It adds all of the URLs it finds to an array. Then it iterates through the array, visiting each URL one at a time. After visiting all of the URLs, the spider will usually leave for an indeterminate amount of time and then return, going back through the site again, looking for any new URLs it might have missed the first time around and adding them to the array. This is where SIDs in the URL trip the search engines up. On the second visit, the spider is assigned a new SID, so every link on the site now looks like a new URL and is added to the array. Since this happens again each time the spider visits your website, the spider never gets to finish iterating through the array. I’ve seen firsthand spiders like ‘Googlebot’ consume countless gigabytes of bandwidth while stuck in this endless loop.
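The growth of that array can be shown with a small simulation. This is a simplified model written in Python, not how any real crawler is implemented; the page names, the `osCsid` parameter, and the function names are assumptions for illustration.

```python
def links_for_visit(pages, visit, with_sid):
    """Return the links the store emits on one crawler visit.

    With SIDs enabled, each visit is given a fresh session ID, so every
    link differs from the ones emitted on earlier visits.
    """
    if not with_sid:
        return list(pages)
    sid = f"sid{visit}"  # stand-in for a freshly generated session ID
    return [f"{page}?osCsid={sid}" for page in pages]


def urls_queued(pages, visits, with_sid):
    """Count the distinct URLs the crawler has collected after `visits` passes."""
    seen = set()
    for visit in range(1, visits + 1):
        seen.update(links_for_visit(pages, visit, with_sid))
    return len(seen)
```

For a three-page store crawled five times, `urls_queued` gives 3 without SIDs and 15 with them: the list grows by the full size of the site on every visit and never settles.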
Another issue with spiders being assigned SIDs through URLs is that indexed sessions can be ‘hijacked’. For example, a spider crawls your website, gets assigned SIDs in the URL, and these URLs with SIDs in them actually make it into the search engine’s index. A customer finds your listing in the search engine and clicks on the link with the SID in it. The customer likes your store and decides to make a purchase. Then another customer finds your listing in the search engine and clicks on the same link. Because the link carries the same SID that the first customer used, osC gets confused, treats the second customer as the first, and can sometimes even display sensitive information belonging to the first customer.
So, how is the problem solved? One way would be to enable ‘Force Cookie Usage’ in the osC admin and not allow customers to check out if they do not have cookies enabled. However, not being a strong advocate of turning my back on any potential customers, in July 2002 I suggested we use a script that looks at the visitor’s user agent to determine whether the SID is added to the URL. ( http://forums.oscommerce.com/index.php?showtopic=31928&hl=security ) This suggestion was adopted nearly unchanged into the core code of osC with the release of osC 2.2 MS2, and is toggled in the admin section with the ‘Prevent Spider Sessions’ configuration option.
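The user agent check can be sketched roughly as follows. This is an illustration in Python rather than the PHP that ships with osC, and the signature list, function names, and `prevent_spider_sessions` parameter are assumptions made up for the example.

```python
# Illustrative signatures only; a real store would maintain a longer list.
SPIDER_SIGNATURES = ("googlebot", "slurp", "msnbot", "crawler", "spider")


def is_spider(user_agent):
    """Return True if the user agent string contains a known spider signature."""
    ua = (user_agent or "").lower()
    return any(signature in ua for signature in SPIDER_SIGNATURES)


def should_start_session(user_agent, prevent_spider_sessions=True):
    """Decide whether this visitor gets a session (and therefore a SID in URLs).

    Spiders are refused a session only while the setting is switched on;
    every other visitor always gets one.
    """
    if prevent_spider_sessions and is_spider(user_agent):
        return False
    return True
```

With the setting on, a request identifying itself as Googlebot gets no session and so sees clean URLs, while an ordinary browser with cookies disabled still gets its SID and can check out.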
To this day, a better method of preventing spiders from having SIDs assigned through URLs has not been realized, and I recommend that any new store have ‘Prevent Spider Sessions’ set to true, and ‘Force Cookie Usage’ set to false for maximum URL Search Engine Friendliness.