I have no association with this person and just heard about this. But obviously those are search term result pages and not "autogenerated pages" in the usual sense. If Google is going to apply such a standard uniformly, they'd have to ban all sites that have search functions.
Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities, all of the generated pages linking to each other, and all of them picking up incoming links from ten billion forum comments across the web where a bot has signed up en masse and posted spam.
Now, perhaps this guy has been posting spam links to his site in places, which would make it valid to declare it a link farm and kill it.
Looking at the site now, being able to search Amazon by exact price is a pretty neat feature and is totally different from a link farm.
I think killing his site for having a search function is pretty unreasonable.
However, you are a private for-profit company, so you can do as you please obviously.
> Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities
Actually, there is no difference between search-result sites like the OP's and what you described. Having a URL that ends in .html does not mean there is a static HTML file on the server. For example, nextag and pricegrabber have URLs like blah.com/digital/Canon-EOS-7D-SLR-Digital-Body/m739295014.html. Scroll around these sites and you'll see they're anything but static.
Each page is simply the result of a query. Whether the database is local (as with nextag) or remote (queried through the Amazon API, as with the OP) is inconsequential. Personally, I would be VERY happy if Google hid or ignored these kinds of search-query sites; I have not once found any of them useful. The problem is the large gray area in which these sites operate: what I find useless, someone else might find useful.
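To make that concrete, here's a minimal sketch (Flask, with a tiny made-up in-memory catalog standing in for the real database or the Amazon API; the route and item data are invented for illustration) of how a ".html" URL can be served by a query handler rather than a static file:

    from flask import Flask, abort

    app = Flask(__name__)

    # Hypothetical catalog; on a real price-comparison site this would be a
    # database query or a remote API call.
    CATALOG = {
        "m739295014": {"name": "Canon EOS 7D SLR Digital Body", "price": "$1,699.00"},
    }

    @app.route("/digital/<slug>/<item_id>.html")
    def product_page(slug, item_id):
        # The trailing .html is just part of the route pattern; no static file exists.
        product = CATALOG.get(item_id)
        if product is None:
            abort(404)
        return "<h1>{name}</h1><p>{price}</p>".format(**product)

    if __name__ == "__main__":
        app.run()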
It would be easy for the Googlebot to work this out too: replace the part of the URL that looks like a dynamic variable with different values and see whether it returns a page for all of them, the way a search endpoint would. I'm surprised it doesn't appear to do this.
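Something along these lines would do it; a rough sketch only, with made-up probe strings and a hypothetical URL template, not a claim about how Googlebot actually works:

    import requests

    def looks_like_search_endpoint(url_template, probes=("xkqzj", "banana", "7d3f1")):
        # Returns True if every probe substituted into the URL still yields a 200
        # page, which is the behaviour you'd expect from a search/query endpoint.
        for probe in probes:
            try:
                resp = requests.get(url_template.format(term=probe), timeout=10)
            except requests.RequestException:
                return False
            if resp.status_code != 200:
                return False
        return True

    # e.g. looks_like_search_endpoint("http://fakesite.com/buy_{term}_now.html")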
You might say that technically they could have the same back-end, but the web is flexible - just about any URL scheme could resolve to the same back-end. The difference is that the first is honest and the second one goes out of its way to hide what it does.
Yup. Search for [quality guidelines] and they're at http://www.google.com/support/webmasters/bin/answer.py?answe... . The relevant part is "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
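For reference, the rule that guideline is asking for is just a Disallow on your search path in robots.txt; /search here is only an example, so substitute whatever URL your site's search results actually live under:

    User-agent: *
    Disallow: /search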
Matt, I tried that recently with one of my own sites, and after a few days I got a warning in my Google Webmaster Tools dashboard that the bot could not access my /search URL because it was blocked in my robots.txt, and that I should take action to correct it. So I then unblocked my /search URL, which violates the above guideline, but it made the error in Google Webmaster Tools go away.
These conflicting messages from Google are very confusing!
In that case, Google Webmaster Tools is not actually reporting an error. That's a report showing you which URLs Google tried to crawl but couldn't (because they were blocked), so you can review it and make sure you aren't accidentally blocking URLs that you want indexed.
I agree that it's confusing in that the report is in the "crawl errors" section.
(I built Google webmaster tools so this confusion is entirely my fault; but I don't work at Google anymore so sadly I can't fix this.)
The Google Webmaster Guidelines literally say:
"Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
So a couple of questions:
- What is of value for a user?
- Who is determining the value for the users in these cases?
As usual, it's not quite clear what treatment you should use for search pages!
It's hard to draw the line between what is search and what isn't. For example, you might use a system where the last part of the URL is treated as a search query, but only certain queries are ever linked to, so in practice those pages act as product pages.
The same goes if you have some kind of recent-searches list and those entries link to result pages.