
I have no association with this person and just heard about this. But obviously those are search-term result pages, not "autogenerated pages" in the usual sense. If Google is going to apply such a standard uniformly, they'd have to ban all sites that have search functions.

Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities, all the generated pages linked to each other, and incoming links from ten billion forum comments across the web where bots have signed up en masse and posted spam.
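
To make that concrete, here is a minimal sketch of how cheaply such a farm can be mass-produced (the site name, term list, and page template are all made up for illustration):

    import random

    # Hypothetical: pretend there are 500,000 spam terms.
    TERMS = [f"drug{i}" for i in range(500_000)]

    def page_html(term):
        # Link each page to a few random siblings so crawlers see a dense graph.
        links = "\n".join(
            f'<a href="http://fakesite.com/buy_{t}_now.html">buy {t} now</a>'
            for t in random.sample(TERMS, 5)
        )
        return f"<h1>Buy {term} now!</h1>\n{links}"

    # Writing one file per term yields 500,000 interlinked "pages":
    # for term in TERMS:
    #     open(f"buy_{term}_now.html", "w").write(page_html(term))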

Now, perhaps this guy is spamming links to his site around the web, which would make it valid to declare it a link farm and kill it.

Looking at the site now, being able to search Amazon by exact price is a pretty neat function and is totally different from a link farm.

I think killing his site for having a search function is pretty unreasonable.

However, you're a private, for-profit company, so obviously you can do as you please.



> Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities

Actually, there is no difference between search-result sites like the OP's and what you described. Having a URL that ends in .html does not mean there is a static HTML file on the server. For example, nextag and pricegrabber have URLs like blah.com/digital/Canon-EOS-7D-SLR-Digital-Body/m739295014.html. Scroll around those sites and you'll see they're anything but static.

Each page is simply the result of a query. Whether the database is local (like nextag's) or remote (like the OP's Amazon API queries) is inconsequential. Personally, I would be VERY happy if Google hid/ignored these kinds of search-query sites. I have not once found any of them useful. The problem is the large gray area in which these sites operate: what I find useless, someone else might find useful.
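
As a sketch of that point, a pricegrabber-style URL can be served with no file on disk at all. Here's a hypothetical Flask route (the catalog, price, and template are invented, not any real site's code):

    from flask import Flask, abort

    app = Flask(__name__)

    # Hypothetical in-memory "catalog"; a real site would query a local
    # database (nextag-style) or a remote API (Amazon-style).
    PRODUCTS = {739295014: ("Canon EOS 7D SLR Digital Body", "$1,499.00")}

    # A URL like /digital/Canon-EOS-7D-SLR-Digital-Body/m739295014.html is
    # just a route pattern; no .html file exists on the server.
    @app.route("/<category>/<slug>/m<int:product_id>.html")
    def product_page(category, slug, product_id):
        if product_id not in PRODUCTS:
            abort(404)
        name, price = PRODUCTS[product_id]
        return f"<h1>{name}</h1><p>{price}</p>"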


It would be easy for the Googlebot to work this out too: replace the part of the URL that looks like a dynamic variable with different values and see if it returns a page for all of them, like a search engine would. I'm surprised it doesn't appear to do this.
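
Something like this heuristic, sketched hypothetically in Python (the function and threshold are my invention, not anything Googlebot is known to do):

    import random
    import string
    import urllib.request

    # Substitute random tokens into the variable-looking part of a URL and
    # see whether the server returns a page for every made-up value.
    def looks_like_search_endpoint(url_template, probes=5):
        hits = 0
        for _ in range(probes):
            token = "".join(random.choices(string.ascii_lowercase, k=12))
            url = url_template.format(term=token)
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    if resp.status == 200:
                        hits += 1
            except Exception:
                pass  # 404s/errors suggest the URL space is NOT auto-generated
        # A page for every nonsense value means these are query results,
        # not real static content.
        return hits == probes

    # looks_like_search_endpoint("http://fakesite.com/buy_{term}_now.html")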


> Actually, there is no difference between search-result sites like the OP's and what you described

Uh, yes there is. http://www.somesite.com/index.html?q=foo is obviously a search (right down to the choice of q for query), but http://www.somesite.com/buy_foo_now.html does not look like one.

You might say that technically they could have the same back-end, and they could: the web is flexible, and just about any URL scheme can resolve to the same back-end. The difference is that the first is honest and the second goes out of its way to hide what it does.
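
A hypothetical sketch of that flexibility (invented routes; any framework could do the same): both the honest URL and the disguised one resolve to one handler.

    from flask import Flask, request

    app = Flask(__name__)

    # Hypothetical shared back-end; both URL styles below resolve to it.
    def run_search(term):
        return f"Results for {term!r}"

    # The "honest" form: /index.html?q=foo
    @app.route("/index.html")
    def honest_search():
        return run_search(request.args.get("q", ""))

    # The disguised form: /buy_foo_now.html runs the very same query.
    @app.route("/buy_<term>_now.html")
    def disguised_search(term):
        return run_search(term)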


> Uh, yes there is. http://www.somesite.com/index.html?q=foo is obviously a search

Obviously? Drupal's default non-rewritten URLs are index.php?q=foo, where foo is the page path. `q` can and does stand for more than one thing.


Search results are also not supposed to be indexed. I believe this is actually in the guidelines.


Yup. Search for [quality guidelines] and they're at http://www.google.com/support/webmasters/bin/answer.py?answe... . The relevant part is "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
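
For example, a site whose search results live under a hypothetical /search path would satisfy that guideline with a robots.txt like:

    User-agent: *
    Disallow: /search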


Matt, I tried that recently with one of my own sites, and after a few days I got a warning in my Google Webmaster Tools dashboard that the bot could not access my /search URL because it was blocked in my robots.txt, and that I should take action to correct it. So I then unblocked my /search URL, which violates the above guideline, but it made the error in Google Webmaster Tools go away.

These conflicting messages from Google are very confusing!


Hi j_col-

In that case, Google Webmaster Tools is not actually reporting an error. That report shows you which URLs Google tried to crawl but couldn't (due to being blocked), so you can review it and ensure that you are not accidentally blocking URLs that you want indexed.

I agree that it's confusing in that the report is in the "crawl errors" section.

(I built Google Webmaster Tools, so this confusion is entirely my fault; but I don't work at Google anymore, so sadly I can't fix it.)


Thanks for the response, I will block my /search URL once more via robots.txt and will ignore the warnings in the Webmaster Tools.


The Google Webmaster Guidelines literally say: "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."

So a couple of questions:

- What is of value for a user?
- Who determines that value in these cases?

As always, it's not quite clear what treatment you should use for search pages!


I'm pretty sure that having different search engines index each other would be bad. Don't cross the streams.


If I recall correctly, crossing the streams did kill the marshmallow man!

In any case, I agree with you: a search engine indexing search results would be bad. But the line is not always that clear!

Some vertical search engines' result pages are great, relevant results from the perspective of a user trying to answer a question.


It's hard to draw the line between what is search and what isn't. What if you use a system where the last part of the URL is treated as a search query, but only certain URLs are ever linked to, so in practice they act as product pages?

And what if you have some kind of recent-searches list, and those searches get linked?



