
Here's the abstract for a nice review of search engine law: http://works.bepress.com/james_grimmelmann/13/ If I remember correctly, indexing a site that asks not to be indexed might be illegal as a form of trespass, but it is not settled law. The argument is that you are stealing resources (computer time) from the site owner.
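
For what it's worth, the usual way a site "asks to not be indexed" is a robots.txt file. Here is a rough Python sketch of how a crawler could check it before fetching anything. The robots.txt contents, the `examplebot` user agent, and the example.com URLs are all made up for illustration, and nothing here says how any particular search engine actually does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "examplebot" is an invented crawler name.
    print(may_fetch(ROBOTS_TXT, "examplebot", "http://example.com/private/page"))
    print(may_fetch(ROBOTS_TXT, "examplebot", "http://example.com/public"))
```

Whether ignoring a rule like this has any legal weight is exactly the unsettled part.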


>The argument is that you are stealing resources (computer time) from the site owner.

That's why 3taps is getting the data from Google's cache without touching craigslist's servers.


How the heck are they scraping Google without being banned or rate limited?


There are a lot of companies that are scraping Google quite successfully. Many of these are for 'rank checking' services that provide ranking data for certain keywords over time; these are heavily used by SEO and marketing agencies.

The two that jump to mind are Authority Labs and SEOmoz.

I guess: a shed load of proxies. :)


Amazon or the other clouds out there. Just auto-provision your instances (lots of them), scrape, sleep, wake, scrape, sleep...


It's not impossible. You just tell Google not to cache.
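
To make that concrete: the documented way to ask Google not to keep a cached copy is the `noarchive` directive, either as `<meta name="robots" content="noarchive">` in the page or as an `X-Robots-Tag: noarchive` response header. Below is a minimal sketch of the header variant as a plain WSGI app in Python. The page body and the port are placeholders I made up, and whether any given crawler honors the directive is up to the crawler.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Tiny WSGI app that asks crawlers not to store a cached copy."""
    # Placeholder page body, invented for this example.
    body = b"<html><body>example listing</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # noarchive asks search engines not to show a cached copy of the page.
        ("X-Robots-Tag", "noarchive"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Note this only stops the cached copy; the page can still be indexed and show up in results.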


And makes it impossible to block them as a side effect.


IANAL, but AFAIK it is only a civil matter (i.e. not a crime), since it is usually pursued as the tort of trespass to chattels. For such a case to succeed, the claimant needs to show that the defendant's actions deprived them of the use of the goods being trespassed on, i.e. that the defendant put enough of a burden on the servers that the claimant or their customers could not use the service.



