
I think the author completely missed the point of API vs. screen scraping. An API gives access to structured data. Even if the website changes, the data stays accessible the same way through the API, whereas the author would have to rewrite his code each time the site's front-end markup is updated.

A simple API returning a JSON response over HTTP basic auth is far more efficient than a screen-scraping program that has to parse the response with an HTML/XML parser.
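To make the comparison concrete, here is a minimal sketch using only the Python standard library. The payloads, the `price` field, and the `class="price"` markup are all made up for illustration; no real site or API is implied.

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads: neither comes from a real site or API.
API_BODY = '{"product": "widget", "price": 19.99}'
HTML_BODY = """
<html><body>
  <div class="product">
    <h1>widget</h1>
    <span class="price">19.99</span>
  </div>
</body></html>
"""


def price_from_api(body: str) -> float:
    # Structured data: one lookup by key, independent of page layout.
    return float(json.loads(body)["price"])


class PriceParser(HTMLParser):
    """Collects the text of the first <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price and self.price is None:
            self.price = float(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def price_from_html(body: str) -> float:
    # Scraping: the result depends on the tag name and class staying the same.
    parser = PriceParser()
    parser.feed(body)
    if parser.price is None:
        raise ValueError("price element not found; markup may have changed")
    return parser.price
```

If the site renames the class to, say, `amount`, `price_from_html` raises while `price_from_api` keeps working, which is the maintenance cost described above.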



This isn't always the case: APIs often change. Facebook, for example, is (or at least was, a few years ago) notorious for changing its API in unpredictable and buggy ways, and I stopped using it for that reason. Some HTML scrapers are more reliable than that.

As for efficiency, that's not much of an issue either. HTML is much cleaner these days than it was 10 years ago, and a simple CSS selector often does the job.


This is true, but APIs are often versioned.

Concerning efficiency, it's true that CSS and XPath processors, at least, both offer very good performance.

But downloading 70KB of HTML each time you need only a single piece of data, when the API request costs only a few KB (avg < 2KB), can be a real pain if you need to do this frequently. It can be handled with a scalable setup, but I find that a bit overkill.
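A rough back-of-the-envelope for the 70KB and 2KB figures above, as a small Python sketch. The polling rate of once per minute is my own assumption, picked only to give the numbers a scale.

```python
KB = 1024

HTML_PAGE_BYTES = 70 * KB     # page size quoted above
API_RESPONSE_BYTES = 2 * KB   # upper bound of the API average quoted above
REQUESTS_PER_DAY = 24 * 60    # assumption: one request per minute


def daily_transfer_mb(bytes_per_request: int, requests_per_day: int) -> float:
    """Total bytes downloaded per day, expressed in MB (1 MB = 1024 KB)."""
    return bytes_per_request * requests_per_day / (KB * KB)


scrape_mb = daily_transfer_mb(HTML_PAGE_BYTES, REQUESTS_PER_DAY)
api_mb = daily_transfer_mb(API_RESPONSE_BYTES, REQUESTS_PER_DAY)

print(f"scraping: {scrape_mb:.1f} MB/day")
print(f"API:      {api_mb:.1f} MB/day")
print(f"ratio:    {scrape_mb / api_mb:.0f}x")
```

Under that assumption, scraping moves about 98 MB a day against under 3 MB for the API, a 35x difference that holds whatever the polling rate is.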



