


An equivalent which provides sources older than the past hour, day, week, month, year, decade, century, or millennium would be an opportunity here.


Google has that, but it has stopped working. You'll end up with articles from 1997 that claim they were updated 4h ago. And the amount of SEO garbage that gets through has polluted the remaining results.

What would be really cool is if Google could show what the results for a search looked like on a given day. Not the current algorithm, not any sites indexed since then, but what it looked like at the time. Going back to use 2010 Google would be a dream.


Part of me wonders whether this is technically possible. I’m sure it’s an answerable question.


There would be at least two ways to do it, neither of which is likely realistic at Google's scale. One would be to cache results, so that any search made before could be retrieved. This is limited in that you can't make new searches on old data, and it possibly stores private information that shouldn't be accessible to others.

The other way would be to have the index and algorithm versioned, where you can target any instance of the algorithm against any version of the data.
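
A minimal sketch of that second approach, assuming a toy in-memory index: each dated snapshot of the crawl is stored separately, each ranking function is registered under a version name, and a search picks one of each. All names here (`TimeTravelSearch`, `rank_raw_count`, `rank_capped`, the `.example` URLs) are hypothetical and only illustrate the idea; nothing here reflects how Google actually stores or ranks anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Doc:
    url: str
    text: str


# A ranker maps (query, documents) to an ordered list of URLs.
Ranker = Callable[[str, List[Doc]], List[str]]


def _rank(query: str, docs: List[Doc], term_score: Callable[[int], int]) -> List[str]:
    """Score each doc by summing term_score(occurrences) over the query terms."""
    terms = query.lower().split()
    scored = []
    for doc in docs:
        words = doc.text.lower().split()
        score = sum(term_score(words.count(t)) for t in terms)
        if score > 0:
            scored.append((-score, doc.url))
    # Highest score first, ties broken alphabetically by URL.
    return [url for _, url in sorted(scored)]


def rank_raw_count(query: str, docs: List[Doc]) -> List[str]:
    """Hypothetical 'old' algorithm: more keyword repetitions rank higher."""
    return _rank(query, docs, lambda n: n)


def rank_capped(query: str, docs: List[Doc]) -> List[str]:
    """Hypothetical 'new' algorithm: repetitions beyond the first do not help."""
    return _rank(query, docs, lambda n: min(n, 1))


class TimeTravelSearch:
    def __init__(self) -> None:
        self.snapshots: Dict[date, List[Doc]] = {}
        self.rankers: Dict[str, Ranker] = {}

    def add_snapshot(self, day: date, docs: List[Doc]) -> None:
        self.snapshots[day] = list(docs)

    def add_ranker(self, version: str, ranker: Ranker) -> None:
        self.rankers[version] = ranker

    def search(self, query: str, as_of: date, algo_version: str) -> List[str]:
        """Run the chosen ranker against the latest snapshot on or before as_of."""
        eligible = [day for day in self.snapshots if day <= as_of]
        if not eligible:
            return []
        return self.rankers[algo_version](query, self.snapshots[max(eligible)])


engine = TimeTravelSearch()
engine.add_ranker("raw", rank_raw_count)
engine.add_ranker("capped", rank_capped)
engine.add_snapshot(date(2010, 1, 1), [
    Doc("a.example", "python search tips"),
    Doc("b.example", "cooking recipes"),
])
engine.add_snapshot(date(2020, 1, 1), [
    Doc("a.example", "python search tips"),
    Doc("spam.example", "python python python python buy now"),
])
```

Unlike the cache, this allows new queries against old data, but every snapshot has to be kept in full, which is where the storage cost comes from.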


Your second method is closer to what I had in mind, and is pretty much the context of my original reply.

I am sure it’s technically possible going forward, but it would be interesting if such capabilities could be enabled for historical versions of the index and algorithm. Combined with anonymized historical zeitgeist data, some interesting digital archaeology could be attempted.

All the more reason to run your own crawler! What’s the state of the art for this area right now in self-hosted solutions? Can you version your index and algorithm like we’re discussing and do this kind of search-data time travel?


If you have an indicator of content on a specific date and can confirm no significant change ...

Though document fingerprinting is hard, especially with fungible page elements.
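
One common way to tolerate small swapped-out elements (rotating ads, "updated 4h ago" stamps) is to compare overlapping word windows instead of hashing the whole page. The sketch below uses word shingles and Jaccard similarity; that is a swapped-in textbook technique, not necessarily what any archive or search engine uses, and the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of overlapping k-word windows in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_content(old: str, new: str, threshold: float = 0.8) -> bool:
    """Assumed rule: pages match when shingle overlap reaches the threshold."""
    return jaccard(shingles(old), shingles(new)) >= threshold
```

This only works on extracted text. Stripping navigation, ads, and other page chrome before comparing is a separate problem, and is most of what makes fingerprinting hard in practice.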

Internet Archive has an angle here.


> Internet Archive has an angle here.

Are you referring to WARC-type tooling or what? I don’t want to put words in your mouth. I’m a complete learner on this topic. I think gwern has written a bit about this broadly? I’m curious to know more about this, if you have time to share more.

https://www.gwern.net/Search

https://www.gwern.net/Archiving-URLs

https://en.wikipedia.org/wiki/Web_ARChive



