Google has that; it's just stopped working. You'll end up with articles from 1997 that say they were updated 4h ago. And the amount of SEO garbage that gets through has polluted the remaining results.
What would be really cool is if Google could show what the results for a search looked like on a given day. Not the current algorithm, not any sites indexed since then, but what it looked like at the time. Going back to use 2010 Google would be a dream.
There would be at least two ways to do it, neither of which is likely realistic at Google scale. One would be to cache results, so that any search that has been run before could be retrieved. This is limited in that you can't make new searches on old data, and it possibly stores private information that shouldn't be accessible to others.
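To make the caching idea concrete, here is a rough sketch, assuming results are just lists of URLs and that one snapshot per query per day is enough. `ResultCache`, `record`, and `lookup` are names I made up for illustration, not anything Google exposes.

```python
from bisect import bisect_right
from datetime import date


class ResultCache:
    """Hypothetical cache of the results each query returned on each day."""

    def __init__(self):
        # query -> list of (day, results) snapshots, kept sorted by day
        self._snapshots = {}

    def record(self, query, day, results):
        snaps = self._snapshots.setdefault(query, [])
        snaps.append((day, list(results)))
        snaps.sort(key=lambda snap: snap[0])

    def lookup(self, query, day):
        """Return the latest snapshot taken on or before `day`, else None."""
        snaps = self._snapshots.get(query)
        if not snaps:
            return None  # never searched before: nothing to replay
        days = [d for d, _ in snaps]
        i = bisect_right(days, day)
        return snaps[i - 1][1] if i else None


cache = ResultCache()
cache.record("rust tutorial", date(2010, 5, 1), ["a.example", "b.example"])
cache.record("rust tutorial", date(2015, 5, 1), ["c.example"])
old_results = cache.lookup("rust tutorial", date(2012, 1, 1))
```

The `None` return is the limitation in code form: a query nobody ran at the time can't be answered from the cache.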
The other way would be to have the index and algorithm versioned, where you can target any instance of the algorithm against any version of the data.
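A toy sketch of the versioned approach, assuming the index is a set of dated snapshots and each ranking algorithm is a plain function. The snapshot contents, the two rankers, and the `search` signature are all invented for illustration; a real engine would look nothing like this in size or structure.

```python
from datetime import date

# Hypothetical index snapshots: snapshot date -> {url: page text}
INDEX_VERSIONS = {
    date(2010, 1, 1): {
        "a.example": "python tutorial for beginners",
        "b.example": "python python python buy now",
    },
    date(2020, 1, 1): {
        "a.example": "python tutorial for beginners",
        "c.example": "python tutorial updated",
    },
}


def _ordered(scores):
    # Highest score first, ties broken by URL; drop pages that match nothing
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, score in ranked if score > 0]


def rank_by_term_count(query, docs):
    """Ranker v1: raw count of query-term occurrences (easy to spam)."""
    terms = query.split()
    return _ordered({url: sum(text.split().count(t) for t in terms)
                     for url, text in docs.items()})


def rank_by_distinct_terms(query, docs):
    """Ranker v2: number of distinct query terms present on the page."""
    terms = set(query.split())
    return _ordered({url: len(terms & set(text.split()))
                     for url, text in docs.items()})


ALGORITHMS = {"v1": rank_by_term_count, "v2": rank_by_distinct_terms}


def search(query, index_date, algorithm):
    """Run any algorithm version against the index as of `index_date`."""
    eligible = [d for d in INDEX_VERSIONS if d <= index_date]
    if not eligible:
        return []  # no snapshot that old
    docs = INDEX_VERSIONS[max(eligible)]
    return ALGORITHMS[algorithm](query, docs)
```

With this, `search("python tutorial", date(2010, 6, 1), "v1")` replays the old ranker on the old data, and swapping in `"v2"` runs a newer ranker against the same 2010 snapshot, which is the new-search-on-old-data case a result cache can't cover.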
Your second method is closer to what I thought you were getting at, and it's pretty much the context of my original reply.
I am sure it’s technically possible going forward, but it would be interesting if such capabilities could be enabled for historical versions of the index and algorithm. Combined with anonymized historical zeitgeist data, some interesting digital archaeology could be attempted.
All the more reason to run your own crawler! What’s the state of the art for this area right now in self-hosted solutions? Can you version your index and algorithm like we’re discussing and do this kind of search-data time travel?
Are you referring to WARC-type tooling, or something else? I don’t want to put words in your mouth. I’m a complete learner on this topic. I think gwern has written a bit about this broadly? I’m curious to know more about this, if you have time to share more.