I have come to the conclusion this will best be left to professionals, and a significant amount of oral history has in fact already been collected and curated, by professionals.
I wanted it to be me. The more I thought about it, the less equipped I felt. In some ways, it's the technologist's curse to believe your meta sense informs "the best way" when in fact you're disrespecting another discipline's praxis by assuming your amateur spider sense is the way.
Hey, original author here. I feel your comment. I should note that the goal here, as I drew it narrowly, was a technical one: to curate some very specific datasets (the "names and numbers and routes" that describe Internet infrastructure), generate some interpretative metrics that describe the network in various places at various times, and then to get the heck out of the way and let professional social scientists use this evidence to actually write papers and do history.
tl;dr: There's a lot of meaning locked up in historical Internet measurement datasets that is totally inaccessible to the real experts who study society. Technologists need to be respectful, preserve it, unlock it, and make it accessible outside the technical community.
I think Archive.org's Wayback Machine and archive.today are going to be extremely valuable resources for "historical research" in the future. I hope they can continue their work via donor funding.
Are there any developments towards helping these efforts out by decentralizing / backing up parts of the archive and Wayback Machine data?
I know torrents are hugely important in helping to decentralize and maintain many important files in academia and such (e.g. NN weights and CERN data) but I think cephfs is also trying to allow decentralized data storage with redundancy.
It seems like there ought to be some solution that lets a huge data source be decentralized over an arbitrary number of nodes, where each node can hold or back up just some part of the data, with a dashboard view that shows the level of redundancy for each part of the data.
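For what it's worth, the bookkeeping behind that kind of redundancy dashboard is not complicated. Here is a rough Python sketch, assuming each node simply reports the list of chunk IDs it holds. The node names, chunk IDs, and function names are all made up for illustration and are not taken from any existing archive project.

```python
from collections import Counter

def redundancy_report(node_chunks, all_chunks):
    """Return {chunk_id: number of distinct nodes holding that chunk}.

    node_chunks: hypothetical mapping of node name -> list of chunk IDs it stores.
    all_chunks:  every chunk ID that makes up the full dataset.
    """
    counts = Counter()
    for chunks in node_chunks.values():
        # set() so a node that lists a chunk twice is only counted once
        counts.update(set(chunks))
    return {chunk: counts.get(chunk, 0) for chunk in all_chunks}

def under_replicated(report, minimum):
    """List chunks held by fewer than `minimum` nodes, sorted by ID."""
    return sorted(chunk for chunk, copies in report.items() if copies < minimum)

# Made-up example data: three volunteer nodes, four chunks in the dataset.
nodes = {
    "node-a": ["chunk-1", "chunk-2"],
    "node-b": ["chunk-2", "chunk-3"],
    "node-c": ["chunk-2"],
}
report = redundancy_report(nodes, ["chunk-1", "chunk-2", "chunk-3", "chunk-4"])
needs_copies = under_replicated(report, minimum=2)
```

A dashboard would just render `report`, and new volunteers could be pointed at whatever shows up in `needs_copies` first. The hard parts (verifying that nodes really hold what they claim, and handling nodes that vanish) are not covered here.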
I worry that archive.org or the Wayback Machine will not have sufficient funding and will need to close, and the sheer impact of that is greater than most realize. I hope there is a decentralized archive project.
I've been thinking a lot about this too. I was thinking people should start collecting written histories of their recollections of the internet, so future historians can understand the dynamics which shaped their future-present, like the "Great Digg Migration" and the "Tumblr Exodus", etc.
Even with written recollections and archives, it's going to be so difficult to follow. Things just change so quickly. The irony and memes that require you to understand 5 other memes are just going to be so difficult to capture in any meaningful way.
I also think people might not care that much. They'll have an even more sophisticated and oversaturated version of the internet and I'm thinking they'll only really care about a few big highlights from our time. Whatever is contained in the Wikipedia page for Reddit, Twitter, and Facebook will probably be enough for most future people.
Reminds me of this excerpt from Douglas Adams' "The Hitchhiker's Guide to the Galaxy":
"Ford!"
Ford looked up from where he was sitting in a corner humming to
himself. He always found the actual travelling-through-space part
of space travel rather trying.
"Yeah?" he said.
"If you're a researcher on this book thing and you were on Earth,
you must have been gathering material on it."
"Well, I was able to extend the original entry a bit, yes."
"Let me see what it says in this edition then, I've got to see
it."
"Yeah OK." He passed it over again.
Arthur grabbed hold of it and tried to stop his hands shaking. He
pressed the entry for the relevant page. The screen flashed and
swirled and resolved into a page of print. Arthur stared at it.
"It doesn't have an entry!" he burst out.
Ford looked over his shoulder.
"Yes it does," he said, "down there, see at the bottom of the
screen, just under Eccentrica Gallumbits, the triple-breasted
whore of Eroticon 6."
Arthur followed Ford's finger, and saw where it was pointing. For
a moment it still didn't register, then his mind nearly blew up.
"What? Harmless? Is that all it's got to say? Harmless! One
word!"
Ford shrugged.
"Well, there are a hundred billion stars in the Galaxy, and only
a limited amount of space in the book's microprocessors," he
said, "and no one knew much about the Earth of course."
"Well for God's sake I hope you managed to rectify that a bit."
"Oh yes, well I managed to transmit a new entry off to the
editor. He had to trim it a bit, but it's still an improvement."
"And what does it say now?" asked Arthur.
"Mostly harmless," admitted Ford with a slightly embarrassed
cough.
This is true for most people regarding any part of history. But for any part of history there are those who take a deep interest and want to piece together the minutiae of what happened. The limit case of that is an actual trained historian who specializes in that part of history.
Up to what resolution does this need to be recorded in history? Out of all those events, what would be of significance once the century gate opens in 2100? I suspect that the general vibe shifts and dynamics are important, but all the constituent dramas may be just that: small personal dramas.
Internet history is one part of the history of technology, and you have people already trying to preserve it. Look at all the vintage computer groups that talk about the history of computers and computing. Some even talk about the history of BBS systems. They are the groups already involved, and they would enjoy the support of others who are interested.