
Most paid Usenet services these days advertise their retention time of > 1000 days.

Some of the big providers in the business right now:

https://www.easynews.com/usenet-plans.html 2600+ days

http://www.supernews.com/ 2300+ days

http://www.news.astraweb.com/ 3000+ days

Easily enough for some backups. Of course, there's no guarantee of integrity, but it might work for some people.



>Of course, there's no guarantee of integrity,

Exactly. It's been a decade since I've used Usenet but I remember there were always tons of messages complaining, "part43.rar and part62.rar is missing please reupload!!!"

As another anecdote, I personally posted a huge list of AWK one-liners 20 years ago to Usenet (comp.lang.awk) and I can't even retrieve that message. That's just pure ASCII text (probably less than 2k) and even the Google Groups (dejanews acquisition) archives don't have it.


> As another anecdote, I personally posted a huge list of AWK one-liners 20 years ago to Usenet (comp.lang.awk) and I can't even retrieve that message. That's just pure ASCII text (probably less than 2k) and even the Google Groups (dejanews acquisition) archives don't have it.

This is actually another in a long list of known search bugs on Google Groups: https://productforums.google.com/forum/#!topic/apps/kej8-gpV...

Broken search has been a constant problem since Google acquired Deja: https://motherboard.vice.com/en_us/article/google-a-search-c...

Google broke the Deja archive after acquiring it in 2001, and it has never worked properly since. The UI and the search suck. The archives are still there, though. I wish Google would release them to the public and set up a non-profit to manage them.

Compare this to Gmane, essentially a one-man project, where the community stepped up to bring it back on the web after only a couple of months: https://lars.ingebrigtsen.no/2016/09/06/gmane-alive/


If you posted something in the last 6-8 years or so, it'll be readily available. Around that point, most of the big commercial Usenet providers just kept growing their storage without expiring anything. Drives got that cheap.

With something like backups, though, you're expected to update them regularly, so retention won't be a huge issue anyway: you'd probably prune old ones even if you were paying to store them.


Maybe you have not read the article closely. You normally have a bunch of .par2 files (my knowledge is also 10 years old) which can repair your missing files/parts as long as the amount of data lost is smaller than the total recovery data in the intact .par2 files.
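To make the idea concrete: par2 actually uses Reed-Solomon coding over GF(2^16) and can repair many missing blocks, but the simplest possible erasure code, a single XOR parity block, already shows the principle of rebuilding a lost part from the survivors. This is an illustrative sketch only, not par2's real algorithm:

```python
# Illustration only: par2 uses Reed-Solomon coding, which handles
# multiple missing blocks. This XOR sketch shows the simplest case:
# one parity block recovering one lost data block.
from functools import reduce

def make_parity(blocks):
    """XOR equal-sized data blocks together into a single parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(blocks, parity, missing_index):
    """Rebuild the one missing block from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return make_parity(survivors + [parity])

data = [b"part01.rar", b"part02.rar", b"part03.rar"]  # equal-sized "articles"
parity = make_parity(data)

lost = 1                      # pretend part02 never replicated
rebuilt = recover(data, parity, lost)
assert rebuilt == data[lost]  # the lost block is fully restored
```

XOR parity can only survive one lost block per parity block; Reed-Solomon is what lets par2 trade N recovery blocks for any N missing data blocks.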


My comment was more about the reliability of replication across geographies (author writes, "and it will be stored redundantly in news servers all over the world.")

The par2 files would be more reliable if you're using your own subscribed news server in a closed-loop fashion for backups. However, par2 technology doesn't mean NNTP guarantees those parity files actually get replicated to other servers.[1] If you're looking for safety beyond your own news server, you have to account for an increased probability of failure. I think that's ok for sharing rips of Blu-ray movies. For the irreplaceable backup photos of your children, though, hoping that par2 files survive on other Usenet servers you don't subscribe to may be too risky.

[1] http://ask.metafilter.com/237447/How-do-files-on-Usenet-deca...


> You normally have a bunch of .par2 files

vol003+04.par2 and vol031+32.par2 is missing please reupload!!!

Before any flippancy haters downvote, this is exactly what happened on Usenet. Shitty servers are shitty, whatever the file type.


par2 data could be uploaded but also kept on a local backup or with a traditional cloud backup provider.


Parchives. Great for adding recovery data to any file. :) I use them [now par2cmdline-tbb] religiously for archives. I started doing this years ago after CD-Rs burned on one drive (a Plextor, no less) became hard to read on any other drive.


Yep. I use par2 for my local backups.


> Exactly. It's been a decade since I've used Usenet but I remember there were always tons of messages complaining, "part43.rar and part62.rar is missing please reupload!!!"

Just use an error-correcting scheme, such as used by RAID.


That's what the Par2 in the fine article is.


Someone just posted a link to "Handy one-line scripts for awk":

https://news.ycombinator.com/item?id=13619124

Is that it?


Thanks for the heads up, but that's not the one. I posted it in 1998 or 1999, and I tried to find the exact Usenet archive link, similar to the direct link for Larry Page's famous 1996 post on comp.lang.java[1].

To go back to the article, the author mentions posting the files to the newsgroup "alt.binaries.backup". With Usenet, there isn't exactly a contractual SLA (Service Level Agreement) for that group. It's a gentlemen's agreement between those commercial Usenet providers (and non-commercial ones like universities) to replicate messages. Maybe because I posted the message to my GTE/Verizon ISP's Usenet server, it only got replicated to a few peers and then "died".

If my tiny text-based post, which is 2 years newer than Larry Page's, can't be recovered today, it doesn't give me a lot of confidence to use Usenet as a backup solution. I have over 1 terabyte of photos, home videos, and tif scans of all my paper files. It's not appealing to chop that 1TB into a thousand PAR2 files with extra 20% redundant parity and posting it to alt.binaries.backup. That seems extremely fragile. Another commenter suggested Amazon's new personal "unlimited" cloud for $60/year. That seems much more reliable.

[1] https://groups.google.com/forum/#!msg/comp.lang.java/aSPAJO0...


> It's not appealing to chop that 1TB into a thousand PAR2 files with extra 20% redundant parity and posting it to alt.binaries.backup.

For a 1 TB archive with 20% redundancy, you're looking at a block size of at least 32 MB in each par2 file (due to the maximum block count of 32767 [1] in the official implementation). Given that the article size limit for many news servers is roughly 1 MB, even a single block gets split into 32 article posts. par2 programs will typically generate a series of files where the smallest files contain a single block and the largest files contain 50 or more blocks. The 50-block files will each get split into 1600 articles.
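The arithmetic above checks out, assuming binary units (1 TB = 2^40 bytes) and a ~1 MB article cap:

```python
# Back-of-the-envelope check of the sizes above. Assumptions: 1 TB
# archive, par2's 32767 maximum input block count, ~1 MB article limit.
TB = 1024**4
MB = 1024**2

archive = 1 * TB
max_blocks = 32767        # par2 spec limit on input block count
article_limit = 1 * MB    # typical news-server article size cap

min_block = archive / max_blocks
print(min_block / MB)               # ~32 MB minimum block size

articles_per_block = min_block / article_limit
print(round(articles_per_block))    # each block spans ~32 articles

# A large recovery file holding 50 blocks:
print(round(50 * articles_per_block))  # ~1600 articles per .par2 file
```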

For par2 recovery to work even when articles are missing, you really want the recovery block size to be less than the article size limit, so that even if one or more articles are missing, the par2 archiver program can still read a subset of blocks from the incomplete recovery file and still use them for recovery. That means that the maximum archive size would be roughly 32 GB to keep the block size under the article size limit.

Going beyond that size makes it less likely that a recovery file will be usable when some of its articles are missing. At 32 GB, if one article is missing from a 3-block recovery file, the software will still be able to find 2 blocks in that file. But if the archive size were 100 GB, the block size would be a minimum of 3 MB, and missing just 3 of the 9 articles that make up a 3-block recovery file could make the recovery file unusable.

[1] https://en.wikipedia.org/wiki/Parchive#Parity_Volume_Set_Spe...
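The 100 GB case works out the same way (same assumptions: binary units, 32767-block limit, ~1 MB articles):

```python
# Same arithmetic for a 100 GB archive. Assumptions as before:
# par2's 32767-block limit and a ~1 MB article size cap.
GB = 1024**3
MB = 1024**2

archive = 100 * GB
max_blocks = 32767
article_limit = 1 * MB

block = archive / max_blocks
print(block / MB)                # ~3.1 MB per block, over the article cap

# A 3-block recovery file now spans ~9 articles. In the worst case,
# losing 3 of those 9 (one per block) leaves no complete block readable.
recovery_file_articles = 3 * block / article_limit
print(recovery_file_articles)    # ~9.4 articles
```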



