> Fine. Now, how the hell do I backup my 5TB of photos? :-( :-(
I use git-annex. It understands the concept of wanting multiple copies of things, and keeps track of what is where (e.g. S3, Glacier, some remote rsync server, or which of my many external drives). When I want something, it gets it for me (e.g. by telling me which external drive to plug in).
Then, all I need to keep backed up is my git repository itself, which is tiny. I use Tarsnap for this, which means that I can keep previous snapshots without issue.
> Hard disks are not supposed to sit on a shelf unplugged for extended periods.
This works fine for me, when combined with some other method. Redundancy is key. And "git-annex fsck" checks a drive's integrity for me.
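To make the integrity-check idea concrete, here is a minimal sketch of checksum verification for an archive drive. It is not git-annex's implementation, only the general technique of recording a digest per file and re-hashing later; the function names and the in-memory manifest are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root (run once when archiving)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

Build the manifest when the drive is written, keep it somewhere that is itself backed up, and run `verify` each time the drive comes off the shelf. Any path it returns should be restored from one of the other copies.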
"git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup."
Since you said you're using it effectively as a backup, could you please clarify what they mean, and what you mean?
For context, I was originally answering "Fine. Now, how the hell do I backup my 5TB of photos?". My answer is that you don't need to, since photos don't usually change after they are taken. It is sufficient to simply archive them safely (and redundantly).
I'm not using git-annex as a backup system. I use Tarsnap and my own tool ddar for backups.
I am using git-annex to archive specific large file collections that don't ever change (e.g. photos and videos). By storing these collections appropriately in multiple redundant locations, and by also backing up my git-annex repository (using the backup tools above), I have effectively "backed up" my photos. They're as safe as any backup system can make them.
I've come to the conclusion that Amazon Glacier is probably the first good answer since tape drives. 5TB is, I think, $50 a month. If you use this data professionally, this is probably worth it. Otherwise maybe still tapes? What's the max. capacity on those these days?
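The $50 figure is easy to sanity-check. The sketch below assumes a storage price of $0.01 per GB per month and decimal terabytes; neither number is stated in the comment, so treat both as assumptions and substitute the current price list. Retrieval and request fees are ignored.

```python
# Assumed price in USD per GB per month; not stated in the thread.
ASSUMED_PRICE_PER_GB_MONTH = 0.01


def monthly_storage_cost(
    terabytes: float,
    price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH,
    gb_per_tb: int = 1000,
) -> float:
    """Storage-only cost per month; retrieval and request fees are ignored."""
    return terabytes * gb_per_tb * price_per_gb_month


if __name__ == "__main__":
    print(f"5 TB: about ${monthly_storage_cost(5):.2f} per month")
```

At that assumed rate, 5TB works out to roughly $50 a month, or about $600 a year, which is the number to weigh against buying drives or tapes outright.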
It's a more expensive investment to get started (as opposed to something like CrashPlan, Carbonite, etc.), but I definitely recommend having a Drobo product hooked up to your network, with regularly scheduled backups to it.
If you got one of the five-drive boxes, you could put five 2TB drives into it, likely have enough capacity to store all 5TB of photos, and still be able to turn on the option that lets two drives fail simultaneously without losing any data. (You definitely lose space using them: I have a 1TB and a 2TB in mine right now and only have ~900GB available.) If you're using a Mac, it can even act as a Time Capsule, so you can direct Time Machine to back up directly to it. (My wife and I do this; it tends to back up about once a day rather than every hour like Apple promises.) Yes, I know it's expensive, but it's nice to know I have a local box (nothing in the cloud!) with all my (and my wife's) data backed up to it, so that if a hard drive decides to go kaput, neither of us loses anything.
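A rough way to estimate usable space on a box like this is to subtract the largest drive (or the two largest, with dual redundancy) from the total. That rule of thumb is an assumption here, not the vendor's formula, and real units lose a bit more to formatting overhead. A sketch:

```python
def usable_capacity_tb(drive_sizes_tb, redundant_drives=1):
    """Estimate usable space as the total minus the largest `redundant_drives` drives.

    This is a rule-of-thumb assumption, not the vendor's actual allocation logic.
    """
    if redundant_drives < 0 or redundant_drives >= len(drive_sizes_tb):
        raise ValueError("need more drives than the number set aside for redundancy")
    ordered = sorted(drive_sizes_tb, reverse=True)
    return sum(ordered[redundant_drives:])


if __name__ == "__main__":
    # The 1TB + 2TB setup described above, single-drive redundancy.
    print(usable_capacity_tb([1, 2]))
    # Five 2TB drives with dual-drive redundancy.
    print(usable_capacity_tb([2, 2, 2, 2, 2], redundant_drives=2))
```

Under that assumption the 1TB + 2TB pair gives about 1TB before overhead, in line with the ~900GB reported, and five 2TB drives with dual redundancy give about 6TB, enough for the 5TB of photos with little room to grow.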
Getting a "Drobo product" (NAS) won't do much when your house burns down or gets robbed. To be safe, you need your data to be in multiple physical locations.
That's fair enough. However, I do believe that the common case for data loss is that someone's hard drive dies, or you drop a laptop, or your laptop gets stolen, etc. Multiple physical locations provide the best safety, but there is value in having a local backup for when the hard drive dies in your computer.
Also I'm not sure what (NAS) means, and I apologize if I sound like a salesman or something. Wasn't sure how to best describe a line of products made by a company without sounding like one. I was just trying to recommend a solution that I, personally, own and use to cover a pretty common case. It also feeds my slight paranoia about having a lot of my personal data and whatnot on someone else's servers.
As mentioned elsewhere in the thread, what about burning them to 50 triple- or quadruple-layer Blu-ray discs (BD-Rs), then storing them in a fireproof safe?
If you archive disks like this, you need to run software like SpinRite on them every so often to maintain them. I'd suggest that you do it every time you update your archive on the disks.
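The disc count follows from simple division. The sketch below assumes decimal units and nominal capacities of 100GB for a triple-layer disc and 128GB for a quadruple-layer disc, and ignores filesystem overhead and any space set aside for parity data.

```python
def discs_needed(total_gb: int, disc_capacity_gb: int) -> int:
    """Round up: a partly filled last disc still has to be burned."""
    if total_gb < 0 or disc_capacity_gb <= 0:
        raise ValueError("sizes must be positive")
    return -(-total_gb // disc_capacity_gb)  # integer ceiling division


if __name__ == "__main__":
    archive_gb = 5000  # the 5TB collection, in decimal GB
    for label, capacity_gb in (("triple-layer", 100), ("quadruple-layer", 128)):
        print(label, discs_needed(archive_gb, capacity_gb))
```

That gives 50 triple-layer discs or 40 quadruple-layer discs per copy, so a second copy for an off-site location doubles the count.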
Print them twice and store them in different locations.
Or more realistically: batch-resize the photos so that they are much smaller, and save that version of manageable size somewhere online as a last resort for when a disaster destroys all your current full-sized copies.
I've tried Backblaze, but the upload uses only a single connection and is VERY slow from my part of the world. If the uploading were done via multiple connections, it would be 10X faster. As it is, I still have about 230 days left before my existing data (and that's not even all of it) will be uploaded.
The speed varies with Crashplan as well, but I usually get around a couple of megabits per second at least.
I run it on my home NAS though so it's basically just set-and-forget, meaning that I don't have to remember to keep my Mac online. I just add my photos to the NAS share and it takes care of it from there.
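For a sense of what "a couple of megabits per second" means for the 5TB in the original question, here is a back-of-the-envelope estimate. It assumes a sustained 2 Mbit/s, decimal terabytes, and no protocol overhead, compression, or deduplication, none of which is stated in the thread.

```python
def upload_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to push `terabytes` through a link sustaining the given rate."""
    if megabits_per_second <= 0:
        raise ValueError("rate must be positive")
    total_bits = terabytes * 1e12 * 8  # decimal TB to bits
    seconds = total_bits / (megabits_per_second * 1e6)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for rate in (2, 20):  # assumed sustained rates in Mbit/s
        print(f"{rate} Mbit/s: {upload_days(5, rate):.0f} days")
```

At 2 Mbit/s the initial upload of 5TB takes about 231 days, and ten times the throughput cuts it to about 23 days, so set-and-forget on an always-on NAS is what makes the initial upload practical.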
Mirroring is not a backup.
Raid is not a backup.
Hard disks are not supposed to sit on a shelf unplugged for extended periods.
Fine. Now, how the hell do I backup my 5TB of photos? :-( :-(
Edit: Lots of fantastic information here https://news.ycombinator.com/item?id=7371725
Is there any service that can burn my terabytes of data onto multiple copies on "made in japan taiyo yuden" CDR? :-)