
I know all about the importance of backups.

Mirroring is not a backup.

RAID is not a backup.

Hard disks are not supposed to sit on a shelf unplugged for extended periods.

Fine. Now, how the hell do I back up my 5TB of photos? :-( :-(

Edit: Lots of fantastic information here https://news.ycombinator.com/item?id=7371725

Is there any service that can burn multiple copies of my terabytes of data onto "Made in Japan" Taiyo Yuden CD-Rs? :-)



> Fine. Now, how the hell do I back up my 5TB of photos? :-( :-(

I use git-annex. It understands the concept of wanting multiple copies of things, and keeps track of what is where (e.g., S3, Glacier, some remote rsync server, or which of my many external drives). When I want something, it gets it for me (e.g., by telling me which external drive to plug in).

Then, all I need to keep backed up is my git repository itself, which is tiny. I use Tarsnap for this, which means that I can keep previous snapshots without issue.
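The bookkeeping described above can be pictured as a small toy model: content is identified by a checksum, each key maps to the set of locations holding a copy, and anything with fewer copies than wanted gets flagged. This is a hedged sketch of the idea only, not git-annex's implementation or on-disk format; the `LocationLog` class, the key string format, and the location names are all hypothetical. (git-annex itself exposes the "how many copies" idea through its numcopies setting.)

```python
import hashlib


class LocationLog:
    """Toy model: which storage locations hold a copy of which content.

    Hypothetical illustration; not git-annex's real data format.
    """

    def __init__(self) -> None:
        self._where: dict[str, set[str]] = {}

    @staticmethod
    def key_for(data: bytes) -> str:
        # Content-addressed key: identical photos share one key whatever their filename.
        return "sha256-" + hashlib.sha256(data).hexdigest()

    def record(self, key: str, location: str) -> None:
        # Note that `location` (a drive, a bucket, a server) now holds this content.
        self._where.setdefault(key, set()).add(location)

    def whereis(self, key: str) -> list[str]:
        # Answers "which drive do I plug in to get this back?"
        return sorted(self._where.get(key, set()))

    def under_replicated(self, numcopies: int) -> list[str]:
        # Keys stored in fewer places than wanted, i.e. what still needs copying.
        return sorted(k for k, locs in self._where.items() if len(locs) < numcopies)
```

Because only this small log needs regular backing up, the bulky photos themselves can live on whatever mix of drives and services the log describes.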

> Hard disks are not supposed to sit on a shelf unplugged for extended periods.

This works fine for me, when combined with some other method. Redundancy is key. And "git-annex fsck" checks a drive's integrity for me.
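The integrity check mentioned above comes down to re-reading every file and comparing its checksum with the one recorded when it was archived. A minimal sketch of that principle, assuming a hypothetical manifest that maps relative paths to SHA-256 hex digests; this is not how `git annex fsck` is implemented, only the idea behind it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: dict[str, str], root: str) -> list[str]:
    """Return relative paths that are missing or whose checksum no longer matches."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        path = Path(root) / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad
```

Running something like this each time a shelved drive is plugged back in tells you whether it still holds what you think it holds, which is what makes the "some other method" redundancy worth anything.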


Thanks for the new possibility! Very interesting. From their page: https://git-annex.branchable.com/not/

"git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup."

Since you said you're using it effectively as a backup, could you please clarify what they mean, and what you mean?

Thanks!


For context, I was originally answering "Fine. Now, how the hell do I back up my 5TB of photos?" My answer is that you don't need to, since photos don't usually change after they are taken. It is sufficient to simply archive them safely (and redundantly).

I'm not using git-annex as a backup system. I use Tarsnap and my own tool ddar for backups.

I am using git-annex to archive specific large file collections that never change (e.g., photos and videos). By storing these collections appropriately in multiple redundant locations, and by also backing up my git-annex repository (using the backup tools above), I have effectively "backed up" my photos. They're as safe as any backup system can make them.


I've come to the conclusion that Amazon Glacier is probably the first good answer since tape drives. 5TB is, I think, $50 a month; if you use this data professionally, that's probably worth it. Otherwise, maybe still tapes? What's the maximum capacity of those these days?
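The $50 figure is plain arithmetic on a per-gigabyte monthly rate. A sketch, assuming decimal units (1 TB = 1000 GB) and a flat rate of $0.01 per GB-month, which is simply the rate implied by the comment's own number; actual Glacier pricing changes over time, and retrieval is billed separately.

```python
def glacier_storage_cost(size_tb: float, usd_per_gb_month: float = 0.01) -> tuple[float, float]:
    """Return (monthly, yearly) storage-only cost in USD.

    Assumes 1 TB = 1000 GB and a flat per-GB rate (an assumption, check current
    pricing). Retrieval, request and transfer fees are not included.
    """
    monthly = size_tb * 1000 * usd_per_gb_month
    return monthly, monthly * 12
```

For 5 TB this gives about $50 a month, or about $600 a year, before any retrieval costs.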


It's a more expensive investment to get started (as opposed to something like CrashPlan, Carbonite, etc.), but I definitely recommend having a Drobo product hooked up to your network, with regularly scheduled backups to it.

If you got one of the five-drive boxes, you could put five 2TB drives in it, likely have enough capacity to store all 5TB of photos, and be able to turn on the option that lets two drives fail simultaneously without losing any data. (You definitely lose space using them: I have a 1TB and a 2TB drive in mine right now and only have ~900GB available.)

If you're using a Mac, it can even act as a Time Capsule, so you can point Time Machine directly at it. (My wife and I do this; it tends to back up about once a day rather than every hour as Apple promises.)

Yes, I know it's expensive, but it's nice to know I have a local box (nothing in the cloud!) with all my (and my wife's) data backed up to it, so if a hard drive decides to go kaput, neither of us loses anything.
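A rough way to estimate usable space in a box like this is to set aside the largest drive(s) for redundancy and count the rest. This is a back-of-the-envelope approximation assumed for illustration, not Drobo's actual algorithm, and filesystem overhead lowers the real figure further. For identical drives it reduces to the familiar RAID 5 (one drive of redundancy) and RAID 6 (two drives) capacities.

```python
def usable_capacity_tb(drive_sizes_tb: list[float], redundancy: int = 1) -> float:
    """Rough usable capacity of a redundant array with possibly mixed drive sizes.

    Rule of thumb (an assumption, not any vendor's exact method): reserve the
    `redundancy` largest drives' worth of space for protection, count the rest.
    """
    if redundancy < 0 or redundancy >= len(drive_sizes_tb):
        raise ValueError("need more drives than the redundancy level")
    ordered = sorted(drive_sizes_tb)
    # Keep the smallest len - redundancy drives as usable space.
    return sum(ordered[: len(ordered) - redundancy])
```

With five 2TB drives and two-drive redundancy this gives 6TB, enough for 5TB of photos; with a 1TB and a 2TB drive and single redundancy it gives 1TB, in line with the ~900GB reported above once formatting is taken into account.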


Getting a "Drobo product" (NAS) won't do much when your house burns down or gets robbed. To be safe, you need your data to be in multiple physical locations.


That's fair enough. However, I do believe that the common case for data loss is that someone's hard drive dies, or you drop a laptop, or your laptop gets stolen, etc. Multiple physical locations provide the best safety, but there is value in having a local backup for when the hard drive in your computer dies.

Also, I'm not sure what "NAS" means, and I apologize if I sound like a salesman or something. I wasn't sure how best to describe a line of products made by one company without sounding like one. I was just trying to recommend a solution that I personally own and use to cover a pretty common case, and that also feeds my slight paranoia about having a lot of my personal data and whatnot on someone else's servers.


Start by putting a copy on Glacier. If you're on OS X, you can use Arq [1] to make it insanely easy.

[1] www.haystacksoftware.com/arq/index.php


Copy them to 2x 3TB drives, put the drives in a safe deposit box or store them in a fire safe at a friend or relative's house.


Let me repeat: "Hard disks are not supposed to sit on a shelf unplugged for extended periods."

This is even more the case with the very large capacity disks you are mentioning here.


As mentioned elsewhere in the thread, what about burning them to 50 triple- or quadruple-layer Blu-ray Discs (BD-Rs), then storing them in a fireproof safe?
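The disc count follows from ceiling division. A sketch, assuming the nominal BDXL capacities of 100 GB (triple-layer) and 128 GB (quadruple-layer) are fully usable and that 5TB means 5000 GB; in practice filesystem overhead, and the fact that files can't easily span discs, push the count up somewhat.

```python
def discs_needed(total_gb: int, disc_gb: int, copies: int = 1) -> int:
    """Discs required to hold `copies` full copies of `total_gb` of data.

    Optimistic: assumes every disc can be filled to its nominal capacity.
    """
    per_copy = -(-total_gb // disc_gb)  # integer ceiling division
    return per_copy * copies
```

`discs_needed(5000, 100)` gives 50 and `discs_needed(5000, 128)` gives 40; a second copy for another location doubles either figure.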

http://en.wikipedia.org/wiki/Blu-ray_Disc_recordable


Every month, fetch the old hard drives from storage, copy all the new data onto them, then take the hard drives currently in use and put those back into storage.
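Copying "all the new data" onto the returning drives amounts to a one-way, add-only sync. A minimal sketch using only Python's standard library, assuming photos never change once written, so a file already present on the archive drive is skipped rather than compared; the function name and mount paths are hypothetical, and a dedicated tool such as rsync does the same job with far more options.

```python
import shutil
from pathlib import Path


def copy_new_files(src: str, dst: str) -> list[str]:
    """Copy files under src that don't yet exist under dst; return what was copied.

    Never overwrites or deletes anything on dst (assumes archived photos are immutable).
    """
    src_root, dst_root = Path(src), Path(dst)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src_root)
        target = dst_root / rel
        if target.exists():
            continue  # already archived in an earlier rotation
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves modification times
        copied.append(rel.as_posix())
    return copied
```

Something like `copy_new_files("/Volumes/Photos", "/Volumes/ArchiveA")` (hypothetical paths) returns the relative paths it copied, which doubles as a log of what went into that month's rotation.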

If you want more redundancy, replace the two sets of 2x 3TB drives with two NASes, each with 3 drives in RAID 5 or 4 drives in RAID 1.


If you archive disks like this, you need to run software like SpinRite on them every so often to maintain them. I'd suggest doing it every time you update your archive on the disks.


[citation needed]

A lot depends on whether you're doing a backup or an archive; if it's a backup you'll be rotating the disks in periodically.


Print them twice and store them in different locations.

Or, more realistically: batch-resize the photos so that they are much smaller, and save that manageable-size version somewhere online as a last resort in case a disaster destroys all your current full-size copies.
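The core of a batch resize is working out target dimensions that keep the aspect ratio. A sketch of just that calculation, assuming a hypothetical 2048-pixel cap on the longer edge; an imaging library would do the actual resampling (Pillow's `Image.thumbnail`, for instance, shrinks an image in place to fit a bounding box while preserving its aspect ratio).

```python
def fit_within(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale (width, height) so the longer edge is at most max_edge.

    Preserves aspect ratio; images already small enough are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 6000×4000 frame becomes 2048×1365, under 12% of the original pixel count, which is what makes an online "last resort" copy of a large library affordable.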


Wouldn't it be better to find ways that are not lossy?


I have been using CrashPlan for backing up my huge photo library, and it has been working great so far.

The main benefit compared to the others is the "unlimited" cloud backup space if you sign up for their cloud service.


I've tried Backblaze, but the upload uses only a single connection and is VERY slow from my part of the world. If it uploaded over multiple connections, it would be 10X faster. As it is, I still have about 230 days left before my existing data (and that's not even all of it) is uploaded.

I will check out CrashPlan.
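Upload time is simply size divided by sustained throughput, which is why initial uploads of multi-terabyte libraries take months. A sketch for estimating it, assuming a constant rate and no protocol overhead; the 5 TB and 2 Mbit/s inputs in the example are illustrative, not measurements from this thread.

```python
def upload_days(size_bytes: float, bits_per_second: float) -> float:
    """Days needed to upload size_bytes at a sustained rate (ignores overhead)."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86_400  # seconds per day
```

`upload_days(5e12, 2e6)` comes to about 231 days; at ten times that rate it would be about 23 days, so throughput dominates everything else for a first full upload.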


The speed varies with Crashplan as well, but I usually get around a couple of megabits per second at least.

I run it on my home NAS, though, so it's basically set-and-forget: I don't have to remember to keep my Mac online. I just add my photos to the NAS share and it takes care of it from there.



