Hacker News

A tool that could do parallel downloads of small files would be a winner. We have a lot of small files and use s3cmd. It downloads one at a time. You can finagle it with some xargs magic, but it would be nice to have it built in.
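For what it's worth, the xargs finagling looks roughly like this (a sketch, assuming a configured s3cmd; `my-bucket`, `prefix/`, and `downloads/` are placeholders):

```shell
# Parallel downloads with s3cmd + xargs: list the keys, then run 8 gets at once.
# (commented out so this sketch runs without s3cmd installed)
#   s3cmd ls -r s3://my-bucket/prefix/ | awk '{print $4}' \
#     | xargs -P 8 -I{} s3cmd get --skip-existing {} downloads/
# The same xargs pattern, with echo standing in for "s3cmd get":
printf '%s\n' a.txt b.txt c.txt d.txt \
  | xargs -P 4 -I{} echo "get {}"
```

`-I{}` makes xargs run one command per key, and `-P 4` keeps four of them in flight at a time; the output lines may arrive out of order.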


I wrote a simple tool for syncing an S3 bucket to local disk:

https://github.com/newspaperclub/hank

I found the existing S3 syncing tools used lots of memory when dealing with many small files, so I wrote this in Go, taking care to manage memory by listing and downloading the files concurrently.
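Concurrent list-and-download with bounded memory is essentially a producer/consumer pipeline with a capped queue. A minimal Python sketch of that pattern (not Hank's actual code, which is Go; `list_keys` and `fetch` are hypothetical stand-ins for the S3 list and get calls):

```python
import queue
import threading

def sync(list_keys, fetch, workers=8, backlog=1000):
    """Download keys as they are listed, holding at most `backlog` keys in memory."""
    q = queue.Queue(maxsize=backlog)   # bounded: listing blocks if downloads fall behind
    done = object()                    # sentinel marking the end of the listing

    def producer():
        for key in list_keys():        # stream through the bucket listing
            q.put(key)
        for _ in range(workers):
            q.put(done)                # one sentinel per worker

    def consumer():
        while True:
            key = q.get()
            if key is done:
                return
            fetch(key)                 # download a single object

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The bounded queue is what keeps memory flat: the full key list never materializes, only `backlog` keys at most sit between the lister and the downloaders.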

We use it for backing up 500GB across 800k files, from an S3 bucket to a ZFS filesystem for snapshotting. It does the job well, usually taking just a few minutes when there are minimal changes.


Totally expecting to see someone build this in Go in a couple of weeks to do exactly that sort of thing. :)

Wish I had the time; I just realized it would be a fun project to learn Go with.


You could just modify s3cmd to use threads during uploads? The problem is I/O-bound, so the GIL isn't an issue, and you wouldn't have to go through the trouble of implementing S3's auth header.
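Such a patch would amount to wrapping the per-file transfer in a thread pool. A rough standalone sketch of the idea (not actual s3cmd code; `upload` is a hypothetical stand-in for its per-file transfer function):

```python
from concurrent.futures import ThreadPoolExecutor

def transfer_all(paths, upload, threads=8):
    """Run the (I/O-bound) per-file transfer across a pool of threads.

    A thread releases the GIL while it waits on the network, so plain
    threads give real parallelism here without any C extensions.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order and re-raises any worker exception
        return list(pool.map(upload, paths))
```

With small files the wall-clock win comes from overlapping request latency, not CPU, which is why the GIL doesn't get in the way.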


I use this tool extensively and it's awesome: http://sprightlysoft.com/s3sync/ It has the same name. Check out the -TransferThreads parameter for parallel uploads/downloads (see the Documentation link). I use it for syncing dozens of buckets, each containing tens of thousands of small files, and it works without a hitch.



