
Honestly, if I can't have the whole thing, I'm not going to bother mirroring a 1TB fragment that's worthless by itself to everybody except copyright attorneys.

As ndriscoll points out, the only feasible way to distribute an archive of this size is with physical hard drives. I sure wish they would find a reasonably-trustworthy way to offer that.



Most of the books are bloated PDFs. I'm slowly working on a project to reliably convert PDF to DjVu, which on average yields a highly readable document that's 33% of the original size on disk. The project is proving difficult, as the tooling for DjVu is quite moldy now, and the output often needs to be manually reviewed to ensure the file remains readable. Pdf2djvu exists, but it's highly unreliable, and thus can't be used in bulk. Other ebook formats are XML-based and tend to be similarly bloated due to the overhead of the markup. It's a hard problem, with so few good file format choices.


That sounds like a pretty terrible idea, TBH. All of the best tooling is for PDFs, as you note, and storage will only get cheaper.

Ultimately that content is going to need to be represented as raw UTF-8 text and encoded images, so I don't see much upside to migrating it from one intermediate lossy file format to another.


You are never going to have a physical copy of the archive. It's nearly a petabyte in size.


I know several datahoarders who have at least 1PB. Also, archive.org grows by that much at least every day.


I assumed that GP was an average person who doesn't have a storage array sitting at home. I'm not really sure why the IA is relevant here.


1 PB of disk space would cost about $10K at this point in time. Not exactly unattainable. Looks like it would fit in a volume of space about the size of a standard refrigerator.

I'd be OK with both requirements.


It doesn't seem reasonable to me to suggest that an average person would spend $10,000+ (and the time to maintain it) on a pirate archive, hence my comment.

On the other hand, contributing a TB or two to a torrent swarm is much more feasible for most people.

In any case, if you're okay with that, you should do it. Please report back in 6 months with how it's going.


> In any case, if you're okay with that, you should do it. Please report back in 6 months with how it's going.

Point being, if I tried to torrent the whole thing, it probably would take 6 months, and would likely get me booted from my ISP and/or sued. I would much rather buy a set of hard drives with the contents already loaded. Or tapes, as userbinator suggests.

(And as for the hypothetical "average person" you keep citing, I don't see anyone meeting that description around here.)
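
As a rough check on the six-month figure above, here is a minimal sketch of the transfer-time arithmetic. It assumes 1 PB means 10^15 bytes and a link that stays saturated the whole time with no protocol overhead; the function name and the link speeds are illustrative assumptions, not measurements from anyone in this thread.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # assumption: decimal petabyte, in bytes


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a fully saturated link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical sustained link speeds, in Mbit/s.
    for mbps in (100, 500, 1000):
        days = transfer_days(PETABYTE, mbps * 10**6)
        print(f"{mbps} Mbit/s: about {days:.0f} days")
```

Under these assumptions, six months corresponds to roughly 500 Mbit/s sustained around the clock (about 185 days); real torrent throughput would likely be lower.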


> I would much rather buy a set of hard drives with the contents already loaded. Or tapes, as userbinator suggests.

And my point is that this is an absurd suggestion. I shouldn't have to explain why a shadow library shouldn't be selling (tens of) thousands of dollars worth of hard drives containing pirated content. Beyond that, and what I was getting at earlier, is that maintaining a 1PB storage array at home isn't exactly easy, or cheap.


> I shouldn't have to explain why a shadow library shouldn't be selling (tens of) thousands of dollars worth of hard drives containing pirated content.

Depends on what their goal is. I shouldn't have to explain why a "library" that's operating illegally in virtually every jurisdiction, with few or no complete mirrors, is vulnerable to being shut down by a small number of governmental or judicial entities.

If I were running the archive, not being a single point of interdiction would be high on my list of priorities. Especially when any number of people are indeed willing and able to keep 1 PB+ of content in circulation, samizdat-style. I would work to find these people, put them in touch with each other, and help them.

> Beyond that, and what I was getting at earlier, is that maintaining a 1PB storage array at home isn't exactly easy, or cheap.

Not everything that's worth doing is easy or cheap, or otherwise suited to "average people." Again, I don't know where you're coming from here. What's your interest in the subject, exactly?


> It doesn't seem reasonable to me to suggest that an average person would spend $10,000+

You're right, and I was not trying to suggest that. I was merely disagreeing with "You are never going to" because I know there are people who are reading this who can and maybe will.


1PB is well beyond the point at which a tape drive and a bunch of tapes will be cheaper than hard drives, and likely more reliable.


For archival, yes. Not if you want to access the thing with any frequency.



