> It's stated in the Wired article that "much of what Swartz is accused of downloading from JSTOR is copyrighted".

I'll ignore this for the moment (or at least regard it as "dubious"), and I hope it's obvious why. If we do ignore it, let's see whether the other two citations you provide here take on a different character in its absence:

> It's stated in JSTOR's statement that "The downloaded content included more than 4 million articles, book reviews, and other content from our publisher partners' academic journals and other publications".

> The indictment states that downloads included "approximately 4.8 million articles, a major portion of the total archive in which JSTOR had invested. Of these, approximately 1.7 million were made available by independent publishers for purchase through JSTOR's Publisher Sales Service."

The claims about partners only mean that there were publishers and other organizations supplying content to JSTOR (and probably selling that content through their own, unrelated channels). This does not preclude the content in question being entirely public domain. Given the other characterizations in the indictment, I suspected it of being an instance of deliberately giving an impression that doesn't match what really happened while carefully skating on the edge of truthfulness. (I.e., "Yeah, partners were making it available for purchase, but, oh, did we fail to mention that it's public domain anyway?")

A closer reading, though, lends credence to still-in-copyright works being in the mix if you treat "major portion" as synonymous with "majority" (used correctly), and the case is even more stark and convincing in light of the numbers you provide.

Here are some more numbers provided by JSTOR: 7 million items online across 44 million pages as of February 2011 <http://about.jstor.org/about-us/jstor-numbers>.
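
For what it's worth, here is a back-of-the-envelope check of how those figures relate. It is only a rough sketch: it assumes the indictment's "articles" and JSTOR's "items" are counted in comparable units, which neither source confirms, and the variable names are mine.

```python
# Figures quoted above. Treating "articles" (indictment) and "items"
# (JSTOR's own numbers page) as comparable units is an assumption.
downloaded = 4_800_000        # indictment: approx. articles downloaded
publisher_sales = 1_700_000   # indictment: of those, sold via Publisher Sales Service
total_items = 7_000_000       # JSTOR: items online as of February 2011

# Fraction of the whole archive that was downloaded
share_of_archive = downloaded / total_items

# Fraction of the downloaded set that publishers were selling through JSTOR
share_publisher_sales = publisher_sales / downloaded

print(f"downloaded / archive:         {share_of_archive:.1%}")
print(f"publisher sales / downloaded: {share_publisher_sales:.1%}")
```

Under that assumption the download comes to roughly two thirds of the archive, and roughly a third of what was downloaded was being sold through the Publisher Sales Service.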

[In case you can't tell, what I'm really saying is you've got me convinced.]

> However, it's hard to fit any explanation about his intentions to the facts as we know them without making major assumptions about information we don't have.

One such assumption you could make is that he was operating under the modus of grabbing what was possible and sorting it all out later. It doesn't even seem to qualify as a stretch. Having followed a similar line of thinking myself in the past for a similar operation (involving databases operated by Gale Group), I certainly don't have any trouble maintaining this assumption for myself (and to myself). There is a detail that gives me pause, though...

> but then, why does a Harvard research fellow need to covertly access JSTOR from MIT

Damned good question.

> And not just access, but download two thirds of it?

Well, is there really any question here? I think it's safe to say that, were it not for a thorny interruption, Swartz was probably aiming for something closer to three thirds of it.