What about n-gram frequencies? 1-grams (aka single characters) carry too little information and are probably fine: with them you can only identify the language of the original work. With slightly longer n-grams you can identify the author and even the book. I don't remember the exact length, but if you have the frequencies of 10-grams you can probably reconstruct big chunks of the book.
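A toy sketch of why long n-grams leak the text. It counts overlapping character n-grams, then greedily chains them back together, each new n-gram overlapping the text so far by n-1 characters. It assumes you already know the first n-gram, and the function names are made up for illustration. A real reconstruction would have to resolve ambiguous continuations (e.g. with a graph search over the n-gram overlap graph); this greedy version just stops at the first ambiguity.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every overlapping character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def greedy_reconstruct(counts: Counter, start: str) -> str:
    """Rebuild text from n-gram counts, starting from a known first n-gram.

    Repeatedly appends the last character of the only remaining n-gram whose
    first n-1 characters match the last n-1 characters built so far. Each
    n-gram is used at most as many times as it was counted. Stops when there
    is no continuation or more than one distinct continuation (ambiguity).
    """
    n = len(start)
    if n < 2:
        raise ValueError("need n >= 2 to chain n-grams by overlap")
    remaining = Counter(counts)
    remaining[start] -= 1
    out = start
    while True:
        suffix = out[-(n - 1):]
        candidates = [g for g, c in remaining.items()
                      if c > 0 and g.startswith(suffix)]
        if len(candidates) != 1:
            return out
        gram = candidates[0]
        remaining[gram] -= 1
        out += gram[-1]


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    for n in (2, 10):
        print(n, repr(greedy_reconstruct(ngram_counts(text, n), text[:n])))
```

On that pangram, the 10-gram counts give back the whole sentence, while the 2-gram counts stop after "the", because "e" can be followed by either " " or "r".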