
I currently write software for an e-discovery company. Most of the tasks our software is expected to perform sound simple at first glance, such as:

1. extracting documents from within other documents (attachments out of an email, files out of a zip, embedded Excel files out of a Word doc, images out of a PowerPoint, etc.)

2. converting all of those documents to some kind of standard media format so that the native viewing applications are not needed (every document type to PNG, PDF, or TIFF)

3. allowing full-text searching across all electronic files
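
To make the first task concrete, here is a minimal sketch of recursive extraction, assuming the only container format is zip and everything fits in memory. The function name `extract_nested` and the `max_depth` limit are hypothetical, not anything from our product. A real pipeline would need a handler per container type (email, OLE, PDF and so on), and that is where most of the work goes.

```python
import io
import zipfile


def extract_nested(data: bytes, name: str = "root", max_depth: int = 5):
    """Yield (path, bytes) for every leaf file, descending into nested zips.

    Hypothetical helper: only zip containers are understood here.
    max_depth guards against deeply nested (or malicious) archives.
    """
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # directory entries carry no content
                child_path = f"{name}/{info.filename}"
                # Recurse: a member may itself be a container.
                yield from extract_nested(zf.read(info), child_path, max_depth - 1)
    else:
        # Not a zip (or depth limit reached): treat it as a leaf document.
        yield name, data
```

Since .docx and .xlsx files are themselves zip archives, this sketch would descend into them too. Whether you want that or want to treat them as leaves is exactly the kind of decision every customer answers differently.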

With these kinds of tasks available as an automated feature, the real product would just allow a bunch of attorneys to review the documents and apply tags or labels to them. Once they've gone through all the documents, there is generally an output from the system that summarizes their work and provides the relevant documents, notes, etc.

Over the years of writing this kind of software, we've encountered a never-ending stream of complications with file types, feature requests, etc. The real complexity with this kind of software is making it work for a large number of customers. Every customer probably has a different idea about what they want this kind of tool to do for them.



The other issue is the absolutely bonkers number of attorney-hours racked up in reading all of those documents once they've been systematized.

That's where I see potential in this market. E-discovery is a pain point for law firm clients, especially large corporate clients who are constantly involved in complex litigation. Document review has to happen in order to litigate effectively (gotta find the smoking gun!), but when a bill comes through with hundreds or thousands of attorney-hours devoted to reading your opponent's old emails, ouch. The client hasn't even seen a work product yet.


The other thing is that you spend 10x or more of the initial effort on a feature just on its error handling. I sometimes envy all these "Internet" programmers who only have to deal with the web and don't have to worry about esoterica like, e.g., \x80 being the space character in old WordPerfect files.
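
As a rough illustration of where that effort goes, here is a sketch of defensive text decoding, assuming a fixed list of candidate encodings. The name `decode_with_fallback` and the encoding order are made up for the example. Note that this would still get the WordPerfect case wrong, since cp1252 happily decodes \x80 as a euro sign. Generic fallbacks only get you so far, and format-specific tables have to come on top.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used), trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption, not a standard.
    latin-1 is the last resort because it maps every byte to a character,
    so it never raises (though the result may be meaningless).
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    return data.decode("latin-1"), "latin-1"
```

Returning the encoding that was actually used matters as much as the text: it lets you log which files fell through to the last resort, so someone can look at them later.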


Feature requests in e-discovery bloat your original software out of all proportion. Never mind getting past the original problem of effectively searching large troves of data in different formats.


Amen



