
AI has been super useful for processing historical data. I interviewed a volunteer from the diary archive in Germany last month, and they're using supervised AI for diary transcription. Going from (old) personalized handwriting to text is a lot of work, even for experienced transcribers. Being able to automate the first pass of that has been a huge boon to their processing pipeline.


Can you go a bit deeper on this?

If the risk of mistranslation is high, I fail to comprehend how letting AI "take a swing at it" does not reduce the translation quality.

How are they ensuring there's no drop in translation quality?


They're doing transcription, not translation - so, turning someone's pages of scrawled script into typewritten text. They have around 20 people nationwide who are able to do this. Most of them are older volunteers who aren't all that interested in computer assistance, but about a third of them have started using the newer AI tools, and it has increased their throughput significantly.

Having a 'best guess' at the lettering is really handy - in some cases the writing is really rather difficult to make out at all. Even being able to run something as simple as frequency analysis on stroke patterns would be a massive benefit.

At this point they're becoming bottlenecked on the scanning process. Diaries are digitized because the archive is in one place and their transcription experts are spread out over the country.


As a profession (and under time constraints) ... Tom Scott : How the US Postal Service reads terrible handwriting - https://youtu.be/XxCha4Kez9c

Part of the story is the OCR that handles hand-lettered addresses.

I also chuckled at the cursive letter recognition sheet on the side of the cube.


Can you please explain to me how using AI as a "first pass" (in any context) doesn't simply make the second pass more lazy?

If my name is associated with the first pass, and I get it wrong, there's a gravity to that since my name's attached. If I use an AI for the first pass, get it wrong, and my name's still attached... my name takes a hit, BUT my guilt is absolved and my desire to improve is dulled a little bit by the existence of the AI tool taking on the first pass. After all, it wasn't me who got it wholly and completely wrong, it was the AI. Next time I'll be more careful, right? Rinse and repeat.


I hadn't considered or read about this problem before but it makes sense.

It reminds me of the cuneiform problem. Between 500,000 and 1 million tablets have been collected. This is one of the earliest preserved writing systems. Even so, fewer than 10% of these tablets have been translated. I was surprised to learn this but it makes sense. There are several problems:

1. Scribes used a lot of shorthand;

2. Cuneiform itself changed over time;

3. Writers would use multiple languages (e.g. Sumerian, Akkadian), even on the same tablet. There are relatively few people fluent in these languages, particularly in several of them at once;

4. To some extent the tablets are 3D such that a 2D photo might not be sufficient to translate because you might need to physically turn the tablet to accurately see the marks; and

5. In some cases the tablets are incomplete or broken, so you may need to figure out how things fit together.

I wonder if AI can help make inroads into this 90%. I really wonder what is waiting to be unearthed.


Lots of 4,000-year-old complaints about copper, I would imagine.



