Hacker News | metiscus's comments

That is largely an artifact of the source data, sadly. I place the dots where they were found if the dataset has that information; otherwise I georef them back to the "findsite" in the data, which aggregates many finds at one place. A lot of these were found in the 1800s and just don't have good LLA (latitude/longitude/altitude) data for the find.

Ah, I was referring to the zoom-level-dependent clustering, which I find makes it hard to see the distribution of points when zoomed out all the way. There's still quite a bit of detail even with many of the points sharing the same location.

But yes, the runbook in the project gives the LLM instructions on how to use the scripts and what to modify. I let Claude Code read that and tell it to work on a province. It runs a small segment and analyzes the results until it hits 1-2% error with no systemic errors. If it can't get all the errors out, then I have it switch to using gemini-flash-lite-latest instead of 2.5, which costs slightly more but performs much better. Basically, Claude Code runs a self-governing loop, with my oversight, mutating the prompts and data inputs to extract all the names.

EDIT: My instructions to the supervising LLM are in here: https://github.com/metiscus/roman-names/blob/feature/webapp-...
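As a rough illustration of the loop described above (not the actual runbook logic), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `tune_until_clean`, the `run_extraction` and `revise_prompt` callbacks, the round limit, and the model labels. It only models the error-rate threshold; the "no systemic errors" judgment is left to whoever supervises the loop.

```python
from typing import Callable, Sequence, Tuple


def tune_until_clean(
    segment: Sequence[str],
    run_extraction: Callable[[Sequence[str], str, str], float],
    revise_prompt: Callable[[str, float], str],
    prompt: str,
    models: Sequence[str],
    target_error: float = 0.02,  # upper end of the 1-2% range mentioned above
    max_rounds: int = 5,         # hypothetical cap on prompt revisions per model
) -> Tuple[str, str, float]:
    """Try each model in order, revising the prompt until the error is low enough.

    run_extraction(segment, model, prompt) is a hypothetical callback that
    returns the measured error rate on a small segment of inscriptions.
    Returns (model, prompt, error_rate) for the first configuration that meets
    the target, or the best one seen if none does.
    """
    if not models:
        raise ValueError("at least one model is required")
    best = None
    for model in models:
        current = prompt  # each model starts again from the original prompt
        for _ in range(max_rounds):
            error = run_extraction(segment, model, current)
            if best is None or error < best[2]:
                best = (model, current, error)
            if error <= target_error:
                return model, current, error
            current = revise_prompt(current, error)
    return best
```

The fallback to a second model only happens after the first one has used up its rounds, which mirrors the "if it can't get all the errors out" step.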


The short answer is yes. Successo-Terra has done a lot of work on the pre-Roman history of Italy quite recently.

https://www.successoterra.net/en

The Vesuvius scrolls have been partially decoded with some interesting results. https://www.smithsonianmag.com/smart-news/three-students-dec...

The Vindolanda tablets are constantly being worked on as well https://www.heritagedaily.com/2017/07/roman-tablets-unearthe...


This was really insightful, thanks!

I also used EDH as ground truth, but not directly as a source, and I link out to it and other sources when I have those links available.

The actual code is here https://github.com/metiscus/roman-names/ and it is licensed under MIT, while the datasets have slightly more restrictive licenses, which are mentioned in the repo.

Yeah, I thought about that. The issue is that the date density is poor already and the ranges are pretty broad. Someday I'll give it a try, but the search interface can limit you to date ranges for now, so the infrastructure is all there.

So most inscriptions are somewhat formulaic, and I provide examples to the LLM to help it find the names. I also have a postprocess blacklist that removes some known cases where things slip through. It's never going to be 100% perfect, but to my untrained eye it seems to do okayish. Waiting on some professionals to cross check my data. If that is you, you can search and export the data in CSV via the browse button.
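A postprocess blacklist of this kind can be as simple as the following Python sketch. The function name `apply_blacklist` and the example entries are hypothetical, chosen only to show the idea; the real list in the repo may look quite different.

```python
from typing import Iterable, List

# Hypothetical entries: formulaic phrases that are not personal names.
EXAMPLE_BLACKLIST = {"Dis Manibus", "vivus fecit"}


def apply_blacklist(names: Iterable[str], blacklist: Iterable[str]) -> List[str]:
    """Drop extracted 'names' that match a blacklist entry, ignoring case."""
    blocked = {entry.casefold() for entry in blacklist}
    return [name for name in names if name.casefold() not in blocked]
```

For example, `apply_blacklist(["Kanulanius", "Dis Manibus", "Nansinia"], EXAMPLE_BLACKLIST)` keeps only the two names.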

So there should be some links when I have the information available. If you link me the entries I'll see what is going on.

Somehow despite writing an essay above, I forgot to mention that the whole codebase and web frontend is on GitHub.

For various reasons, the main development is on a branch right now. The browse feature is also live, which allows better searching.

https://github.com/metiscus/roman-names/


I've contacted a Professor in Europe who was doing research in this area and pointed him to the page. What I genuinely need is someone to spot check a few of the attributions. I can send you a list of the ones I think are the most likely to be good.

  1. Laepoca / Laepocus — Piquentum, Venetia et Histria (1–50 AD)
  Three family members: two women (Laepoca Regilia, Laepoca Tuia) and a man (Metellus Laepocus). The nomen appears in both feminine and masculine forms in the same inscription, pointing to a
  genuine local gentilicium, likely of Istrian or Liburnian origin.
  https://new.roman-names.com/#edcs_id=EDCS-04200530
It looks like my auto-translation and summarization layer is hallucinating on this entry, but the extraction appears correct. I'll flag it for the next run.

  2. Tocernius — Eraclea Veneta, Venetia et Histria (3rd c. AD)
  Father (C. Tocernius Hermeros) and son (C. Tocernius Maximianus), the latter a soldier of Legio II Italica. Probably a Venetic name surviving into the imperial period.
  https://new.roman-names.com/#edcs_id=EDCS-04200461
Here, the auto-translate and summary worked as intended, although it does garble the dedication into the status field.

  3. Laulenia — Thibilis, Numidia
  Two sisters, Laulenia Matrona and Laulenia Naxina, daughters of the same Marcus. The name looks Berber/Numidian in origin. (I should note that our pipeline transcribed the nomen as
  Lauzenia — the raw EDCS text reads Laulenia, which is probably the correct form.)
  https://new.roman-names.com/#edcs_id=EDCS-13500401
The auto-translate and summary layers do not make this error, only the name extraction layer does. I have flagged the entry and am diagnosing it.

  4. Kanulanius / Nansinia — Flavia Solva, Noricum
  Father (C. Kanulanius Eumitus) and son (C. Kanulanius Nepos, a soldier of Ala III Thracum). The K-spelling may reflect local Celtic orthographic convention. The wife's nomen, Nansinia,
  also appears unattested in standard sources and may be a second find in the same inscription.
  https://new.roman-names.com/#edcs_id=EDCS-14500644
Here there is an issue: I think that in the processing for the web I am feeding interpreted text into the raw extraction field, since my displayed raw text appears to be an expanded version of the EDCS text.

Mine:

Caius Kanulanius Eumitus vivus fecit sibi et Nansiniae Verecundae coniugi et Caio Kanulanio Nepoti filio militi alae III Thracum annorum XXV stipendiorum VI loco et impensa Anni Festi

EDCS:

C(aius) Kanulani/us Eumitus / v(ivus) f(ecit) sibi et / Nansiniae / Verecundae con(iugi) / et C(aio) Kanulanio / Nepoti f(ilio) mil(iti) alae III / Thrac(um) an(norum) XXV stip(endiorum) VI / loco et impensa / Anni Festi
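A mismatch like the one under item 3 (an extracted nomen that does not occur in the source text) could in principle be caught by a cheap check after extraction. The Python sketch below is only an assumption about how such a check might look: the names `stem` and `unsupported_names` and the short list of nominative endings are made up for illustration, and the stemming is far too crude for real Latin morphology.

```python
import re
from typing import Iterable, List

# Hypothetical, deliberately short list of nominative endings to strip.
_ENDINGS = ("ius", "ia", "us", "a")


def stem(name: str) -> str:
    """Strip one common nominative ending so inflected forms still match."""
    lowered = name.casefold()
    for ending in _ENDINGS:
        if lowered.endswith(ending) and len(lowered) > len(ending) + 2:
            return lowered[: -len(ending)]
    return lowered


def unsupported_names(names: Iterable[str], expanded_text: str) -> List[str]:
    """Return extracted names whose stem starts no word of the inscription."""
    tokens = [t.casefold() for t in re.findall(r"[A-Za-z]+", expanded_text)]
    return [n for n in names if not any(t.startswith(stem(n)) for t in tokens)]
```

Run against the expanded text of item 4, `Kanulanius` and `Nansinia` pass (they match `Kanulanio` and `Nansiniae`), while a deliberately misspelled `Kanuzanius` would be returned for review.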


What exactly are you trying to go for with status? It seems it mostly records filiation, but I don't think that's an intuitive use of the word. Knowing what you're actually going for would be helpful.

Also, you might want to include the source from EDCS. #3 above comes from ILAlg, and EDCS has a key for all the collections and their abbreviations. This will help someone be able to track down the original inscription more easily.

1. That first one is rough, and the translation is broken (it doesn't even translate Surus' name), but you got the people down. Regilia is just a guess, though.

3. Yep, Laulenia is the original name. Seems like AI is hallucinating here.

4. Have you thought about code that strips the parentheses first, instead of letting AI do it? Also, loco et impensa is something like "grave site and expense", not "expense and initiation." Locus means "place", and in epitaphs it often just refers to the burial place.
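A minimal Python sketch of that kind of deterministic pre-processing, assuming the only markup present is what appears in the EDCS text quoted above: parentheses around expanded abbreviations and slashes for line breaks. The function name `expand_edcs` is hypothetical, and other editorial sigla (square brackets, braces, angle brackets) are not handled.

```python
import re


def expand_edcs(text: str) -> str:
    """Turn an EDCS-style transcription into plain running text."""
    # Drop the parentheses but keep their contents: C(aius) -> Caius
    expanded = re.sub(r"\(([^()]*)\)", r"\1", text)
    # A slash with no spaces around it splits a word: Kanulani/us -> Kanulanius
    expanded = re.sub(r"(?<=\S)/(?=\S)", "", expanded)
    # Any remaining slash is a line break between words
    expanded = expanded.replace("/", " ")
    # Collapse the leftover runs of whitespace
    return " ".join(expanded.split())
```

Applied to the EDCS text of item 4, this yields the same expanded string shown under "Mine" in that comment, so the raw field could keep the original and the expansion could be derived without a model.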

