
Splink: a FOSS Python library for probabilistic record linkage (fuzzy matching/entity resolution).
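
Probabilistic record linkage is commonly framed with the Fellegi-Sunter model. Each field comparison contributes a log Bayes factor built from an m probability (the chance the field agrees given the two records are a true match) and a u probability (the chance it agrees given they are not). The sketch below illustrates that scoring step in plain Python. It is not Splink's API. The field names, the m/u values and the prior match rate are hypothetical numbers chosen for illustration; real tools estimate them from data. Each field is also simplified to a binary agree/disagree outcome.

```python
import math

# Hypothetical (m, u) probabilities per compared field, for illustration only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "first_name": (0.90, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.98, 0.001),
}


def match_weight(agreements: dict[str, bool], params=PARAMS) -> float:
    """Sum the log2 Bayes factors contributed by each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]
        if agrees:
            total += math.log2(m / u)  # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement counts against
    return total


def match_probability(weight: float, prior: float = 0.0001) -> float:
    """Combine a match weight with a (hypothetical) prior match rate."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


all_agree = {"first_name": True, "surname": True, "dob": True}
print(match_probability(match_weight(all_agree)))
```

Working in log2 space means each field's evidence simply adds up, which makes it easy to see how much a single comparison pushes a pair toward or away from being a match.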

Splink is dramatically faster and works on much larger datasets than other open-source libraries. I'm particularly proud that we support multiple execution backends (at the moment DuckDB, Spark, Athena and SQLite), and additional adaptors are relatively straightforward to write.
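
One way a multi-backend design can keep adaptors small is to express the linkage logic as SQL strings and give each backend a single "run this SQL" method. The sketch below illustrates that pattern, assuming the logic is SQL-generated. It is not Splink's internal code. `SQLBackend`, `SQLiteBackend`, `blocking_sql`, the `people` table and its columns are all hypothetical names. The example generates a blocking self-join, which only compares records sharing an exact key, and runs it on Python's built-in `sqlite3`.

```python
import sqlite3
from typing import Any, Protocol


class SQLBackend(Protocol):
    """Hypothetical adaptor interface: a backend only needs to run SQL."""

    def execute(self, sql: str) -> list[tuple[Any, ...]]: ...


def blocking_sql(table: str, blocking_col: str) -> str:
    # Self-join on an exact-match blocking key. l.id < r.id skips
    # self-pairs and emits each candidate pair only once.
    # Identifiers are interpolated directly, so only use trusted names.
    return (
        f"SELECT l.id AS id_l, r.id AS id_r "
        f"FROM {table} AS l JOIN {table} AS r "
        f"ON l.{blocking_col} = r.{blocking_col} AND l.id < r.id "
        f"ORDER BY id_l, id_r"
    )


class SQLiteBackend:
    """Adaptor for an in-memory SQLite database."""

    def __init__(self) -> None:
        self.con = sqlite3.connect(":memory:")

    def execute(self, sql: str) -> list[tuple[Any, ...]]:
        return self.con.execute(sql).fetchall()


def candidate_pairs(backend: SQLBackend) -> list[tuple[Any, ...]]:
    # The same generated SQL can be handed to any object with execute().
    return backend.execute(blocking_sql("people", "surname"))


backend = SQLiteBackend()
backend.execute("CREATE TABLE people (id INTEGER, surname TEXT)")
backend.con.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "smith"), (2, "smith"), (3, "jones"), (4, "smith")],
)
print(candidate_pairs(backend))
```

Because the linkage logic lives in the generated SQL, an adaptor for another engine (for example DuckDB via `duckdb.connect()`) would in principle only need its own `execute` method, though dialect differences between engines can still require per-backend tweaks.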

We've had more than 4 million PyPI downloads, and Splink is used in government, academia and the private sector, often replacing extremely expensive proprietary solutions.

https://github.com/moj-analytical-services/splink

More info in these blog posts:
https://www.robinlinacre.com/introducing_splink/
https://www.robinlinacre.com/splink_3/
