Splink - a FOSS Python library for probabilistic record linkage (fuzzy matching/entity resolution).
Splink is dramatically faster and works on much larger datasets than other open source libraries. I'm particularly proud of the fact that we support multiple execution backends (at the moment DuckDB, Spark, Athena and SQLite, but additional adaptors are relatively straightforward to write).
We've had >4 million PyPI downloads and it's used in government, academia and the private sector, often replacing extremely expensive proprietary solutions.
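For anyone unfamiliar with the term, here is a rough sketch of the idea behind probabilistic record linkage, using the classic Fellegi-Sunter style of scoring. This is not Splink's API: the function name, field names, prior and m/u values below are all made up for illustration, and a real library would estimate these parameters from the data and use fuzzy rather than exact comparisons.

```python
import math

# Hypothetical parameters, chosen by hand for illustration only.
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
PARAMS = {
    "first_name": {"m": 0.9, "u": 0.01},
    "surname": {"m": 0.9, "u": 0.005},
    "dob": {"m": 0.95, "u": 0.001},
}

# Assumed prior probability that two randomly chosen records match.
PRIOR = 0.001


def match_probability(rec_a, rec_b, params=PARAMS, prior=PRIOR):
    """Combine per-field evidence into a single match probability."""
    # Start from the prior odds, expressed as a log2 "match weight".
    weight = math.log2(prior / (1 - prior))
    for field, p in params.items():
        if rec_a[field] == rec_b[field]:
            # Agreement on a field is evidence for a match (m > u).
            weight += math.log2(p["m"] / p["u"])
        else:
            # Disagreement is evidence against a match.
            weight += math.log2((1 - p["m"]) / (1 - p["u"]))
    # Convert the summed log2 odds back into a probability.
    odds = 2 ** weight
    return odds / (1 + odds)


a = {"first_name": "robin", "surname": "linacre", "dob": "1990-01-01"}
b = {"first_name": "robin", "surname": "linacer", "dob": "1990-01-01"}
print(match_probability(a, a))  # every field agrees
print(match_probability(a, b))  # surname typo lowers the score
```

The point is that each field contributes its own weight of evidence, so a typo in one field lowers the score without ruling the pair out, which is what separates this from exact-match joins.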
https://github.com/moj-analytical-services/splink
More info in blog posts here: https://www.robinlinacre.com/introducing_splink/ https://www.robinlinacre.com/splink_3/