Hi. I have a spreadsheet with semi-duplicate entries that I want to merge (like merging Google Contacts, for example).
Input:
-- Robert Smith, 123 Main St., New York, NY, $foo
-- Bob Smith, 123 Main Street, NY, NY, $bar
Output:
-- Robert Smith, 123 Main Street, New York, NY, $foo, $bar
Googling turns up all sorts of Windows-only list-management software that you can buy, and companies that will become my list-cleanup provider for a couple thousand dollars, but this is a one-shot merge. Is there any free/open-source software I can use? Or a web service where I can pay $9.95, upload my list, and download a cleaned/merged version? I don't even mind making the final decisions about what is and isn't a duplicate entry: the software doesn't have to be brilliant, just vaguely smart.
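Try Google Refine: http://code.google.com/p/google-refine/
Its clustering feature groups similar-looking values so you can review them and merge the ones that really are duplicates.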
Another option is to download a copy of SQL Server Developer Edition and use the Fuzzy Lookup and Fuzzy Grouping transformations in SSIS (SQL Server Integration Services). They are pretty easy to use.
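If you would rather script a one-shot pass yourself, here is a rough Python sketch of the "vaguely smart" approach using only the standard library. It is not how Google Refine or SSIS do their matching; it just normalizes each row, scores pairs with difflib, and lists likely duplicates for you to confirm by hand. The abbreviation table, the choice of last name plus street as the comparison key, the 0.85 threshold, and the third sample row are all assumptions made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it for your own data.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize(text):
    """Lowercase, drop punctuation, and expand a few common abbreviations."""
    words = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_duplicates(rows, threshold=0.85):
    """Yield (i, j, score) for row pairs whose last name + street look alike."""
    def key(row):
        name, street = row[0], row[1]
        # First names vary (Robert/Bob), so compare on last name and street only.
        return name.split()[-1] + " " + street

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = similarity(key(rows[i]), key(rows[j]))
            if score >= threshold:
                yield i, j, score

# Sample rows: the first two come from the question, the third is invented.
rows = [
    ["Robert Smith", "123 Main St.", "New York", "NY", "$foo"],
    ["Bob Smith", "123 Main Street", "NY", "NY", "$bar"],
    ["Alice Jones", "9 Elm Rd", "Boston", "MA", "$baz"],
]

pairs = list(candidate_duplicates(rows))
for i, j, score in pairs:
    print(f"Possible duplicate ({score:.2f}): {rows[i]} <-> {rows[j]}")
```

This compares every row against every other row, which is fine for a few thousand entries but gets slow beyond that. It also only flags candidates; deciding which values to keep in the merged row is left to you.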