Ummm, the author used awk for the first version in the 1990s.
> This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:
As a concrete example, though my awk skills are extremely rusty, here's a program that normalizes each input line, builds a table mapping each normalized form to the matching original lines, and at the end prints only the entries with at least 5 anagrams.
I tried to avoid modern awk features, like asort(), so that it's mostly something that would have worked in the 1980s:
{
    # Convert to normal form:
    #   1. Fold to lower case
    #   2. Bin the letters to get frequency counts
    #   3. Only consider lower-case ASCII letters
    split(tolower($0), letters, "");
    # Ignore asort() in modern awks and do a bin sort instead.
    for (i in letters) {
        c = letters[i];
        repeats[c] = repeats[c] c;
    }
    normal_form = "";
    for (i = 97; i <= 122; i++) {
        c = sprintf("%c", i);  # no chr() in a 1980s awk
        normal_form = normal_form repeats[c];
    }
    table[normal_form] = table[normal_form] "," $0;
    # "delete repeats" on a whole array is itself a newer extension,
    # so clear it one entry at a time.
    for (c in repeats)
        delete repeats[c];
}
END {
    # Only show the normal forms with at least 5 matches.
    for (i in table) {
        match_str = table[i];
        # split() returns the field count; length() on an array is a
        # gawk extension.  Subtract 1 for the empty field before the
        # leading comma.
        num_matches = split(match_str, matches, ",") - 1;
        if (num_matches >= 5) {
            # Print the number of matches, then the match string.
            printf("%d%s\n", num_matches, match_str);
        }
    }
}
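As a quick sanity check, here's one way to run it end to end. The filenames and the eight-word list are invented for the demo; any one-word-per-line file works, and the heredoc just reproduces the program above so the snippet is self-contained:

```shell
# Save the program above as anagram.awk (reproduced here verbatim).
cat > anagram.awk <<'AWK'
{
    split(tolower($0), letters, "");
    for (i in letters) {
        c = letters[i];
        repeats[c] = repeats[c] c;
    }
    normal_form = "";
    for (i = 97; i <= 122; i++) {
        c = sprintf("%c", i);
        normal_form = normal_form repeats[c];
    }
    table[normal_form] = table[normal_form] "," $0;
    for (c in repeats)
        delete repeats[c];
}
END {
    for (i in table) {
        match_str = table[i];
        num_matches = split(match_str, matches, ",") - 1;
        if (num_matches >= 5)
            printf("%d%s\n", num_matches, match_str);
    }
}
AWK

# A made-up word list with one anagram family of six words.
cat > words.txt <<'EOF'
spare
pares
parse
pears
reaps
spear
cat
act
EOF

awk -f anagram.awk words.txt
# Only the six "spare" anagrams reach the threshold of 5, so this
# prints one line: 6,spare,pares,parse,pears,reaps,spear
```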
When I try it on a word list I have handy, here are the most common words:
Certainly Perl is more succinct, though note that even as late as Perl 4 in the early 1990s you would have needed the same string-concatenation trick to store the list of matches in the table.
But, "painful"? No. Not to someone who knew how to use awk.