Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Ummm, the author used awk for the first version in the 1990s.

> This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:



I know. I’m talking about Perl. Try to keep up.


Do pay attention.

You wrote "Prior to Perl doing this was painful.".

I highlighted how you contradicted the author's claim that it was easy to do using awk. Awk existed long before Perl.


As a concrete example, while my awk skills are extremely rusty, here's a program which will normalize the input line, create a table mapping the normalized name to the matching original lines, then at the end it only displays the ones with at least 5 anagrams.

I tried to avoid modern awk features, like asort, to be something that would have worked in the 1980s:

  {
      # Convert to normal form:
      #  1. Fold to lower case
      #  2. Bin the letters to get frequency counts
      #  3. Only consider lower case ASCII letters
      split(tolower($0), letters, "");
  
      # Ignore asort() in modern awks and do a bin sort instead.
      for (i in letters) {
          c = letters[i];
          repeats[c] = repeats[c] c;
      }
  
      normal_form = "";
      for (i=97; i<=122; i++) {
          c = sprintf("%c", i); # no chr() in a 1980s awk
          normal_form = normal_form repeats[c];
      }
  
      table[normal_form] = table[normal_form] "," $0
      delete repeats;
  }
  END {
      # Only show the ones with at least 5 matches
      for (i in table) {
          match_str = table[i];
          split(match_str, matches, ",");
          num_matches = length(matches)-1
          if (num_matches >= 5) {
              # print the number of matches, then the match string
              printf("%d%s\n", num_matches, match_str);
          }
      }
  }
When I try it on a word list I have handy, here are the most common words:

  % awk -f anagram.awk < words_alpha.txt | sort -n -t, | tail -5
  13,elaps,lapse,leaps,lepas,pales,peals,pleas,salep,saple,sepal,slape,spale,speal
  14,anestri,antsier,asterin,eranist,nastier,ratines,resiant,restain,retains,retinas,retsina,stainer,starnie,stearin
  14,apers,apres,asper,pares,parse,pears,prase,presa,rapes,reaps,repas,spaer,spare,spear
  14,arest,aster,astre,rates,reast,resat,serta,stare,strae,tares,tarse,tears,teras,treas
  15,alerts,alters,artels,estral,laster,lastre,rastle,ratels,relast,resalt,salter,slater,staler,stelar,talers
Certainly Perl is more succinct, though note that even up to Perl 4 in the early 1990s you would need to use the string concatenation method to store the list of matches in the table.

But, "painful"? No. Not to someone who knew how to use awk.


What’s the word for when snark backfires?


snark backfires = barf earns kicks


“Sorry”?


krans




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: