
The problem with average per song is that you "use up" words in every new song, so all things being equal each marginal song has progressively fewer new words.
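A minimal sketch of the "using up" effect, assuming a naive tokenizer where a word is any lowercase run of letters and apostrophes (`tokenize` and `new_words_per_song` are hypothetical helper names). For each song in release order, it counts the words that have not appeared in any earlier song:

```python
import re

def tokenize(lyrics: str) -> list[str]:
    # Assumption: a "word" is a lowercase run of letters/apostrophes.
    return re.findall(r"[a-z']+", lyrics.lower())

def new_words_per_song(songs: list[str]) -> list[int]:
    """For each song in order, count words not seen in any earlier song."""
    seen: set[str] = set()
    counts = []
    for song in songs:
        words = set(tokenize(song))
        counts.append(len(words - seen))  # words this song adds to the total
        seen |= words
    return counts

print(new_words_per_song(["I like cats", "I like dogs", "cats and dogs"]))
# [3, 1, 1]
```

Each song here has three distinct words, but only the first one gets full credit for all of them.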


I bet you could get something insightful from plotting "unique words" versus "total words"; that might give a good idea of the amount of repetition over time, the length or quantity of output, and the total vocabulary.
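One way to get the points for that plot, sketched under the same naive-tokenizer assumption (`vocabulary_curve` is a hypothetical name): walk the songs in order and record the running totals after each one. Any plotting library could then draw total words on the x axis against unique words on the y axis.

```python
import re

def vocabulary_curve(songs: list[str]) -> list[tuple[int, int]]:
    """Return (total words so far, unique words so far) after each song."""
    seen: set[str] = set()
    total = 0
    points = []
    for song in songs:
        # Assumption: words are lowercase runs of letters/apostrophes.
        words = re.findall(r"[a-z']+", song.lower())
        total += len(words)
        seen.update(words)
        points.append((total, len(seen)))
    return points

print(vocabulary_curve(["I like cats", "I like dogs", "cats and dogs"]))
# [(3, 3), (6, 4), (9, 5)]
```

A curve that flattens quickly means each new song mostly reuses words already seen; a curve that keeps climbing means the vocabulary keeps growing with output.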


here's what this looks like. ugly as sin and useless for comparing rappers.

http://www.mdaniels.com/vocab/scatter.png

love your other ideas; hopefully I can do them later.


Strange comment. You realise that's not an inherent truth of language? Unique words per song is trivial to calculate.


If a rapper released one song using n distinct words their score would be n/1, and if they released a second song using the same set of words their score would halve, to n/2, despite the fact their demonstrated vocabulary is still n words.

In fact, if their first song used n distinct words and their second used a completely distinct set of words, but the second song was shorter than the first, their score would drop.

That would be unusual behaviour for a measure of vocabulary.
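The two cases above can be checked with a small sketch of the score being criticised, taken here to be total distinct words divided by number of songs (`distinct_per_song` and the placeholder words are hypothetical):

```python
def distinct_per_song(songs: list[set[str]]) -> float:
    """Distinct words across all songs divided by the number of songs."""
    if not songs:
        return 0.0
    vocabulary = set().union(*songs)
    return len(vocabulary) / len(songs)

first = {"w1", "w2", "w3", "w4"}   # one song with n = 4 distinct words
shorter = {"x1", "x2"}             # a second song, disjoint but shorter

print(distinct_per_song([first]))           # 4.0
print(distinct_per_song([first, first]))    # 2.0: same words again, score halves
print(distinct_per_song([first, shorter]))  # 3.0: vocabulary grew to 6, score still fell
```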


I don't think that's what the poster meant. By "average unique words per song" I take it to mean, within each song words are only counted once, but across songs, words can be counted multiple times. So if song A had the words "I like cats" and song B had the words "I like dogs", then the average unique word count would be (3 + 3) / 2 = 3, not (3 + 1) / 2 = 2.
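A sketch of the two readings applied to the cats/dogs example, assuming simple lowercase whitespace splitting (both function names are hypothetical):

```python
def mean_unique_per_song(songs: list[str]) -> float:
    # Deduplicate within each song only; words may repeat across songs.
    return sum(len(set(s.lower().split())) for s in songs) / len(songs)

def catalogue_unique_per_song(songs: list[str]) -> float:
    # Deduplicate across the whole catalogue, then divide by song count.
    vocabulary: set[str] = set()
    for s in songs:
        vocabulary.update(s.lower().split())
    return len(vocabulary) / len(songs)

songs = ["I like cats", "I like dogs"]
print(mean_unique_per_song(songs))       # 3.0
print(catalogue_unique_per_song(songs))  # 2.0
```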


That's definitely one solution, but it still wouldn't quite capture it. As an extreme example, if rapper A produced 100 songs, each with exactly the same lyrics, they should surely be penalized compared with rapper B producing 100 songs with no shared words, even if rapper A's average unique-words-per-song is higher than rapper B's.
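To make the extreme case concrete, here is a sketch with made-up sizes: A repeats one 50-word song 100 times, B releases 100 songs of 40 words each with nothing shared. The "words" are placeholder tokens, and the function names are hypothetical.

```python
def mean_unique_per_song(songs: list[list[str]]) -> float:
    """Average count of distinct words per song (dedup within a song only)."""
    return sum(len(set(song)) for song in songs) / len(songs)

def total_vocabulary(songs: list[list[str]]) -> int:
    """Distinct words across every song in the catalogue."""
    return len({word for song in songs for word in song})

# Hypothetical placeholder "words", not real lyrics.
one_song = [f"a{i}" for i in range(50)]
rapper_a = [one_song] * 100                    # the same 50-word song, 100 times
rapper_b = [[f"b{s}_{i}" for i in range(40)]   # 100 songs of 40 words each,
            for s in range(100)]               # with no word shared between songs

print(mean_unique_per_song(rapper_a), total_vocabulary(rapper_a))  # 50.0 50
print(mean_unique_per_song(rapper_b), total_vocabulary(rapper_b))  # 40.0 4000
```

The per-song average ranks A above B, while the catalogue-wide count shows B's vocabulary is 80 times larger.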



