I've been wanting to do some NLP on Rap Genius's corpus for ages. This is a great analysis. What I had in mind was writing a program to detect ghostwriting. Rappers probably have some sort of lyrical 'DNA' in the construction of their verses: how often they use certain words, number of words per line, number of unique words per song, ratio of adjectives to nouns, that kind of thing. You could probably unmask some ghostwriting secrets.
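Here's a rough sketch of what that 'DNA' could look like, in Python with only the standard library. The name `lyric_fingerprint`, the tokenizer, and the exact feature set are my own assumptions, not anything from the analysis. It covers words per line, unique words, and the most frequent words. The adjective-to-noun ratio is left out since it would need a part-of-speech tagger.

```python
import re
from collections import Counter


def tokenize(line):
    # Lowercase word tokens; internal apostrophes are kept so "don't" stays one word.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", line.lower())


def lyric_fingerprint(lyrics, top_n=5):
    """Compute a few simple stylometric features for one song's lyrics.

    Hypothetical helper for illustration: the features are the ones
    listed in the comment above, minus anything needing a POS tagger.
    """
    # One token list per non-empty line.
    lines = [tokens for tokens in map(tokenize, lyrics.splitlines()) if tokens]
    words = [word for tokens in lines for word in tokens]
    if not words:
        raise ValueError("lyrics contain no words")

    counts = Counter(words)
    return {
        "words_per_line": len(words) / len(lines),
        "unique_words": len(counts),
        # Unique words divided by total words, a crude vocabulary-richness measure.
        "type_token_ratio": len(counts) / len(words),
        "top_words": counts.most_common(top_n),
    }
```

To actually flag a suspected ghostwritten verse you'd compute this for a rapper's known songs and for the suspect one, then compare the numbers. How to compare them (and how much data you need before the difference means anything) is the hard part and isn't shown here.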
Looking at the analysis here, it's interesting to see some clustering in the results. IMO the second cluster is the sweet spot: Wu-Tang's excessive invention of vocabulary is cool but probably detracts from the poetic effect. Meanwhile, rappers like 2Pac are just kind of boring, at least going by their lyrics alone.