(less than 15 in the 1000 line pairs) scored highly without
being perceivably rhyming, and most low scoring “true”
rhyme pairs take part in complex mosaic and polysyllabic
rhymes.
Finally, we applied our model to a set of manually annotated
rap lyrics to measure how well the program finds
both internal and line-final rhymes. We used five
songs of varying style: the Beastie Boys’ “Intergalac-
tic”, a Grammy-winning song in the old-school style;
Pharoahe Monch’s “The Truth” (featuring Common and
Talib Kweli) and “Right Here”, which were annotated by
Alim [9] and feature high rhyme density and a compli-
cated scheme; Jay-Z and Eminem’s “Renegade”, which
features very high rhyme density; and Fabolous’ “Trade
It All (Part 2)”, a song specifically mentioned by Alim
for its prevalence of long (five or six syllable) rhymes.
We show the ROC curves for this test set in Figure 2; at the
best overall operating point, specificity and sensitivity are both
just above 60%. Most “false positives” are rhymes that
were not annotated due to lack of rhythmic importance
or accidental omission. False negatives included several
where the performer created a rhyme from words that do
not appear to rhyme as text, and some longer rhymes that
were cut off prematurely due to too many non-rhyming
syllables within them and lower scoring syllable pairs
surrounding them. Finally, some rhymes were missed due
to intervening rhymes being found between the rhyming
parts, particularly when the threshold for rhymes is set
low. This is especially evident in the ROC curves at lower
cut-off thresholds, where true positive rates peak around
80% and begin to decline as the threshold is lowered.
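The sensitivity and specificity figures behind an ROC curve can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function name `roc_points`, the per-pair scores, and the boolean annotation labels are all hypothetical. It also treats each candidate pair independently, so it cannot reproduce the decline in true positive rate at low thresholds described above, which arises from intervening rhymes interfering with one another.

```python
from typing import List, Tuple


def roc_points(
    scores: List[float],
    labels: List[bool],
    thresholds: List[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each threshold.

    scores: hypothetical rhyme scores for candidate syllable pairs.
    labels: True where the annotator marked the pair as a rhyme.
    A pair is predicted to rhyme when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # True positives: annotated rhymes that clear the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        # True negatives: non-rhymes that stay below the threshold.
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        sensitivity = tp / positives if positives else 0.0
        specificity = tn / negatives if negatives else 0.0
        points.append((t, sensitivity, specificity))
    return points
```

Plotting sensitivity against one minus specificity over a sweep of thresholds gives the usual ROC curve.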
8. EXPERIMENTS
We used our procedure to examine a variety of features
of the rhymes in several sets of lyrics. We computed
the number of syllables per line, the number of rhymes
per line, the number of rhymes per syllable, average end
rhyme scores, and proportion of rhymes having two, three,
four, or more syllables. We also counted all of the complex
rhyming features (bridge, link, chain and internal rhymes)
per line.
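As a rough sketch of how such counts could be turned into features, the following assumes that rhyme detection has already produced, for each rhyme, the line it falls on, its length in syllables, and whether it is internal. The `Rhyme` record and the `rhyme_features` function are hypothetical names introduced for illustration; rhyme density, end rhyme scores, and the bridge, link, and chain counts are omitted because they depend on details of the detection procedure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rhyme:
    line: int       # index of the line the rhyme falls on (hypothetical field)
    syllables: int  # length of the rhyme in syllables
    internal: bool  # True if the rhyme is not line-final


def rhyme_features(syllables_per_line: List[int], rhymes: List[Rhyme]) -> Dict[str, float]:
    """Compute simple per-line and per-rhyme ratios for one set of lyrics.

    Assumes at least one line and at least one syllable.
    """
    n_lines = len(syllables_per_line)
    n_syllables = sum(syllables_per_line)
    n_rhymes = len(rhymes)

    def share(predicate) -> float:
        # Fraction of rhymes whose syllable length satisfies the predicate.
        if n_rhymes == 0:
            return 0.0
        return sum(1 for r in rhymes if predicate(r.syllables)) / n_rhymes

    return {
        "syllables_per_line": n_syllables / n_lines,
        "rhymes_per_line": n_rhymes / n_lines,
        "rhymes_per_syllable": n_rhymes / n_syllables,
        "doubles_per_rhyme": share(lambda k: k == 2),
        "triples_per_rhyme": share(lambda k: k == 3),
        "quads_per_rhyme": share(lambda k: k == 4),
        "longs_per_rhyme": share(lambda k: k > 4),
        "internals_per_line": sum(1 for r in rhymes if r.internal) / n_lines,
    }
```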
We hypothesized that these features would show dif-
ferences between genres of popular music, and calculated
them for four sets of data: the top 10 songs from Bill-
board Magazine’s 2008 year-end Hot Rap Singles chart;
the top 20 songs from the 2008 year-end Hot Modern Rock
Songs chart; the first 400 lines of Milton’s “Paradise Lost”
[18], as a similar-sized sample of non-rhyming verse; and
the top 10 songs from the 1998 year-end Hot Rap Singles
chart. To focus the comparison on the verses, we removed
intro/outro text, repeated lines, and additional choruses
from the song lyrics. Our results are in Table 3. High
end rhyme scores are indicative of song lyrics in general
(relative to unrhymed verse); rap has higher rhyme density,
internal rhyme, link rhymes, and bridge rhymes. Interest-
ingly, blank verse and rock lyrics have similar amounts of
rhyming per line, but rock lyrics have more rhymes per
syllable. The data from the 1998 and 2008 rap songs suggest
that rap's rhyming patterns did not shift much in style
over that decade.
Feature                  Rap ’08 Rap ’98    Rock   Blank
Number of Lines              476     613     502     400
Number of Syllables         4646    6492    4053    4146
Syllables per Line          9.76   10.59    8.07   10.37
Number of Rhymes             794    1118     476     393
Rhymes per Line             1.67    1.82    0.95    0.98
Rhymes per Syllable         0.17    0.17    0.12    0.09
Rhyme Density               0.28    0.27    0.19    0.12
Average End Score           5.28    5.21    4.36    2.49
  per Syllable              3.75    3.67    4.01    2.28
Doubles per Rhyme           0.23    0.29    0.15    0.18
Triples per Rhyme           0.08    0.06    0.04    0.03
Quads per Rhyme             0.02    0.03    0.05    0.00
Longs per Rhyme             0.03    0.02    0.04    0.01
Internals per Line          0.62    0.60    0.27    0.28
Links per Line              0.20    0.28    0.13    0.16
Bridges per Line            0.43    0.48    0.28    0.40
Chaining per Line           0.32    0.18    0.15    0.07
Table 3. Rhyme Features for Different Genres
We also hypothesized that features of individual rappers
might be informative, so we produced these
statistics for albums by nine famous MCs from a diverse
range of styles and eras: Run-DMC, Rakim, Notorious
B.I.G., 2Pac, Jay-Z, Fabolous, Eminem, 50 Cent, and Lil’
Wayne. Features were calculated for segments of at least
40 lines to produce means and standard deviations of the
statistics for each album. The results indicate that many
of these features can be characteristic of different artists’
styles. For example, Run-DMC’s (1984) old-school style
has lower rhyme density and less internal rhyme with an
average of 1.5 rhymes per line and only 6% of rhymes
being longer than 2 syllables; while Rakim (1987), known
for his more complex style, is detected as using more
internal rhymes (0.63 per line to Run-DMC’s 0.48) and
more rhymes longer than 2 syllables (12%). Rival rappers
Notorious B.I.G. (1994) and Tupac Shakur (1995) display
fairly similar style characteristics: 28% of their rhymes
are 2 syllables long, 6% are three syllables, and 3% are
longer. However, Biggie’s lines are significantly shorter,
averaging 10.8 syllables to 2Pac’s 11.6.
Artists from the early 2000s like Jay-Z (2001), Eminem
(2000), and especially Fabolous (2001) favour longer
rhymes, with 15%, 17%, and 30% respectively of their
rhymes being longer than 2 syllables. They also have the
highest rhyme density overall, with 2.2, 2.3, and 1.9 rhymes
per line respectively. Jay-Z and Eminem tend to use
more internal rhyme as well, with 0.8 internal rhymes
per line, about 25% higher than the average among other
MCs. Although he portrays a “thug” persona, 50 Cent
(2003) uses the most syllables per line (12.1), while Lil’
Wayne (2008) has the fewest (10.2). However, he manages
high rhyme density (0.3 rhymed syllables for each syllable
used) with relatively few (only 1.8) rhymes per line. In
general, we find that automatic rhyme detection can yield
characteristic data about performers and genres.
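The per-album means and standard deviations mentioned above can be sketched as follows. The text states only that segments have at least 40 lines; splitting an album into consecutive 40-line chunks, with any short remainder merged into the last chunk, is an assumption made here for illustration, as are the names `split_segments` and `segment_stats` and the `feature` callback.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def split_segments(lines: Sequence, min_len: int = 40) -> List[list]:
    """Split lines into consecutive segments of at least min_len lines.

    Assumption: fixed-size chunks, with a short tail merged into the
    previous chunk. Returns [] if there are fewer than min_len lines.
    """
    if len(lines) < min_len:
        return []
    segments = [list(lines[i:i + min_len]) for i in range(0, len(lines), min_len)]
    if len(segments[-1]) < min_len:
        tail = segments.pop()
        segments[-1].extend(tail)
    return segments


def segment_stats(
    lines: Sequence,
    feature: Callable[[list], float],
    min_len: int = 40,
) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of a feature over segments."""
    values = [feature(segment) for segment in split_segments(lines, min_len)]
    if not values:
        raise ValueError("need at least min_len lines to form a segment")
    # A single segment has no spread to estimate; report 0.0 in that case.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

For example, if each line is represented by its syllable count, passing `lambda seg: sum(seg) / len(seg)` as `feature` gives the mean and standard deviation of syllables per line across the segments of an album.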