Rhyming Dictionary Mechanics

. . . or how I got this thing to work

If you want to implement your own version of a rhyming dictionary, or you're just curious about how it works, then you've come to the right place.

To get it working on as many systems as possible, I used GDBM, the GNU Database Manager, to hold all the dictionary data. While the name might sound fancy, GDBM is little more than a giant file-based hashtable. In other words, you give it a string (e.g. "foo") and GDBM returns the value associated with that string (e.g. "bar"). Since the rhyming dictionary takes a word and returns all the rhymes that go with it, this kind of key-value store is a natural fit.
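
To make the hashtable analogy concrete, here is a minimal sketch in modern Python. It assumes the standard-library dbm.dumb module as a stand-in for GDBM (the interface is similar, but the on-disk format is not GDBM's), and the file name "demo" is made up for illustration.

```python
import dbm.dumb
import os
import tempfile

# Hypothetical scratch location; dbm.dumb stands in for GDBM here.
path = os.path.join(tempfile.mkdtemp(), "demo")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.dumb.open(path, "c") as db:
    db["foo"] = "bar"  # store a value under a string key

# Reopen read-only and look the key up again.
with dbm.dumb.open(path, "r") as db:
    value = db["foo"]  # values come back as bytes

print(value.decode())
```

Everything is a string (or bytes) on both sides of the mapping, which is why the rhyme lists described next end up stored as space-separated words.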

Having the user-given word be the key and all the rhymes that go with it be the value seems like a trivial way to solve this problem. So what's the big deal? Well, the problem with that approach is that it wastes a lot of space - and I do mean a lot! Why? Try getting the rhymes of "fixation". The result is a truly enormous list of rhymes, and each of those words must also store the whole list, since each one also rhymes with "fixation". Ouch!

The solution, obviously, is to store only one copy of the list for all those words. So, instead of one GDBM database with a straight word->rhymes mapping, we'll use two. The first database is a word->key map and the second is a key->rhymes map. This way, "fixation" and "station" both map to the same key, and that key maps to the one shared list of rhymes. Perfect!
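
Here is a rough sketch of that two-map layout, with plain Python dictionaries standing in for the two GDBM files. The build_maps helper, the key "K1" and the three sample words are made up for illustration; they are not taken from the real data.

```python
def build_maps(rhyme_groups):
    """Build word->key and key->rhymes maps from {key: [words]}."""
    word_to_key = {}
    key_to_rhymes = {}
    for key, group in rhyme_groups.items():
        # One space-separated list per rhyme group, stored exactly once.
        key_to_rhymes[key] = " ".join(sorted(group))
        for word in group:
            # Every word in the group points at the same short key.
            word_to_key[word] = key
    return word_to_key, key_to_rhymes


# Made-up sample data; "K1" is a hypothetical key, not one from the real data.
groups = {"K1": ["FIXATION", "STATION", "NATION"]}
word_to_key, key_to_rhymes = build_maps(groups)

# Both words resolve to the same stored list.
shared = key_to_rhymes[word_to_key["FIXATION"]]
assert shared == key_to_rhymes[word_to_key["STATION"]]
print(shared)
```

For a group of n rhyming words, the straight word->rhymes approach stores n copies of an n-word list, while this layout stores the list once plus n short keys.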

Oh, but we also want the words sorted by syllable count and then alphabetically. One solution is to add a third GDBM database that maps word->syllables. This is doable, but wasteful, since the user would end up with two copies of all those words. A better solution is to append the syllable count to that first word->key database so it becomes a word->(key, syllables) map instead. The cost is minimal, and all is well.
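
As a sketch of how that ordering could be produced, the snippet below assumes each word record is a "key syllables" string (the same shape as the '+I 1' value in the Python session further down) and uses plain dictionaries in place of the databases. The sorted_rhymes helper is my own naming, and the sample records are a small subset of the data shown in that session.

```python
def sorted_rhymes(word, words, rhymes):
    """Return the rhymes of `word`, ordered by syllable count, then A-Z."""
    key, _syllables = words[word].split()

    def sort_key(rhyme):
        # Each record is "key syllables"; the count is the second field.
        return (int(words[rhyme].split()[1]), rhyme)

    return sorted(rhymes[key].split(), key=sort_key)


# Sample records in the "key syllables" layout.
words = {"RHYME": "+I 1", "SUBLIME": "+I 2", "CHIME": "+I 1", "ANTICRIME": "+I 3"}
rhymes = {"+I": "ANTICRIME CHIME RHYME SUBLIME"}

result = sorted_rhymes("RHYME", words, rhymes)
print(result)
```

Sorting on a (syllables, word) tuple gives both orderings in one pass: the count decides first and the spelling breaks ties.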

But I have one final horror to unleash upon you. Some words have multiple pronunciations! (Think "tomato".) So we have two options: either combine the different pronunciations into one entry, which means assuming any word can have a range of syllable counts and generating new keys in the database, or store the list of pronunciations for each such word in a third database. I've chosen the latter, since I may want the pronunciations to be separable so that each one gets its own list. Combining them makes handling the data easier, but separating them gives me more control. I chose control.

So, to handle multiple pronunciations, I just check the third database for a list and, if one is present, grab the rhymes for each of those pronunciations and merge the lists into one for display.
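
The merge step might look something like the sketch below. The layout of the third database is an assumption on my part here: a space-separated list of pronunciation entries per word, with extra pronunciations named like "TOMATO(2)". The merged_rhymes helper, the keys "A" and "B" and all the sample records are made up for illustration.

```python
def merged_rhymes(word, words, rhymes, multiple):
    """Collect rhymes across every pronunciation of `word`."""
    # Assumed layout: the third map holds a space-separated list of
    # pronunciation entries; a word without an entry has one pronunciation.
    variants = multiple[word].split() if word in multiple else [word]

    merged = set()
    for variant in variants:
        key = words[variant].split()[0]
        merged.update(rhymes[key].split())
    return sorted(merged)


# Made-up sample data; keys "A" and "B" are hypothetical.
words = {"TOMATO": "A 3", "TOMATO(2)": "B 3", "POTATO": "A 3", "LEGATO": "B 3"}
rhymes = {"A": "POTATO TOMATO", "B": "LEGATO TOMATO(2)"}
multiple = {"TOMATO": "TOMATO TOMATO(2)"}

tomato = merged_rhymes("TOMATO", words, rhymes, multiple)
potato = merged_rhymes("POTATO", words, rhymes, multiple)
print(tomato)
print(potato)
```

Using a set for the merge drops any word that shows up under more than one pronunciation, so the displayed list has no duplicates.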


This, in general, is how the rhyming dictionary works. Perl and Python both have GDBM connectivity, so feel free to point them to the rhyming dictionary data and try it yourself. Here's a neat example using Python code that takes the word "rhyme" and ultimately returns a list of the syllable counts of all the words that rhyme with it:

Python 1.5.2 (#1, Feb  1 2000, 16:32:16)  [GCC egcs-2.91.66 19990314/Linux (egcs- on linux-i386
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import gdbm
>>> import string
>>> words = gdbm.open("/usr/share/rhyme/words.db")
>>> rhymes = gdbm.open("/usr/share/rhyme/rhymes.db")
>>> key = words["RHYME"]
>>> key
'+I 1'
>>> key = string.split(key)[0]
>>> key
'+I'
>>> rhymes[key]
"ANTICRIME ANTICRIME(2) BEIM CHIME CLIMB CRIME DIME GRIME HAIM HEIM
HIME I'M KIME LIME LYME MIME ONETIME PART-TIME PRIME RHYME SEIM SIME
SLIME SUBLIME SYME THYME TIME"
>>> for rhyme in string.split(rhymes[key]):
...     print(string.split(words[rhyme])[1])
... 
3
3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
1
2
1
1
1
>>> 
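
The session above is from Python 1.5.2. In Python 3 the gdbm module lives at dbm.gnu, string.split() is gone in favor of the str.split() method, and database values come back as bytes. A rough equivalent is sketched below; the syllable_counts helper is my own naming, ASCII-encoded data is an assumption, and byte-valued dictionaries stand in for the opened databases so the snippet runs without the data files.

```python
def syllable_counts(word, words, rhymes, encoding="ascii"):
    """Python 3 take on the session above: syllable counts of all rhymes."""
    # dbm-style handles return bytes, so decode before splitting.
    key = words[word.encode(encoding)].decode(encoding).split()[0]
    counts = []
    for rhyme in rhymes[key.encode(encoding)].decode(encoding).split():
        record = words[rhyme.encode(encoding)].decode(encoding)
        counts.append(int(record.split()[1]))
    return counts


# Byte-valued dicts stand in for the opened databases; with the real files
# one would pass handles from dbm.gnu.open(...) instead.
words = {b"RHYME": b"+I 1", b"SUBLIME": b"+I 2", b"TIME": b"+I 1"}
rhymes = {b"+I": b"RHYME SUBLIME TIME"}

counts = syllable_counts("RHYME", words, rhymes)
print(counts)
```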

So really this isn't all that complex. Fire up your favorite scripting language and enjoy! Otherwise, you can return to the main homepage.