HOWTO Add New Words to Rhyme

The cmudict data

The Rhyming Dictionary takes a collection of word pronunciations and parses them to find rhymes and syllable counts (as explained elsewhere). Thus, if you want to add words to the dictionary, you only have to define each word's pronunciation and add it to the cmudict.patch file - in the proper order, of course. A typical pronunciation looks like:
KELP  K EH1 L P
In this case, the word "kelp" is composed of the phonemes "K", "EH1", "L" and "P". The possible phonemes are listed in the following tables:
Syllable  Example
AA        resolve
AE        rat
AH        forgettable
AO        talk
AW        abound
AY        rise
EH        hair
ER        under
EY        pray
IH        fin
IY        knees
OW        low
OY        foil
UH        wood
UW        foo

Non-Syllable  Example
B             ball
CH            achieve
D             dog
DH            worthy
F             frog
G             frog
HH            hat
JH            age
K             break
L             lamp
M             lamp
N             not
NG            thing
P             pray
R             ray
S             said
SH            shed
T             tread
TH            thread
V             archive
W             way
Y             acute
Z             rains
ZH            illusion

However, you'll notice KELP's EH syllable is followed by a 1. The 1 signifies that EH is the primary stressed syllable of KELP. And, as expected, there can be only one primary stress in a particular word. Most syllables are followed by a 0, which signifies no stress. This is the most common case, and a word can have any number of unstressed syllables. Occasionally a word will also have a secondary stress, indicated by a 2. As with primary stress, a word can have only a single secondary stress. Non-syllable phonemes never carry a stress digit.
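
If you'd like to check an entry against these rules before adding it, something along the following lines may help. This is only a sketch in Python, not part of the Rhyming Dictionary itself: the check_entry function and its messages are made up for illustration, and the phoneme sets are simply copied from the tables above.

```python
# Phoneme sets copied from the tables above.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M",
              "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y",
              "Z", "ZH"}


def check_entry(line):
    """Return (word, syllable_count, problems) for one 'WORD  PH PH ...' line.

    Hypothetical helper: it applies only the rules described in this HOWTO.
    """
    fields = line.split()
    if len(fields) < 2:
        return (line.strip(), 0, ["expected a word followed by phonemes"])
    word, phonemes = fields[0], fields[1:]
    problems = []
    stresses = []  # one stress digit per well-formed syllable phoneme
    for ph in phonemes:
        if ph[-1] in "012" and ph[:-1] in VOWELS:
            stresses.append(ph[-1])
        elif ph in VOWELS:
            problems.append("syllable %s has no stress digit" % ph)
        elif ph not in CONSONANTS:
            problems.append("unknown phoneme %s" % ph)
    if stresses.count("1") > 1:
        problems.append("more than one primary stress")
    if stresses.count("2") > 1:
        problems.append("more than one secondary stress")
    return (word, len(stresses), problems)


print(check_entry("KELP  K EH1 L P"))
```

The syllable count is just the number of syllable phonemes that carry a stress digit, so KELP comes out as one syllable with no problems reported.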

New words are not added to the original cmudict file. Instead, words are added to the cmudict.patch file and that file is merged with the original to generate the full list of words. And, just as importantly, words must be added in alphabetical order. You see, because both the cmudict and cmudict.patch files are ordered alphabetically, they can be combined very quickly into a single, larger file (also sorted alphabetically). And, because the individual words are known to be pre-sorted during the install process, the Rhyming Dictionary takes a lot less time to compile its database files.
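
To see why the ordering matters, here is a rough Python sketch of merging two pre-sorted lists of entries in a single pass. It is not the Rhyming Dictionary's actual merge code. The function names are invented for illustration, and it assumes "alphabetical" means plain string comparison of the first field, which may not match the real build exactly.

```python
import heapq


def headword(line):
    # The word is the first whitespace-separated field of an entry.
    return line.split(None, 1)[0]


def entries(lines):
    # Drop blank lines and trailing newlines so every item has a headword.
    return (line.rstrip("\n") for line in lines if line.strip())


def merge_sorted(base_lines, patch_lines):
    """Merge two already-sorted sequences of entries without re-sorting.

    heapq.merge only ever looks at the next unread entry of each input,
    so the work grows linearly with the total number of entries.
    """
    return list(heapq.merge(entries(base_lines), entries(patch_lines),
                            key=headword))


base = ["HELP  HH EH1 L P", "YELP  Y EH1 L P"]
patch = ["KELP  K EH1 L P", "WHELP  W EH1 L P"]
for entry in merge_sorted(base, patch):
    print(entry)
```

If either input were out of order, the merged result would be out of order too, which is why a misplaced word in cmudict.patch causes trouble later on.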


Building New Pronunciations

Few English words have entirely unique pronunciations. Thus, adding a new word usually requires altering only a phoneme or two of an existing pronunciation. For example, the word WHELP (meaning a dog or wolf pup) is pronounced almost identically to the word HELP. Since, in the cmudict file, HELP is pronounced HH EH1 L P, we simply replace the HH phoneme with W, yielding the entry:
WHELP  W EH1 L P
to be added to the cmudict.patch file. Similarly, many plural forms of words already present merely have an "S" on the end - typically requiring only the addition of the Z phoneme at the end of the pronunciation (or the S phoneme after a voiceless sound such as K, P or T). Also quite common in English is the use of compound words, such as OCEANFRONT. For these, combining the pronunciations of both words is typically all that is required, except that one of the individual words must keep its primary stress and the other will have either a secondary stress or no stress.
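
These shortcuts are mechanical enough to sketch in a few lines of Python. The helpers below are hypothetical and only illustrate the idea. In particular, compound assumes the first word keeps its primary stress and the second word's primary stress becomes secondary, which you should always confirm by ear. The OCEAN and FRONT pronunciations are written as I believe they appear in cmudict, so look them up before relying on them.

```python
def format_entry(word, phonemes):
    # Entries are written as the upper-case word, two spaces, then phonemes.
    return word.upper() + "  " + " ".join(phonemes)


def replace_first(phonemes, new_first):
    # WHELP from HELP: swap the leading phoneme, keep the rest.
    return [new_first] + phonemes[1:]


def add_plural(phonemes):
    # Simple plural as described above; some endings need a different phoneme.
    return phonemes + ["Z"]


def demote_stress(phonemes):
    # Turn any primary stress (1) into a secondary stress (2).
    return [ph[:-1] + "2" if ph.endswith("1") else ph for ph in phonemes]


def compound(first, second):
    # Assumption: the first word keeps the primary stress. If the second word
    # already has a secondary stress, the result will need fixing by hand.
    return first + demote_stress(second)


help_ph = ["HH", "EH1", "L", "P"]
ocean_ph = ["OW1", "SH", "AH0", "N"]
front_ph = ["F", "R", "AH1", "N", "T"]

print(format_entry("whelp", replace_first(help_ph, "W")))
print(format_entry("oceanfront", compound(ocean_ph, front_ph)))
```

Whatever a helper like this produces is only a starting point. Say the result out loud before it goes into cmudict.patch.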

Ultimately, the process of building pronunciations is tedious. But each added word will automatically rhyme correctly and have the correct syllable count.

Automating the Pronunciation Process

What you'll quickly discover is that scrolling through the entire cmudict and cmudict.patch files, even with a good text editor, takes a lot of time. To make this easier, the Rhyming Dictionary source code includes a tool that automatically looks up the requested words and returns their pronunciations. This tool, not surprisingly, is called "pronounce". To build it, type:
make pronounce
in the same directory as the Rhyming Dictionary source code. This yields the pronounce executable binary and the pronounce.db and fullmultiple.db database files. These database files are necessary because none of the typical Rhyming Dictionary source files contain the full pronunciation of a word (thus requiring the pronounce.db file) and the typical multiple.db file is stripped of multiple pronunciations that yield identical rhymes (hence the fullmultiple.db file).

To use pronounce, type:

pronounce pronounce.db fullmultiple.db
This should result in a command prompt similar to the Rhyming Dictionary's interactive mode. From there, typing in a word will display all of its pronunciations. For example:
PRONOUNCE> dictionary
D IH1 K SH AH0 N EH2 R IY0
PRONOUNCE> 
This data can then, hopefully, be used to generate pronunciations for new words to be added to the cmudict.patch file.

Then, before using a cmudict.patch file, you should run:

make verify
This will check the dictionary files for invalid phonemes, invalid ordering or any other error that might cause the Rhyming Dictionary to choke.
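
If you want a quick look at your ordering before running the real check, a small Python sketch like the one below can point at misplaced lines. It is not what make verify does internally. The find_order_errors function is invented for illustration and assumes the ordering is plain string comparison of the first field on each line.

```python
def find_order_errors(lines):
    """Return (line_number, word) for each entry that sorts before its predecessor."""
    errors = []
    previous = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        word = line.split(None, 1)[0]
        if previous is not None and word < previous:
            errors.append((number, word))
        previous = word
    return errors


patch = [
    "WHELP  W EH1 L P",
    "KELP  K EH1 L P",   # out of place: KELP sorts before WHELP
]
print(find_order_errors(patch))
```

Treat a clean result here as a hint only. make verify remains the authority on whether the file is acceptable.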

Finally...

Once you've built some pronunciations for new words, you'll probably want to get them added to the full Rhyming Dictionary source. That way you won't have to maintain your own individual cmudict.patch file and everyone can benefit from your hard work. To do this, simply email your pronunciations to me either in the body of your message or as an attached text file. The use of any pronunciation submission for research or commercial purposes must be completely unrestricted. The original cmudict file is distributed under those terms, as is the cmudict.patch file I have created. So, by having all submissions match those terms, they can all be safely combined into a larger file that is also free to use for research or commercial purposes. If you don't like those submission terms, don't submit anything - but I would strongly encourage you to submit.