Every American elementary schooler is taught the mnemonic device, “I before E, except after C.” I am dumbfounded by how bad this rule is. I misspell words all the time. I can never spell words like policies, species, and caffeine correctly because they violate the rule. The rule is littered with exceptions. Recently, I wondered how bad the rule actually is: how many of the English words that contain either (either is an exception to the rule) ie or ei are exceptions to it? So I set out to answer that question. The data shows that the I before E rule is broken.
So what’s the point? Why have I spent so much time worrying about such a pointless rule? Well, I became interested when I kept misspelling words based on my understanding of the rule. I then googled for numbers, figuring someone must have already done the analysis. Much to my dismay, I could not find any. As far as I can tell, this study is the first of its kind.
This is what I wanted to find out:
- How many words contain the combination ie or ei?
- How many exceptions are there to the rule?
- What is the percentage of words that are exceptions to the rule?
How I gathered the data
I used a dictionary web service to gather the data. Obviously, thumbing through a Merriam-Webster dictionary would be a monumental waste of time. Using a web service lets me do this research programmatically and expose the data to anyone who wants it.
The web service I used is DictService, which allows you to use several different English dictionaries. For this article I used The Collaborative International Dictionary of English.
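The filtering step itself is simple enough to sketch. The snippet below is not the code behind this article and does not call DictService; it assumes you already have the dictionary's headwords as a plain list of strings. The names words_with_pair and sample_words are made up for illustration.

```python
def words_with_pair(words):
    """Group words by whether they contain 'ie' or 'ei' (case-insensitive)."""
    found = {"ie": [], "ei": []}
    for word in words:
        lowered = word.lower()
        for pair in found:
            if pair in lowered:
                found[pair].append(word)
    return found


# Hypothetical stand-in for the full dictionary word list.
sample_words = ["their", "believe", "receive", "species", "dog"]

groups = words_with_pair(sample_words)
print("ie:", groups["ie"])
print("ei:", groups["ei"])
```

A word that contains both combinations lands in both groups, so adding the two group sizes together can count such a word twice.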
History and facts about the I before E rule
Nobody knows who created the rule. The earliest known written use of it is a footnote in James Stuart Laurie’s Manual of English Spelling, which was written in 1866. It is safe to say that the rule has been used in English classrooms across the world for at least the past 140 years. There are a couple of variations on the rule, such as:
i before e except after c or when sounded like a as in neighbor and weigh
This addition makes the rule a bit better, but I am not going to focus on that variation, as there are still plenty of exceptions to it as well (species, either, etc.).
The rule is so bad, and has so many exceptions, that in June of 2009 the British government advised against teaching it. Some funny info about the debates that this caused in England can be found here. The best quote came from Michael Grove, arguing against abandoning the rule:
Having systematically lowered school standards for a decade, it is sadly no surprise that the Government is now actively telling teachers not to bother trying to teach children how to spell properly.
The worst offender is “oneiromancies,” the plural form of oneiromancy (meaning divination by means of dreams). The word breaks the rule twice, in both ways: once with an ei that is not preceded by c, and once with cie.
Of the 100 most commonly used words in the English language, only one contains a combination of either ei or ie. That word is “their,” which violates the rule.
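To make the two kinds of violation concrete, here is a small sketch that flags them in a single word. It is my own illustration, not part of the original analysis: the function name rule_violations is invented, and it checks only the basic form of the rule (cie is a violation, and so is ei when it is not directly preceded by c).

```python
import re

# 'cie' breaks "except after C"; 'ei' not preceded by 'c' breaks "I before E".
VIOLATION_PATTERN = re.compile(r"cie|(?<!c)ei")


def rule_violations(word):
    """Return the rule-breaking letter groups found in word, left to right."""
    return VIOLATION_PATTERN.findall(word.lower())


for word in ["oneiromancies", "their", "species", "receive", "believe"]:
    print(word, rule_violations(word))
```

Words that obey the rule, such as receive and believe, come back with an empty list.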
Much of this history comes from the Wikipedia topic on the subject.
Analyzing the data
Now let’s take a look at the data. There are a total of 5223 words in the English language that contain the letters ei or ie. Let’s first examine the “except after C” part of the rule.
Except after C
There are 107 words in the English language in which the letter c is followed directly by the letters ei. Full list of the words here.
However, there are 212 words in the English language in which the letter c is followed directly by the letters ie, which violates the rule. Full list of the words here.
So in practice, if you did not know how to spell a word, you would be correct more often (about 66% of the time) by spelling it with cie instead of cei, as the rule suggests you should.
One could argue that a better rule would be “I before E, especially after C.”
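For anyone who wants to check the 66% figure, the arithmetic looks like this. The counts are copied from the text above, not re-derived from the dictionary, and the variable names are just for illustration.

```python
# Counts quoted in this section.
cei_count = 107  # c followed by ei (follows the rule)
cie_count = 212  # c followed by ie (breaks the rule)

after_c_total = cei_count + cie_count

# Share of "after c" words that use each spelling, as a percentage.
cie_share = 100 * cie_count / after_c_total
cei_share = 100 * cei_count / after_c_total

print(f"cie: {cie_share:.1f}% of {after_c_total} words")
print(f"cei: {cei_share:.1f}% of {after_c_total} words")
```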
I before E
Ok, now let’s look at the first part of the rule. I should come before E, except after C.
There are 3885 words in the English language that contain the letters ie. Full list of the words here.
There are 1338 words in the English language that contain the letters ei. Full list of the words here.
So that gives us a total of 5223 words in the English language that contain some combination of ei or ie. Of those, 1338 contain the letters ei. As we know, ei preceded by the letter c doesn’t violate the rule, and as I mentioned earlier, 107 words contain the letters cei. Removing those leaves 1231 words in which ei is NOT preceded by c, all of which violate the rule. That is about 24% of all words that contain ei or ie (1231 of 5223). Adding the 212 cie words from the previous section, which break the “except after C” half of the rule, brings the total to 1443 violations, or about 28%.
Out of all the words in the English language that contain an ei or ie combination, roughly a quarter violate the rule.
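As a sanity check, the overall rate can be recomputed from the counts quoted in this article. This is only a sketch: the numbers are copied from the text, the variable names are mine, and it reports the rate both without and with the cie words counted as violations.

```python
# Counts quoted in the article (not re-derived here).
total_words = 5223  # words containing ie or ei
ei_words = 1338     # words containing ei
cei_words = 107     # words containing cei (these follow the rule)
cie_words = 212     # words containing cie (these break "except after C")

# ei not preceded by c breaks the "I before E" half of the rule.
ei_violations = ei_words - cei_words
# Adding the cie words covers the "except after C" half as well.
all_violations = ei_violations + cie_words

ei_rate = 100 * ei_violations / total_words
all_rate = 100 * all_violations / total_words

print(f"ei not after c: {ei_violations} words, {ei_rate:.1f}%")
print(f"including cie:  {all_violations} words, {all_rate:.1f}%")
```

With these counts the first rate comes out just under a quarter and the second just over, so “roughly one word in four” is a fair summary either way.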
Words in which the letter c is immediately followed by ei or ie are especially bad, as 66% of them violate the rule.
So with the numbers so strongly against the rule, why do we continue to teach it? If the rule “look both ways before you cross the street” were wrong a quarter of the time, would we teach it?