Using n-gram on kickback, graft, bribe and corruption – Their historical occurrences compared

Raymond Cheng

First of all, let's check our dictionaries. According to the Merriam-Webster dictionary, a kickback is a "payment made to someone who has (illicitly) facilitated a transaction or appointment." To graft means to "(actively) make money by shady or dishonest means." To bribe means "to persuade (someone; the bribe-taker) to act (passively) in one's favor, typically illegally or dishonestly, by a gift of money or other inducement." And corruption refers to "dishonest or fraudulent conduct by those in power, typically involving bribery," or "the action of making someone or something morally depraved, or the state of being so."

OK, understood, pretty straightforward.

But when did all these start becoming part of our lives? Searching through historical references for evidence could take quite a while. With the help of the latest search technology, however, we can get a rough idea of how frequently these words have appeared over the last 200 years in books, magazines and journals (only those published in English, of course).

Photo: "Old Books and Pen" © Lorenzo Colloreta

So let's compare the historical trends in the occurrences of the words "kickback", "graft", "bribe" and "corruption". This time we will try our luck with Google Labs' n-gram database, available at http://ngrams.googlelabs.com/datasets. The n-gram database [1] lets us conveniently compare the occurrences of individual words (more technically, unigrams or 1-grams) in books, magazines and journals since the early 1800s. The following graphs plot the relative historical occurrences of the four words against time (in years), using a smoothing value of 3 [2].
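The graphs do not show raw counts. The n-gram viewer plots each word as a share of all unigrams counted in a given year, which keeps years with many books comparable to years with few. The sketch below assumes that yearly counts for the word and for all unigrams are available. The function name `relative_frequency` and the numbers are hypothetical, made up for illustration, and are not real n-gram data.

```python
from typing import Dict


def relative_frequency(word_counts: Dict[int, int],
                       total_counts: Dict[int, int]) -> Dict[int, float]:
    """Return, per year, the word's count as a percentage of all unigrams that year."""
    result: Dict[int, float] = {}
    for year, total in total_counts.items():
        if total == 0:
            continue  # nothing was counted that year, so no share can be computed
        # Years in which the word never appears count as 0 occurrences.
        result[year] = 100.0 * word_counts.get(year, 0) / total
    return result


# Toy numbers for illustration only; these are NOT real n-gram counts.
word = {1900: 30, 1901: 45}
all_unigrams = {1900: 1_000_000, 1901: 1_500_000}
print(relative_frequency(word, all_unigrams))
```

Both toy years come out at the same percentage, even though the raw count rose from 30 to 45. The total number of words counted grew by the same proportion.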

Figure 1. Historical occurrence of "kickback", from 1800 to 2009

Figure 1 shows the occurrence of the word "kickback". It was clearly not a common word before 1900. Before I move on, I have to stress that a 0% occurrence in the n-gram database does not necessarily mean that a particular word cannot be found. According to Google Labs, it simply indicates that the word did not appear in at least forty books, magazines or journals [3]. In other words, the word does not seem to have been used substantially during that period. Obviously enough, it also does not mean that there was no such thing as a kickback before 1900. There may simply have been another (possibly more common) word or phrase for it at the time, one that we have not looked into or have overlooked.

Figure 2. Comparing historical occurrences of "kickback" (blue) and "graft" (red)

So, the occurrence of the word "kickback" (the blue line) does not appear to be substantial, at least not until after 1900. In fact, according to Britannica, the word was not even officially listed in dictionaries until 1930-1935. This trend is somewhat similar to that of "graft", whose use increased significantly after 1900. "Graft", however, is actually a very old word, dating back to 1350-1400 [4]. According to the n-gram statistics, "graft" (the red line) is also a much more popular word than "kickback". When the two are drawn on the same graph in Figure 2, the blue line for "kickback" lies almost flat throughout.

Figure 3. "kickback" (blue), "graft" (red) and "bribe" (green)

What about "bribe", then? As shown in Figure 3 above, the word "bribe" (the green line) appears to have been used fairly consistently over the last 200 years, with only a mild decrease. It is used far more than "kickback" (the blue line) and fluctuates much less than "graft" (the red line).

Figure 4. "kickback" (blue), "graft" (red), "bribe" (green) and "corruption" (yellow)

Now, let's add "corruption" to the picture. Of the four words, "corruption" (the yellow line) has been used most extensively. It seems to have reached its peak in the early 1830s, with a local minimum around the 1930s (about 100 years later). Looking at the lines for all four words, "kickback" is clearly the least commonly used. The words "graft" and "corruption" rose together from around 1910 until they diverged in the 1990s. From then on, "graft" declined while "corruption" kept rising on its own. This steady increase in the use of "corruption" may be partly due to two things. First, mainstream journalists have adopted the term "anti-corruption" over the last 50 years. Second, many anti-graft agencies around the world (especially those in Asia) use the word in their names [5]. Interestingly, the word "bribe" seems to have maintained its share quite steadily throughout the whole period of this study, i.e. from 1810 to 2009.

So, if we were to pick one word for further analysis, which would it be? The word "corruption" appears most often. But if I were to pick the one with the most stable occurrence over the 200-year period, I would say "bribe" is the candidate to go for. The words "graft" and "kickback" fail to make it into our semi-final. "Kickback" is too rarely used, and "graft" fluctuates too much.

Note 1: Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. "Quantitative analysis of culture using millions of digitized books." Science, published online ahead of print: 2010/12/16, see http://ngrams.googlelabs.com/

Note 2: When data is viewed as a moving average, trends often become more apparent. A smoothing value of 1 means that the data shown for 1950 will be an average of the raw count for 1950 plus 1 value on either side: (count for 1949 + count for 1950 + count for 1951), divided by 3. So a smoothing of 10 means 21 values will be averaged: 10 on either side, plus the target value in the center. At the left and right edges of the graph, fewer values are averaged. With a smoothing of 3, the leftmost value (pretend it's the year 1950) will be calculated as (count for 1950 + count for 1951 + count for 1952 + count for 1953), divided by 4. A smoothing of 0 means no smoothing at all: the raw data is simply presented.
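The smoothing rule in Note 2 is a centered moving average with a window that shrinks at the edges. Here is a minimal sketch of that rule as described above, not Google's actual code. The function name `smooth` and the yearly numbers are hypothetical.

```python
from typing import List


def smooth(values: List[float], smoothing: int) -> List[float]:
    """Centered moving average following the rule described in Note 2.

    Each point is averaged with up to `smoothing` neighbours on each side.
    Near the edges the window is truncated, so fewer values are averaged.
    A smoothing of 0 returns the raw values unchanged.
    """
    if smoothing < 0:
        raise ValueError("smoothing must be non-negative")
    n = len(values)
    smoothed: List[float] = []
    for i in range(n):
        lo = max(0, i - smoothing)       # clamp window at the left edge
        hi = min(n - 1, i + smoothing)   # clamp window at the right edge
        window = values[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Yearly counts for 1950, 1951, ... (toy numbers, not real data).
counts = [4.0, 8.0, 6.0, 2.0, 10.0]
print(smooth(counts, 1))
print(smooth(counts, 3))
```

With a smoothing of 3, the first (1950) value is (4 + 8 + 6 + 2) / 4 = 5.0. This matches the edge case in Note 2, where only the target year and the three years after it are available.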

Note 3: According to Google Labs, the n-gram viewer only considers n-grams that occurred in at least 40 books, magazines or journals.
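The cut-off in Note 3 explains how a word can show up as 0% even though it exists in print. Here is a simplified sketch of such a filter. It assumes we already know, for each n-gram, how many distinct books, magazines or journals it appeared in. The function name, the `MIN_BOOKS` constant and the counts are hypothetical, and the real database may apply the cut-off differently.

```python
from typing import Dict

MIN_BOOKS = 40  # threshold quoted in Note 3


def filter_by_book_count(book_counts: Dict[str, int],
                         min_books: int = MIN_BOOKS) -> Dict[str, int]:
    """Keep only n-grams that appeared in at least `min_books` distinct volumes."""
    return {ngram: n for ngram, n in book_counts.items() if n >= min_books}


# Hypothetical volume counts, for illustration only.
volumes = {"wordA": 120, "wordB": 12}
print(filter_by_book_count(volumes))
```

In this toy example, "wordB" is dropped and would therefore plot as 0%, even though it does occur in 12 volumes.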

Note 4: The word "graft" originated in 1350-1400 from the earlier word "graff". The Middle English words "graffe" and "craffe" are said to have been borrowed from the Old French words "graife", "greffe" and "graffe".

Note 5: Anti-corruption agencies established in Asia in or before 1999 that have the word "corruption" in their names include: Corrupt Practices Investigation Bureau, Singapore (1952); Independent Commission Against Corruption, Hong Kong (1974); Agency Against Corruption, Taiwan (1989); Commission Against Corruption, Macao (1999); Anti Corruption Bureau, India (1988); Commission to Investigate Allegations of Bribery or Corruption, Sri Lanka (1994); Anti Corruption Bureau, Brunei (1982); Anti-Corruption Unit, Cambodia (1999); and National Anti-Corruption Commission of Thailand (1997). Interestingly, there are (as far as I can tell) only four anti-corruption agencies in Asia whose names do not include any variant of the word "bribe" or "corruption". None of them was established before 1999: Japan Financial Intelligence Center, Japan (2007); Commission for the Investigation of Abuse of Authority, Nepal (2007); National Accountability Bureau, Pakistan (1999); and the Presidential Anti-Graft Commission of the Philippines (established 2001, abolished 2010).