Text Analysis Tools - Cipher Toolboxes

INDEX OF COINCIDENCE CALCULATOR

What is the Index of Coincidence?

The index of coincidence of a text is the probability of two randomly drawn letters from that text being the same. A higher IOC means that this probability is higher, which occurs when some letters appear much more often than others. On the other hand, a 'flatter' set of letter frequencies would lead to a lower IOC.
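As a rough sketch of that definition, the function below counts each letter and works out the chance that two letters drawn at random (without replacement) are the same. The function name and the alphabet_size parameter are illustrative, and multiplying by 26 is an assumption made so the result is on the same scale as the figures quoted on this page (such as 1.73 for English), not a description of how the calculator itself is written.

```python
from collections import Counter


def index_of_coincidence(text, alphabet_size=26):
    """Return the IOC of the letters A-Z in text, scaled by alphabet_size."""
    # Keep letters only and ignore case, as frequency analysis usually does.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Probability that two letters drawn without replacement are the same.
    raw = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    # Assumed scaling: perfectly even frequencies then score about 1.0.
    return raw * alphabet_size


print(round(index_of_coincidence("AABB"), 4))  # 26 * (4 / 12) = 8.6667
```

A text in which no letter repeats scores 0, and a text made of a single repeated letter scores 26.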

It can be used as a simple way of comparing two texts. In cryptography it can tell you how likely a given text is to be in a certain language, by comparing the text's IOC to the language's average IOC (see below). It can also help predict what type of cipher a text is encrypted with: some ciphers, such as simple substitution ciphers, do not change a text's IOC at all, while more complex ciphers such as the Vigenère cipher reduce the IOC significantly, since they 'flatten' the letter frequencies.

IOC for different languages

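Different languages have different expected (average) IOCs. The values below are normalized: the raw probability is multiplied by the number of letters in the alphabet (26 for English), so a text with perfectly even letter frequencies would score about 1. Here are a few: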

English: 1.73
French: 2.02
German: 2.05
Italian: 1.94
Portuguese: 1.94
Russian: 1.76
Spanish: 1.94

Source: Wikipedia

Generally, if the IOC of a ciphertext is within 0.1 of the expected IOC of the plaintext language, the encryption method is likely to be monoalphabetic.
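The rule of thumb above can be written as a small check. This is only a sketch: the dictionary holds the values listed on this page, and the names EXPECTED_IOC and likely_monoalphabetic are made up for illustration.

```python
# Expected IOC values as listed above.
EXPECTED_IOC = {
    "English": 1.73,
    "French": 2.02,
    "German": 2.05,
    "Italian": 1.94,
    "Portuguese": 1.94,
    "Russian": 1.76,
    "Spanish": 1.94,
}


def likely_monoalphabetic(ciphertext_ioc, language="English", tolerance=0.1):
    """True if the IOC is within tolerance of the language's expected IOC."""
    return abs(ciphertext_ioc - EXPECTED_IOC[language]) <= tolerance


print(likely_monoalphabetic(1.70))    # True: close to English
print(likely_monoalphabetic(1.2681))  # False: far flatter than English
```

Treat the result as a rough guide only. The example plaintext used further down this page scores 1.8442, which falls just outside the 0.1 band even though it is plain English.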

Example use of the Index of Coincidence

For this example, we will use a part of Martin Luther King's 'I have a dream' speech:

I have a dream that one day this nation will rise up and live out the true meaning of its creed: "We hold these truths to be self-evident, that all men are created equal."

I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.

I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.

If you copy that text into the index of coincidence calculator above, you will find that it has an index of coincidence of 1.8442, which is noticeably higher than expected for English plaintext. Why? Presumably because of all the repetitions of 'I have a dream'! Now let's compare that with the IOC after encrypting the text with a substitution cipher and with a Vigenère cipher. We'll use the key MLK for both.

With a monoalphabetic substitution cipher (alphabet MLKABCDEFGHIJNOPQRSTUVWXYZ), we get the following ciphertext:

F EMVB M ARBMJ TEMT ONB AMY TEFS NMTFON WFII RFSB UP MNA IFVB OUT TEB TRUB JBMNFND OC FTS KRBBA: "WB EOIA TEBSB TRUTES TO LB SBIC-BVFABNT, TEMT MII JBN MRB KRBMTBA BQUMI." F EMVB M ARBMJ TEMT ONB AMY ON TEB RBA EFIIS OC DBORDFM, TEB SONS OC CORJBR SIMVBS MNA TEB SONS OC CORJBR SIMVB OWNBRS WFII LB MLIB TO SFT AOWN TODBTEBR MT TEB TMLIB OC LROTEBREOOA. F EMVB M ARBMJ TEMT ONB AMY BVBN TEB STMTB OC JFSSFSSFPPF, M STMTB SWBITBRFND WFTE TEB EBMT OC FNGUSTFKB, SWBITBRFND WFTE TEB EBMT OC OPPRBSSFON, WFII LB TRMNSCORJBA FNTO MN OMSFS OC CRBBAOJ MNA GUSTFKB. F EMVB M ARBMJ TEMT JY COUR IFTTIB KEFIARBN WFII ONB AMY IFVB FN M NMTFON WEBRB TEBY WFII NOT LB GUADBA LY TEB KOIOR OC TEBFR SHFN LUT LY TEB KONTBNT OC TEBFR KEMRMKTBR.

And, as expected, the IOC is exactly the same (1.8442). This is because the index of coincidence depends only on how often each letter occurs, not on which letters those are. So even when we change every letter to a different character, as long as each encrypted character represents just one decrypted character, the IOC stays the same. This means that a ciphertext with an IOC around 1.73 is likely to be a monoalphabetic substitution of an English plaintext! Then all you need is a frequency analysis of the text and you should be able to crack it!
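To see why the IOC cannot change, here is a minimal sketch of the substitution used above, together with a helper that lists the letter counts in descending order. The function names are illustrative only. Since the IOC is computed from those counts alone, an identical list of counts means an identical IOC.

```python
from collections import Counter
import string

PLAIN = string.ascii_uppercase
CIPHER = "MLKABCDEFGHIJNOPQRSTUVWXYZ"  # substitution alphabet from the example


def substitute(text):
    """Replace each letter A-Z with its partner in CIPHER; keep everything else."""
    return text.upper().translate(str.maketrans(PLAIN, CIPHER))


def count_profile(text):
    """Letter counts sorted from most to least frequent, ignoring which letter is which."""
    counts = Counter(ch for ch in text.upper() if ch in PLAIN)
    return sorted(counts.values(), reverse=True)


plaintext = "I have a dream"
ciphertext = substitute(plaintext)
print(ciphertext)                 # F EMVB M ARBMJ
print(count_profile(plaintext))   # [3, 2, 1, 1, 1, 1, 1, 1]
print(count_profile(ciphertext))  # [3, 2, 1, 1, 1, 1, 1, 1]
```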

After encrypting the plaintext with a Vigenère cipher (key MLK), we get the following ciphertext:

U SKHP K PCOMX DTLD AYO PLI FSSE YKFTYZ HSXW BUDO GA KZO VUGO AFD FSO FCEQ XOMYSZR YR TDE NBQPN: "IP RAWN FSOEP DDFDTD DA MO EPVR-PFUOOZE, DTLD MWV YPX MCO OCOMEOP PAGLV." U SKHP K PCOMX DTLD AYO PLI AY DTP BQO RUWVE ZP SPYDRSM, ERQ DYZD YR QYDXOD DVMGOE LXP ERQ DYZD YR QYDXOD DVMGO AHXQCC ITVX MO MMVQ EY ETD PZGZ EYSPDTPB ME DTP DMMVQ ZP NCYFSODSYAO. S TLFQ L NDPKY ERME YZP NMJ OHPX FSO EEKFP YR XSEDSEDSBAS, M DDMEO EHOXEODTXS HSFS DTP RQLD AQ SZUEEESOP, CIPVFPBUYQ ITDT ERQ SOME YR ZZBCOEDSAY, GUWV NP DDLXEQYDXOP TXFZ KZ ZKETC AQ PDPOPZW MYN VFCFTMQ. T RMGO M OBQLW FSKF XI RZED WSFEVQ NRUWNDPX ITVX ZXQ OKK WSHP SZ L XMESAY GTPBQ ERQJ GUWV ZZD NP TGOQQO LK ERQ NYXZB AQ DTPSD DUUY LGE LK ERQ NYZEOZE YR ERQTB OSKDLMFPB.

The IOC for that ciphertext is 1.2681 - much lower than the original. This is because the Vigenère cipher is polyalphabetic: with the three-letter key MLK, each plaintext letter is shifted by one of three different amounts depending on its position, so the same plaintext letter can become several different ciphertext letters. This flattens the letter frequencies. Since the characters have more even frequencies, the chance of picking two the same is lower, and thus so is the IOC.
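The ciphertext above is consistent with a standard Vigenère encryption in which the key advances only on letters, so spaces and punctuation do not use up key letters. The sketch below makes that assumption; the function name is illustrative and the key is assumed to contain letters only.

```python
def vigenere_encrypt(text, key):
    """Shift each letter A-Z by the matching key letter; keep other characters."""
    shifts = [ord(k) - ord("A") for k in key.upper()]  # M=12, L=11, K=10
    out = []
    i = 0  # counts letters only, so the key skips spaces and punctuation
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = shifts[i % len(shifts)]
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere_encrypt("I have a dream that", "MLK"))  # U SKHP K PCOMX DTLD
```

In this short sample the three A's of 'have a dream' become K, K and M, which is the flattening effect in miniature: one frequent plaintext letter is spread across more than one ciphertext letter.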

FREQUENCY ANALYSIS

Expected frequencies

Frequency analysis usually only involves the letters of the alphabet (no special characters) and is not case sensitive. The table below shows the results of frequency analysis of a huge sample of English text, which can be compared with the frequencies of a ciphertext to help decrypt the message.

INDEX CHAR FREQ %

0 A 8.50
1 B 1.49
2 C 2.20
3 D 4.25
4 E 11.2
5 F 2.23
6 G 2.02
7 H 6.09
8 I 7.55
9 J 0.153
10 K 1.29
11 L 4.03
12 M 2.41
13 N 6.75
14 O 7.51
15 P 1.93
16 Q 0.095
17 R 7.59
18 S 6.33
19 T 9.36
20 U 2.76
21 V 0.978
22 W 2.56
23 X 0.150
24 Y 1.99
25 Z 0.077

Source: Wikipedia (rounded to 3 significant figures)
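To produce the same kind of table for a ciphertext, a minimal sketch could look like the following. It counts letters only and ignores case, as described above; the function name is illustrative.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: percentage} for A-Z, ignoring case and non-letters."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    return {
        ch: (100 * counts[ch] / total if total else 0.0)
        for ch in string.ascii_uppercase
    }


freqs = letter_frequencies("Hello, World!")
# Print the letters from most to least frequent, skipping unused ones.
for ch, pct in sorted(freqs.items(), key=lambda item: -item[1]):
    if pct > 0:
        print(ch, round(pct, 2))
```

For a monoalphabetic ciphertext, lining up the most frequent ciphertext letters against the most frequent letters in the table gives a first guess at the substitution.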
