IoC Analysis on Interrupts

What is this?

This database consolidates the best (i.e., highest) Index of Coincidence (IoC) score for any given interrupt, considering all possible interrupt constellations. We look at the first 20 interrupts only and try all combinations of these. Instead of looking at the whole chapter, we only look at the text up to interrupt no. 21. This way we can test all possibilities and, provided the right key length is among those tested, find the key length with the highest probability. Since we try all combinations for this shorter text, the shortened text is decrypted completely under the correct combination.

Example:

Input: ᛁ ᚹᚪᚱᚾ ᚣᚩᚢ ᛁᚠ ᚣᚩᚢ ᛞᚩᚾ ᛏ ᛏᛖᛚᛚ ᛗᛖ
Interrupt: ᚩ
Interrupt-limit: 2 (with a limit of 3, the full string would be considered)
IoC analysis on: ᛁ ᚹᚪᚱᚾ ᚣᚩᚢ ᛁᚠ ᚣᚩᚢ ᛞ
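
The truncation in this example can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the database; the function name `truncate_at_interrupt` and its parameters are made up for this sketch.

```python
def truncate_at_interrupt(text: str, interrupt: str, limit: int) -> str:
    """Return the part of `text` before occurrence number `limit + 1`
    of `interrupt`. If there are fewer occurrences, return the whole text."""
    seen = 0
    for i, rune in enumerate(text):
        if rune == interrupt:
            seen += 1
            if seen > limit:
                return text[:i]
    return text


sample = "ᛁ ᚹᚪᚱᚾ ᚣᚩᚢ ᛁᚠ ᚣᚩᚢ ᛞᚩᚾ ᛏ ᛏᛖᛚᛚ ᛗᛖ"
print(truncate_at_interrupt(sample, "ᚩ", 2))  # text up to the third ᚩ
print(truncate_at_interrupt(sample, "ᚩ", 3))  # the full string
```

With a limit of 20 the same function returns the text up to interrupt no. 21.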

Is that enough information, though? Mostly. The three shortest texts have 349, 364, and 376 runes respectively. This means that, in the worst case, the frequency analysis looks at only 349 runes. For a key length of 25, that leaves only about 14 runes per group. That is not much, but it is the best we can get. You could increase the interrupt count to 21 or 22, which would improve this, but the execution time doubles¹ with each increment.

¹ Testing 20 interrupts takes approx. 38 hours (pages 0–55 with all interrupt runes), or 30 seconds for a single test.
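
The doubling follows from the number of constellations. A small sketch, assuming that each of the first n occurrences of the interrupt rune is either treated as an interrupt or as a normal rune (this on/off reading and the name `constellations` are assumptions of the sketch, not taken from the original code):

```python
from itertools import product


def constellations(n_occurrences: int):
    """Yield every on/off pattern for the first n occurrences of the
    interrupt rune. True = treated as interrupt, False = normal rune."""
    return product((False, True), repeat=n_occurrences)


print(len(list(constellations(3))))  # 8
print(2 ** 20)                       # 1048576 patterns for 20 interrupts
print(2 ** 21 // 2 ** 20)            # 2, one more interrupt doubles the work
```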

Assumptions

“Normal” English text: IoC is based on the assumption that we know the underlying language (English) and that the text follows a normal character distribution. If the text was prepared to be extra hard to decrypt, one could have removed every letter ‘e’ to make IoC pretty much useless (there are a few books of 100+ pages that do exactly that). ⤳ Well, let us hope we have normal texts.
Mono- and polyalphabetic substitution: Each encrypted rune has a 1-to-1 mapping to its decrypted counterpart relative to its group. For polyalphabetic ciphers the groups are determined by cycling through different substitution alphabets. The number of groups is from now on described as key length.
It is completely irrelevant whether the encryption algorithm uses a Caesar shift (variant), Atbash, Vigenère, or an affine substitution. As long as it is monoalphabetic (within its group), the IoC will stay the same. ⤳ The results do not apply to polyphonic or polygraphic ciphers.
Single rune keys: If a polyalphabetic cipher is used, we assume the decryption of a rune is based on this rune alone. E.g., it does not look at neighboring runes, words, or the rune’s position in the text. Further, the decryption takes only one rune as input. ⤳ We cannot detect bigram or trigram substitutions or totient streams.
Key length: We only consider key lengths of up to 32 runes. Longer keys split the text too much, leaving too little data per group to analyze IoC. Even 32 is probably too high in most cases. You can see in the results that the IoC values for longer key lengths tend to be higher. Keep in mind that a key length of 30 on a text with just 300 runes leaves a mere 10 runes per key group.
That said, there is still a reason why we go up to 32 runes. Shorter keys have, so to say, resonance frequencies. For example, a key length of 8 will produce similar IoCs at key lengths 16 and 24, since these are multiples of 8. ⤳ Don’t focus too much on high IoC values at the upper key length limit unless the key length is a multiple of a shorter one.
Whitespace: IoC does not care about whitespace, at least not in this analysis. Both the training data and the LP pages were stripped of any whitespace before calculating the IoC. This means that even if the given whitespace is bogus, the IoC value for the right key length would still be higher than for other key lengths. ⤳ Whitespace does not affect the results.
One cipher per chapter: So far we assumed that each chapter (grouped by its page artwork) has exactly one cipher. Thus, each IoC is calculated based on the entire chapter rather than per page. This gives more data for frequency analysis but will fail if the chapter should have more than one cipher (or change midway). ⤳ Will not detect if a chapter has multiple ciphers (e.g., one per page, sub-chapter, or line)
Order of decryption: We assume the decryption starts at the beginning of a page. A mere reversal does not change the rune frequencies and therefore should not matter for IoC, but it does matter for the interrupt positions. Since we only look at the first X runes, the IoC of a reversed stream may be different. Further, the interrupt positions will not help you if you need to start from the back. ⤳ Reverse order alone does not change IoC, hence these results are also applicable, but the interrupt positions assume a start at the front.
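
To make the assumptions on substitution, key length, and whitespace concrete, here is a minimal sketch of an IoC calculation per key length. It is not the implementation behind the tables. The function names are made up, the IoC is left unnormalized (some tools multiply it by the alphabet size), and averaging over the key groups is one common choice among several.

```python
from collections import Counter


def ioc(runes: str) -> float:
    """Probability that two runes drawn without replacement are identical."""
    n = len(runes)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(runes).values()) / (n * (n - 1))


def ioc_for_key_length(text: str, key_length: int) -> float:
    """Average IoC over the key groups; whitespace is dropped first."""
    runes = "".join(text.split())
    groups = [runes[i::key_length] for i in range(key_length)]
    return sum(ioc(group) for group in groups) / key_length


# A 1-to-1 substitution (here Atbash on a-z) keeps the IoC unchanged.
plain = "the quick brown fox jumps over the lazy dog"
atbash = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                       "zyxwvutsrqponmlkjihgfedcba")
cipher = plain.translate(atbash)
print(ioc_for_key_length(plain, 1) == ioc_for_key_length(cipher, 1))  # True
```

The sketch uses Latin letters only to keep the example readable; the functions work on any characters, runes included.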

Reliability

The following table shows how many runes were considered while analyzing the IoC. Low value, low confidence. The darker the color is, the higher the chances are the results are accurate. Everything below 384 is far from ideal (16 runes per key group for a key length of 24). Everything above 812 is considered reliable (29 runes per key group for a key length of 28). Hence, values less than 384 have a white background and values above 812 have a dark blue one.

__TAB_RELIABLE__
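
The two thresholds can be expressed as a small helper. This is only a sketch of the rule described above; the names and the labels "low", "medium", and "high" are made up here, and the value 900 in the loop is a hypothetical rune count for illustration.

```python
LOW_THRESHOLD = 384   # 16 runes per group at key length 24
HIGH_THRESHOLD = 812  # 29 runes per group at key length 28


def runes_per_group(n_runes: int, key_length: int) -> float:
    """Average number of runes that end up in each key group."""
    return n_runes / key_length


def reliability(n_runes: int) -> str:
    """Classify a rune count by the two thresholds of the table."""
    if n_runes < LOW_THRESHOLD:
        return "low"
    if n_runes > HIGH_THRESHOLD:
        return "high"
    return "medium"


for n in (85, 349, 465, 900):
    print(n, reliability(n), round(runes_per_group(n, 24), 1))
```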

IoC per interrupt

Let’s look at the first result. Assuming the interrupt rune is ᚠ, we get the following table. Notice that the ‘p56_an_end’ column has a few dark values, even though the cipher used is a totient function (which will not be detected, as per assumption #2). If you look back at the previous table, you will see that the whole chapter has only 85 runes. Even for a key length of 6, that leaves only about 14 runes per group. That is simply too little data for IoC, so the column will contain false positives.

Next, if you look at the ‘0_welcome’ column you will see peaks at key lengths 8, 16, and 24. The solution to this page was a Vigenère key of length 8. This is a very typical pattern for such ciphers. Note that the last peak at 30 is due to the long key length: 465 / 30 is just 15.5 runes per group. IoC thus has more freedom to “optimize” the key, so you should not read too much into high key lengths. Everything above 24 is not that reliable anymore.
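
The pattern of peaks at multiples can be turned into a rough filter. The following is a heuristic sketch only, not part of the original analysis: the function name, the threshold, and the score values are invented for illustration and are not results from the tables.

```python
def base_key_lengths(scores: dict[int, float], threshold: float) -> list[int]:
    """Return key lengths whose peak is confirmed by a peak at one of
    their multiples and that are not themselves a multiple of an
    already accepted shorter key length."""
    peaks = sorted(k for k, v in scores.items() if v >= threshold)
    bases: list[int] = []
    for k in peaks:
        if any(k % b == 0 for b in bases):
            continue  # resonance of an accepted shorter key length
        if any(m != k and m % k == 0 for m in peaks):
            bases.append(k)  # confirmed by at least one multiple
    return bases


# Made-up scores that mimic the pattern described for '0_welcome'.
scores = {k: 0.04 for k in range(1, 33)}
for k in (8, 16, 24, 30):
    scores[k] = 0.06
print(base_key_lengths(scores, threshold=0.055))  # [8]
```

Note the limitation: with an upper limit of 32, a true key length of 17 or more has no multiple in range and is never confirmed by this filter.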

P.S.: You can use the left and right keys, or the navigation at the top, to move between the interrupts.

__INTERRUPT_TABLES__

What’s next?

Things to try:

Use different IoC metrics. E.g., remove ‘e’ from alphabet and recalculate coincidence.
Split text into two (alternating) parts and test each part separately on different key lengths.
Not sure if it makes sense to analyze bigrams and trigrams in this case but feel free to try.
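
As a starting point for the first two ideas, here is a hedged sketch. It shows one possible reading of “remove ‘e’ from alphabet” (drop every occurrence of that rune before counting) and a split into alternating halves. All function names are hypothetical, and ᛖ is taken as ‘e’ based on the example sentence at the top.

```python
from collections import Counter


def strip_whitespace(text: str) -> str:
    return "".join(text.split())


def ioc(runes: str) -> float:
    """Unnormalized index of coincidence."""
    n = len(runes)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(runes).values()) / (n * (n - 1))


def ioc_excluding(text: str, excluded: str) -> float:
    """IoC after dropping whitespace and every occurrence of `excluded`."""
    return ioc(strip_whitespace(text).replace(excluded, ""))


def split_alternating(text: str) -> tuple[str, str]:
    """Split into runes at even and at odd positions (whitespace removed)."""
    runes = strip_whitespace(text)
    return runes[0::2], runes[1::2]


sample = "ᛁ ᚹᚪᚱᚾ ᚣᚩᚢ ᛁᚠ ᚣᚩᚢ ᛞᚩᚾ ᛏ ᛏᛖᛚᛚ ᛗᛖ"
print(ioc_excluding(sample, "ᛖ"))
even, odd = split_alternating(sample)
print(even, odd)
```

Each half can then be tested separately on different key lengths.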