Work Description

Next: How we Found the Up: Stage 8 Previous: Initial Knowledge

Work Description

Clearly the number of possible configurations to try is huge. The number of degrees of freedom is 2 x 26⁵ = 2^.23762752 for the rotation direction, wheels and rings, and a whopping ${{}{26\atopwithdelims()2}}$ ^. ${{}{24\atopwithdelims()2}}$ ^. ${{}{22\atopwithdelims()2}}$ ^. ${{}{20\atopwithdelims()2}}$ ^. ${{}{18\atopwithdelims()2}}$ ^. ${{}{16\atopwithdelims()2}}$ = 72282089880000 for a plugboard with six pairs of exchanges.

After inspecting the construction of the Enigma, the weakest component is clearly the plugboard. Only 12 letters are modified by the plugboard, so when encrypting a random character there is a fair chance, (14/26)² $\approx$ 29%, that it is not modified by any of its two passages through the plugboard. This is a glaring weakness, and a key feature of our solution is that we primarily search among the 23762752 wheel and ring settings.

So the main idea is to try all the possible wheel and ring settings, and to try to locate the best one. How should this be done, when we do not even know what the plugboard contains? The idea is as follows. For all the 23762752 possible wheel and ring settings, the following is repeated: Suppose the message has the following boring form: EEEEEE... Each E will first pass through the plugboard, and it is there mapped onto one of 26 possible characters. If we knew what character it is mapped on, we can now trace how the E is modified by the plugboard on the way in, then by the wheels, the reflector and the wheels again. Only the final pass through plugboard on the way out is unknown. Now that we have also guessed what E is mapped on, there are 26^.23762752 possible strings that can be produced from the supposed plaintext EEEEEE... with the given wheels. A key idea is now that E is one of the most common letters in many languages: English, German, French, Swedish etc. It is therefore reasonable to assume that more than 1/26 of the letters in the actual plaintext are Es, so for each fixed guess we determine the plugboard settings which makes the text we get when we encrypt the string EEEEEE... coincide as well as possible with the actual ciphertext. This can be done by solving a bipartite matching problem (see any textbook on graph algorithms for an explanation of what this is). The algorithm for this is fairly standard, and as it would be executed a lot of times, some care was taken to make it efficient. As we expect E to be fairly common, the string EEEEEE... should behave a like the plaintext in the sense that we expect good plugboard settings to come out of this process.

It may appear as if the approach rests upon the assumption that E is the most common letter in the plaintext. This is really not the case; if E is rare in the plaintext, the approach is still basically sound, but the produced plugboard may be a bit off.

Solving the 2^.26⁶ different bipartite matching problems may seem like a large computing effort, but the program was capable of doing this in about a CPU day on a modern PC. (This could have been cut by another factor of two if we had not tried two different interpretations of the wheel permutations, but better safe than sorry.)

So for each of the 2^.26⁶ guesses, we can find the best plugboard. But how can we find the correct guess? Notice that with the plugboard determined from the solution to the bipartite matching problem, everything is known about the state of the Enigma for each guess: The wheel settings, the ring settings, and the plugboard. But this means that we can decrypt the ciphertext for each guess! This gives 2^.26⁶ candidate plaintexts, and all that remains is to choose the best one of these -- Singh's plaintext, to be precise. A human doing this would quickly get bored to the point of falling asleep, but it is an easy task for a computer program. The final design choice is how to measure the likelihood of a candidate plaintext to be the correct one. There were several alternative ways of doing this. As the Enigma was used, we guessed that the plaintext would be in German. By making this assumption, frequency tables for the letters in German text could be used. Another possible approach, slightly less efficient but not making any assumption about what language was being used, is as follows: In all languages, there are some letters which are a lot more frequent than the average 1/26 (e.g. E, T and N in English) and some that are a lot less frequent (e.g. J and Q in English). Thus, the plaintext with the most skewed frequency distribution is likely to be the right one. As the ciphertext was rather short, we decided to use the first approach as it seemed likely to be better at discovering the correct plaintext using the limited information available.

Next: How we Found the Up: Stage 8 Previous: Initial Knowledge

solvers@codebook.org