next up previous
Next: How we Found the Up: Stage 6 Previous: Initial Knowledge

Work Description

Initially we tried to solve this stage by hand. We made bigram statistics and tried to solve the cipher by considering it as a substitution cipher with $ {{}{25\atopwithdelims()2}}$ = 300 letters. Some observations could also be made that should facilitate the substitutions, e.g. that if XY is mapped to AB then YX is probably mapped to BA, etc. Our English skills were however not sufficient for breaking the cipher in this way. We also made some attempts at building the key square in an intelligent fashion. We tried to fill the top of the square with English text and then pad with the remaining letters in order. This was however also made without success. We thus decided to once again resort to brute force.

The idea we used was to make a heuristic search for the true key square. The program we built first evaluated all possible choices of the first six letters in the key square. Then it added one letter at a time and saved the partial keys that got the highest scores to the next round. We also built in the possibility to fix an initial portion of the key square and then proceed as above with the rest of the square. The crucial part of this approach is naturally to construct a good evaluation function.

Our evaluation function was based on bigram statistics. Using a massive amount of English text in which all in-frame double repeats such as AA were substituted with AXA, the frequencies of the different bigrams were counted and the log-probabilities (i.e., the natural logarithms of the probabilities of the different bigrams) were estimated. For a given partial square we first extracted all the plaintext bigrams for which both the plaintext bigram and the corresponding ciphertext bigram were found in the part of the key that was set. The mean of the log-probabilities for all of the pairs with log-probability above -9 were then computed for normalization purposes. Using the partial key, parts of the ciphertext were deciphered to plaintext. The sum of the log-probabilities of all the bigrams in the plaintext fragments were computed, and the score of the key was set to this value subtracted by the computed mean times the number of bigrams found in the plaintext fragments.


next up previous
Next: How we Found the Up: Stage 6 Previous: Initial Knowledge
solvers@codebook.org