Re: [math-fun] [uk-imo-squad] A cipher challenge
Incidentally, does anyone know how to calculate the mean value of the standard deviation of set of n values if the n values are determined by a multinomial distribution. What I mean is, given a random sting of n characters of length l, is there an easy way to work out the expected standard deviation of the frequencies of the letters? (This is somewhat relevant to my attempted attack on Adam's cipher.) Andrew -----Original Message----- From: Andrew Carlotti <andrew@carlotti.me.uk> To: Adam P. Goucher <apgoucher@gmx.com> Cc: math-fun <math-fun@mailman.xmission.com>, uk-imo-squad@lists.maths.bath.ac.uk, Dave Greene <david.m.greene@gmail.com> Subject: Re: [uk-imo-squad] A cipher challenge Date: Sun, 10 Apr 2011 21:00:38 +0100 An interesting idea. I'm not sure about the practicalities of it - are you intending to draw out a grid for every set of five letters, or do you think it is something that you could do in your head. I think I've found one weakness already (although whether it can be put to use is another matter). There should be a positive correlation between the standard deviation of the binary digits or a character, and the standard deviation of the frequency of the character in each of the five positions mod 5. This occurs due to fact that in each of the five positions in the binary representation of a character, the two digits will usually not occur with equal probability, creating a bias towards characters with more of the more common digit, for that location in the ciphertext mod 5. That probably sounds a bit confusing, and I still need to check the technicalities. I also intend to see whether my conclusions from this are correct. (I won't say what they are here, in case someone doesn't want to know them.) Andrew -----Original Message----- From: Adam P. Goucher <apgoucher@gmx.com> To: math-fun <math-fun@mailman.xmission.com>, uk-imo-squad@lists.maths.bath.ac.uk, Dave Greene <david.m.greene@gmail.com> Subject: [uk-imo-squad] A cipher challenge Date: Sun, 10 Apr 2011 17:39:05 +0100 Dear all: I have developed a new cipher, which is an involution between strings. That is to say, the enciphering process is identical to the deciphering process. I designed the cipher to be relatively quick to encipher and decipher by a human (6 characters per minute), yet difficult to crack, even with the help of a computer. This makes it preferable over: * Vigenere and monoalphabetic ciphers, which are crackable. Certain monoalphabetic ciphers, such as Noah John Rondeau's code, can elude most people, including T. Rychlik (which is not the Linnaean name for a bacterium! : ). Nevertheless, Dave Greene cracked it in a couple of days, and I also managed it in a similar amount of time. Consequently, it is not secure enough to use for protecting secret information. * Enigma, Feistel etc., which require a machine to encipher. Moreover, the Feistel cipher only has 2^56 keys, so is within the reach of a brute-force computer search. Enigma also has inherent Achilles' heels, making it possible to decipher with computer assistance. -------- Anyway, here is a description of my crypto-system: To begin with, a key is required. The key can be any string with 32 or fewer characters; longer keys are more secure. For this example, we will use the key MATHFUN. Firstly, each alphabetic character in the key is converted into a number in the interval [1,26], like so: A = 1, B = 2, ... , Z = 26. Hence, MATHFUN = (13,1,20,8,6,21,14). This sequence is used to generate the encoding matrix. The encoding matrix is an 8 * 4 array of squares, like so: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The first number is 13, so we place 'A' in the 13th box: _ _ _ _ _ _ _ _ _ _ _ _ A _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The next number is 1, so we place 'B' in the next box along: _ _ _ _ _ _ _ _ _ _ _ _ A B _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Then 20. 'C' goes in the box 20 spaces ahead, like so: (There is a helical-toroidal-style wrap-around effect.) _ C _ _ _ _ _ _ _ _ _ _ A B _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The next number is 8: _ C _ _ _ _ _ _ _ D _ _ A B _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The next number is 6, so we move 6 *unoccupied* spaces ahead: _ C _ _ _ _ _ _ _ D _ _ A B _ _ _ E _ _ _ _ _ _ _ _ _ _ _ _ _ _ And 21: _ C _ _ _ _ _ F _ D _ _ A B _ _ _ E _ _ _ _ _ _ _ _ _ _ _ _ _ _ Finally, 14, so we move 14 unoccupied squares ahead and add 'G': _ C _ _ _ _ _ F _ D _ _ A B _ _ _ E _ _ _ _ _ _ _ G _ _ _ _ _ _ Now, we fill in with the rest of the alphabet, starting at the top-left, and moving *vertically* downwards: H C L P T W Z F I D M Q A B ! J E N R U X . ? K G O S V Y , - Note that the 'alphabet' is augmented with " .,!?-", to make the total of 32 characters necessary. Indeed, my cipher works for any power of two, but 32 is the next power of two above 26. With each element, associate a binary string, like so: H = 00000 C = 00001 L = 00010 ... F = 00111 I = 01000 D = 01001 ... A = 01100 ... G = 11001 ... , = 11110 - = 11111 To encode a string, we separate it into blocks of five characters. Each block is encoded with a 5*5 array of bits. For example, let's consider the string GAUSS. G = 11001, so fill in the first column of a 5*5 array with 11001: G 1 _ _ _ _ 1 _ _ _ _ 0 _ _ _ _ 0 _ _ _ _ 1 _ _ _ _ And the next column with A = 01100: G A 1 0 _ _ _ 1 1 _ _ _ 0 1 _ _ _ 0 0 _ _ _ 1 0 _ _ _ Eventually, we will have: G A U S S 1 0 1 1 1 1 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 The first row, 10111, corresponds to ?: G A U S S 1 0 1 1 1 ? 1 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 And 11011 corresponds to S: G A U S S 1 0 1 1 1 ? 1 1 0 1 1 S 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 Proceeding in this manner yields: G A U S S 1 0 1 1 1 ? 1 1 0 1 1 S 0 1 1 0 0 A 0 0 0 1 1 P 1 0 0 1 1 R So, GAUSS enciphers to ?SAPR. You can verify that ?SAPR enciphers to GAUSS, and indeed that encipherment is self-inverse in the general case. -------- Eight Enjoyable Exercises (in ascending order of difficulty): 1. Decipher CIHTXG ZYGRI BU with the key MATHFUN. (15 characters; the two spaces are important.) 2. How many possible encoding matrices are there? (Ignore the fact that certain matrices do not correspond to actual keys.) 3. Find the encoding matrix for the key FERMATSLASTTHEOREM. 4. Decipher Z?TQN with the key FERMATSLASTTHEOREM. 5. Of the 33554432 possible five-character strings, how many map to themselves? (This is irrespective of the key.) 6. Give an example of a five-character string which maps to itself with the key MATHFUN. 7. Decipher this with an unknown key: (420 characters; it is easiest to view in a monospaced font such as in Notepad.) BNXUHJIG!RFSKRDQBBFI KIOYCLOQS.SAMLPDV?UXHE,STV?IL?FVXBX.QHF N.I!RKZ?CO AFUKMIW!UKR,KYU,MVXS?E.XO,CVU?LCLI-TMXIX-K.FTVVWN C-WTI TASD!OQM ? KYYZWBN-Z-NACLGHMJYVQWSQCU GJBBOG MIOYFN?EJ U,MVXJCU.GZFFLQL.FHTOGIZUVFMORMVHCU-FBANWTJROO.I!QNTB ,.NIOC NDVZRCUNTJT,EM.FIJC-QVC!U -OA RMGOB!KMP-O!CZS,DEGMSCP VMRYAV RC,CM ?BU?IYQDSAOBLS-PAKMW!FFZCQPREY ,FU?ZWSWUP-WRDKNX,VBN- A-XPM,LKY!LRF,!I.HAIO !ILUVINO!ALEK!DLEZJ-UOQDFFGHTY DS?OCGE 8. What was the original key used for the message in exercise 7? -------- Have fun! Sincerely, Adam P. Goucher
Andrew, Interesting analysis. There should indeed be a statistical bias towards one of the two possible binary digits for each of the five positions (modulo 5), but it is very slight. 16 characters share any particular digit in any particular location, so the bias will be significantly dampened. (Choose 16 characters at random, add together the probabilities, and it should result in a value rather close to 50%.) I would particularly like to see your statistical attack. As for your question, I'm sure someone on math-fun will be able to answer it. For the benefit of people on math-fun (who cannot view messages sent by people not on the list), the messages are reproduced below:
Incidentally, does anyone know how to calculate the mean value of the standard deviation of set of n values if the n values are determined by a multinomial distribution. What I mean is, given a random sting of n characters of length l, is there an easy way to work out the expected standard deviation of the frequencies of the letters? (This is somewhat relevant to my attempted attack on Adam's cipher.)
Andrew
An interesting idea. I'm not sure about the practicalities of it - are you intending to draw out a grid for every set of five letters, or do you think it is something that you could do in your head.
I think I've found one weakness already (although whether it can be put to use is another matter). There should be a positive correlation between the standard deviation of the binary digits or a character, and the standard deviation of the frequency of the character in each of the five positions mod 5. This occurs due to fact that in each of the five positions in the binary representation of a character, the two digits will usually not occur with equal probability, creating a bias towards characters with more of the more common digit, for that location in the ciphertext mod 5.
That probably sounds a bit confusing, and I still need to check the technicalities. I also intend to see whether my conclusions from this are correct. (I won't say what they are here, in case someone doesn't want to know them.)
Andrew
Sincerely, Adam P. Goucher
participants (2)
-
Adam P. Goucher -
Andrew Carlotti