January 8, 2021

Attacking Vigenere Cipher

Intro

In this post, we’ll look at Vigenere encryption. We implement the method in Python and look at the Kasiki attack method. Then we try to restore the original key and decrypt the message.

Attacking the Vigenere Cipher

The Vigenere encryption method dates back to the 15th century and was considered unbreakable for a long time, until the 18th century. It was not until 1863 that a German infantryman named Friedrich Wilhelm Kasiski came up with a proposal that would make it possible to attack the Vigenere encryption without knowledge of the key. The procedure is known today as the Kasiki test. You can read more about the Vigenere cipher in general here.

Encrypt with Vigenere

To encode a text we need an alphabet and write all rotations below each other. This creates the so-called Vigenere square. Then we calculate the square with our key. The secret message “this is our secret message” compined with the key “tumble” leeds to the encrypted message “MBUTTWHODTPGKYFNPWLUSF”. I hope the graphic here makes it a little clearer what is happening.

Now that we know how the encryption works we can do an implementation in python for better understanding. Here is my python script. The script can be found on github here.

import sys
import argparse
import string
import re

UPPER_DICT = list(string.ascii_uppercase)

def get_rotated_alphabet(r):
    return UPPER_DICT[r:] + UPPER_DICT[:r]

def build_vigenere_square():
    v_square = []

    counter = 0
    for _ in string.ascii_uppercase:
        v_square.append(get_rotated_alphabet(counter))
        counter += 1

    return v_square

def parse_message(message):
    message = message.replace("\n", "")
    message = message.upper()
    return re.sub(r'\W', '', message)


def encrypt(message, key_length, key):

    v_square = build_vigenere_square()
    enc_message = ""

    for i in range(0, len(message), key_length):
        fragment = message[i:i+key_length]
        for a in range(0, len(fragment)):
            try:
                enc_char = v_square[UPPER_DICT.index(key[a].upper())][UPPER_DICT.index(fragment[a])]
                enc_message += enc_char
            except:
                enc_message += fragment[a]

    return enc_message

def main():
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument("-s", "--secret", dest="secret", required=True, help="your secret word to encrypt")
        parser.add_argument("-m", "--message", dest="message", required=True, help="the message you want to encrypt")
        
        args = parser.parse_args()

        print(encrypt(parse_message(args.message), len(args.secret), args.secret))

    except Exception as e:
        print(e)
        sys.exit(1)

if __name__ == "__main__":
    main()

We can now test the script by entering the following command into our terminal.


Decrypt with vigenere

Now that we know the encryption works we can implement the decryption relatively easily. But first we will look at the theory of how this works. To decode a encrypted message we need again our Vigenere square. Then we calculate the square with our key. The encrypted message “MBUTTWHODTPGKYFNPWLUSF” compined with the key “tumble” leeds to the decrypted message “THISISOURSECRETMESSAGE”. I hope the graphic here makes it a little clearer what is happening.

Now that we also know we can decrypt a message, we implement this in python as well. Code is also here.

import sys
import argparse
import string
import re

UPPER_DICT = list(string.ascii_uppercase)

def get_char_frequencies(encrypted_text):
    frequency_char_object = {}
    for char in encrypted_text:
        if ord(char) != 32:
            if char in frequency_char_object:
                frequency_char_object[char] += 1
            else:
                frequency_char_object[char] = 1

    return sorted(frequency_char_object.items(), key=lambda item: item[1], reverse=True)

def rotate(char, key):
    try:
        origin = UPPER_DICT.index(char.lower())
        target = origin + key

        return UPPER_DICT[(target % len(UPPER_DICT))]
    except:
        return char

def get_rotated_alphabet(r):
    return UPPER_DICT[r:] + UPPER_DICT[:r]

def build_vigenere_square():
    v_square = []

    counter = 0
    for _ in string.ascii_uppercase:
        v_square.append(get_rotated_alphabet(counter))
        counter += 1

    return v_square

def parse_message(message):
    message = message.replace("\n", "")
    
    message = message.upper()
    return re.sub(r'\W', '', message)


def decrypt(message, key_length, key):

    v_square = build_vigenere_square()
    dec_message = ""

    for i in range(0, len(message), key_length):
        fragment = message[i:i+key_length]
        for a in range(0, len(fragment)):
            try:
                dec_char_key_row = v_square[UPPER_DICT.index(key[a].upper())]
                dec_char_index = dec_char_key_row.index(fragment[a])
                dec_char = UPPER_DICT[dec_char_index]

                dec_message += dec_char
            except:
                dec_message += fragment[a]

    return dec_message

def main():
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument("-s", "--secret", dest="secret", required=True, help="your secret word to decrypt")
        parser.add_argument("-m", "--message", dest="message", required=True, help="the message you want to decrypt")
        
        args = parser.parse_args()

        print(decrypt(parse_message(args.message), len(args.secret), args.secret))

    except Exception as e:
        print(e)
        sys.exit(1)

if __name__ == "__main__":
    main()

The Kasiki Test

In order to attack the Vigenere encryption in any way, we need as accurate a statement as possible about what potential key length was used. And this is where the Kasiki method comes in. First of all we look for the accumulations of more than 2 same repetitions. A short example. We are going to work with the following message:

“The house introduced in the introduction is owned by Arthur Dent, who is a totally normal human being. That is, just like the rest of us, his house is in the way of a planned road that the local government wants to build. In fact, Arthur starts off the day with a hang over, just like the rest of u…wait, what?, because he was down at the pub last night, drinking and complaining about how the local gov ernment wants to bulldoze his house.Which explains why there are bulldozers outside his house right now. So, Arthur goes out and lies down in front of a bulldozer. Pretty much your average Thursday. The man in charge of the bulldozers is Mr. L. Prosser, who is a dista nt descendant of Genghis Khan. Like Genghis Khan, Prosser has a bit of a belly and likes fur hats. Unlike Genghis Khan, Prosser is ner vous and not so good at conquering” – Read more about the text here!

If we now encrypt this message with the key “tumble” we get the following cipher text.

“MBQIZYLYUOEVHXGDPHBHFIPMGNDPOYVNUPYMLIIOPHUSMSELNLPFYXPBAJDEMIFBWPRHASXEEBGNLRUYUORXAUFJDNNMFMTOXNTFCILNAGFWACEIZYLYUTTRMBQXLCHZMQWEGHQECSTXFILXMBQMZGTFSPGIKHYFYXPUZUDXHVGJWHBHRBNXTLFIFVLNMSEWHZRUSIWUKXTXAUTBYKHPQSUYLNXJVIMBQSPWMIRVHEBNIILXUYOBFWXBQXLWWIIOLXMBQQFFEUEUYMZBFECMGEUOREGXOPXTEUUOTRZUNPFXAIIUSIEIOBWKHPQSYQXHFXLRMMFPMYEFPPKIACEIZYLYIITGAYJQWEBHEXSCMBQSPEKYNVWPWILFCWHOFTTHXBUTSSNMQSTKANZPHWHUDUSYKAAFDSNNMOOPBYEEZAGCZGCSGNAGLFNFXEZDXLBSPXMSYVNLRIGSLZXLMHPXAODTOERNTFXEGCZDSEKAQPQXAYNVWPWILFCWBMYSWTKIETPVPBAJDEWCEULRMXQTNIGXMOESYAQORLBMWILRECWFRIGATJDOAUZQCSLMQSSELUNJESYUNFWPRUZEWMDYEGFVAUFTFRECWFRIGATJDOAUZQCSLMQSTWGYDWZYLUZEYSMMAHZSWUFDZRJOQSTRZ”

Let’s search for trigramms:

MBQIZYLYUOEVHXGDPHBHFIPMGNDPOYVNUPYMLIIOPHUSMSELNLPFYXPBAJDEMIFBWPRHASXEEBGNLRUYUORXAUFJDNNMFMTOXNTFCILNAGFWACEIZYLYUTTRMBQXLCHZMQWEGHQECSTXFILXMBQMZGTFSPGIKHYFYXPUZUDXHVGJWHBHRBNXTLFIFVLNMSEWHZRUSIWUKXTXAUTBYKHPQSUYLNXJVIMBQSPWMIRVHEBNIILXUYOBFWXBQXLWWIIOLXMBQQFFEUEUYMZBFECMGEUOREGXOPXTEUUOTRZUNPFXAIIUSIEIOBWKHPQSYQXHFXLRMMFPMYEFPPKIACEIZYLYIITGAYJQWEBHEXSCMBQSPEKYNVWPWILFCWHOFTTHXBUTSSNMQSTKANZPHWHUDUSYKAAFDSNNMOOPBYEEZAGCZGCSGNAGLFNFXEZDXLBSPXMSYVNLRIGSLZXLMHPXAODTOERNTFXEGCZDSEKAQPQXAYNVWPWILFCWBMYSWTKIETPVPBAJDEWCEULRMXQTNIGXMOESYAQORLBMWILRECWFRIGATJDOAUZQCSLMQSSELUNJESYUNFWPRUZEWMDYEGFVAUFTFRECWFRIGATJDOAUZQCSLMQSTWGYDWZYLUZEYSMMAHZSWUFDZRJOQSTRZ”

Now we have to measure the distance between each appearance.

  • 120
  • 24
  • 78
  • 36

After that, we factoring all these numbers. excluded the number 1.

1202, 3, 4, 5, 6, 40, 8, 10, 12, 15, 20, 24, 60, 30
242, 3, 4, 6, 8, 12
782, 3, 6, 39, 13, 26
362, 3, 4, 6, 9, 12, 18

Now we need to count with divisor occurres the most:

(4*2), (4*3), (4*6) – This indicates that our key has a length of either 2, 3 or 6. And remember our key “tumble”, it has exactly 6 characters.

Split message in parts of possible key length

If we now going to split our message with key of length 6 we need to get every 6th letter. For example: 1,7,13 – 2,8,14 – 3,9,15 – 4,10,16 – 5,11,17 – 6,12,18 … After separating our message with have the following parts:

1MLHBGVLUNPMREUANXLALMHGTMTKPHBTLHWAHLMMBUXWMEZGGEZAEHXMEALABMKWHXNAHKNBGGNXMRXARGKAWBKPWMGYBEGALLYRDAEGALGLMWJZ
2BYXHNNISLBIHBYUMNNCYBZHXBFHUVHLNZUUPNBINYBIBUBEXUUIIPHMFCYYHBYIOBMNUANYCNFLSILONCAYIMIBCXXAMCAUMUUUYUCAUMYUMUO
3QUGFDUIMPAFAGUFFTAEUQMQFQSYZGRFMRKTQXQRIOQIQEFUOUNIOQFFPEIJEQNLFUQZDAMEZAXBYGMDTZQNLYEAEQMQWWTZQNNZEFWTZQDZAFQ
4IODIPPOSFJBSNOJMFGITXQEIMPFUJBISUXBSJSVIBXOQUEOPOPUBSXPPIIQXSVFTTSPUFOEGGESVSHTFDPVFSTJUTOOIFJQSJFEGTFJQSWEHDS
5ZEPPOYPEYDWXLRDTCFZTLWCLZGYDWNFESTYUVPHLFLLFYCRXTFSWYLMKZTWSPWCTSTHSDOZCLZPNLPOXSQWCWPDLNERLRDCSEWWFFRDCTZYZZT
6YVHMYMHLXEPERXNOIWYRCESXGIXXHXVWIXKYIWEXWWXFMMETRXIKQRYIYGECEPWHSKWYSPASFDXLZXEEEXPWTVERISLRIOSESPMVRIOSWYSSRR

Attacking with Frequency Analysis

The Vigenere method is basically a shift cipher (Caeser cipher), except that it only uses multiple subkeys. With our calculated key lengths we can now try to crack the individual parts with the help of the frequency analysis. Now we need a way to measure something like a score. Take a look at the english letter frequency https://en.wikipedia.org/wiki/Letter_frequency We can try which rotation on our message blocks comes closest to the English letter distribution. After that we can form a permutation over all letters. We then use this to decrypt our message and see which key fits.

Calculating of a hit score

We shift each text block with the Caeser Cipher through all 26 positions. After each rotation we calculate how close we are to the normal distribution of the English language. For this purpose, we sort each rotated fragment according to the frequency of the occurring letters. After that we test the first 5 letters if one of them is equal to the 5 most frequent English letters (E, T, A, O, I, N). For each match we give 1 point.Have a look at the table here. With that in mind we can now implement our calculation script.

import re
import math

TOP_ENGLISH_LETTERS = ["E","T","A","O","I","N"]

def clean_text(text):
    return re.sub(r'\W', '', text)

def get_letter_map(text):
    letters = {}
    text = clean_text(text)
    for c in text:
        if c not in letters:
            letters[c] = 1
        else:
            letters[c] = letters[c] + 1
    return sort_dict(letters)

def sort_dict(target):
    return sorted(target.items(), key=lambda item: item[1], reverse=True)

def get_score(letter_dict):
    head = letter_dict[0:5]

    score = 0

    for el in head:
        for ch in TOP_ENGLISH_LETTERS:
            if str(el[0]).upper() == ch:
                score += 1
    return score

Breaking the cipher

Then we take the most common letters and generate all possible combinations with the calculated key length. The final attack script looks like follow and is available here.

import sys
import argparse
from core import v_lib
import itertools

MIN_KEY_SIZE = 3
MAX_KEY_SIZE = 7


def main():
    try:
        ########################################################################################
        ### PREPARE CIPHER TEXT
        ########################################################################################
        parser = argparse.ArgumentParser()
        parser.add_argument("-m", "--message", dest="message",
                            required=True, help="the message you want to decrypt")

        args = parser.parse_args()
        parsed_message = v_lib.parse_message(args.message)


        # search for multiple fragments in cipher text
        multiples = v_lib.search_occurences(parsed_message)

        # get positions from found multiples in cipher text
        positions = []
        for m in multiples:
            for word in m[1]:
                positions.append(v_lib.search_positions(parsed_message, word))

        # calculate each text distance between each position from a text multiple
        distances = v_lib.calculate_distances(positions)


        # factorise all distances
        fact_arr = []
        for dis in distances:
            fact_arr.append(v_lib.factorise(dis))


        # clean factorise numbers
        frequent_numbers = v_lib.get_most_frequent_number(fact_arr)


        ########################################################################################
        ### INVESTIGATE POSSIBLE KEYS
        ########################################################################################

        key_map = v_lib.get_sorted_key_map(frequent_numbers)

        for key in key_map:
            if key[0] > MIN_KEY_SIZE and key[0] < MAX_KEY_SIZE:
                rotation_score = v_lib.calculate_rotation_score(v_lib.get_message_parts(parsed_message, key[0]))

                key_char_map = set()
                for rs in rotation_score:
                    possible_keys_fragments = v_lib.get_possible_key_elements(rs)
                    for char in possible_keys_fragments:
                        key_char_map.add(char)

                print("possible chars for key with length: {}".format(key[0]))
                print(key_char_map)

                print("possible keys are:")
                a = itertools.permutations(list(key_char_map), key[0])  
                y = [''.join(i) for i in a]
                for possible_key in y:
                    print(possible_key)
  

    except Exception as e:
        print(e)
        sys.exit(1)


if __name__ == "__main__":
    main()

After scanning the output for out key “TUMBLE”:

You see we have a hit. In the last step we can now test all combinations and check if there are “real” English words in the sentence. We can filter these keys again and get closer and closer to the key we are looking for.

Summary

With this example we have seen that even an encryption with several alphabets offers no protection against a frequency analysis and can therefore be cracked (admittedly with some effort). I hope you had fun!

Leave a Reply

Your email address will not be published. Required fields are marked *