February 2, 2021

Crack single-byte XOR encryption in GO

Breaking XOR

In this article we will look at how to implement a tool based on the “Go” programming language that can crack files or messages that are XOR encrypted. It is important to note that the tool is intended only for so-called “single-byte-encryption”. How we can break XOR encryption with keys, I’ll cover in the next article 🙂

What is XOR encryption anyway?

XOR or “Exclusive or” is a bitwise operation. The following example from Wikipedia (https://en.wikipedia.org/wiki/XOR_cipher) should explain everything.

For example, the string “Wiki” (01010111 01101001 01101011 01101001 in 8-bit ASCII) can be encrypted with the repeating key 11110011 as follows:

01010111 01101001 01101011 01101001
11110011 11110011 11110011 11110011
= 10100100 10011010 10011000 10011010

And conversely, for decryption:

10100100 10011010 10011000 10011010
11110011 11110011 11110011 11110011
= 01010111 01101001 01101011 01101001

And here we have a nice example for a single-byte-encryption. If the example is still unclear or contains too little explanation, I recommend reading the linked article on Wikipedia carefully.

What does “single-byte-encryption” mean?

Let’s assume that our encryption key can only be one character. E.g. the letter “X”. If we now XOR encrypt our file or text with the letter “X” we have implemented a single-byte-encryption. On Cyberchef you can super fast encrypt a message and (for better display) convert it to hexadecimal notation.

Recipe to attack

In this example, we basically rely on frequency analysis again. We want to analyze reference texts in the respective language, i.e. examine for bigrams and frequent words. This gives us the flexibility to analyze other languages. As a source for reference texts I recommend Project Gutenberg. If the site is annoying because of your IP address, just visit it via TOR.

The “machtingScore”

We have 10 of the most found biagrams and 10 of the most found words from our reference texts. We now collect the same data within each iteration from 0x00 to 0xFF and store both the current byte and the calculated machtingScore. After we have calculated all options, we output the top ten and see if our searched key was among them. With a very high probability, it is there. Now we only have to test our 10 keys and see which one makes the most sense.

Time for a short example

I received this message for testing from a friend.

11105f161b1a1e535f08171e0b5f0b17160c5f160c5f19100d535f1d0a0b5f1e1106081e06455f171a0d1a580c5f0b171a5f0b1a070b5f06100a58091a5f1e0c141a1b5f19100d51751c171a1a0d0c5e5f371e0d0d06

Let’s call our script

go run cmd/xor.go -m "11105f161b1a1e535f08171e0b5f0b17160c5f160c5f19100d535f1d0a0b5f1e1106081e06455f171a0d1a580c5f0b171a5f0b1a070b5f06100a58091a5f1e0c141a1b5f19100d51751c171a1a0d0c5e5f371e0d0d06"

and evaluate our results

So, our key is either ascii 127 (which is DEL) or one of the others. Let’s try ascii 95 (which is “_”)

Output: “NO.IDEA..WHAT.THIS.IS.FOR..BUT.ANYWAY..HERE.S.THE.TEXT.YOU.VE.ASKED.FOR.*CHEERS..hARRY”

And we decrypted our message without knowing which byte it was encrypted with. In the next article we will increase the difficulty and attack an XOR encryption with a key of length > 1. Cheers

Code

Here is the full source code and on github:

package main

import (
    "bufio"
    "encoding/hex"
    "fmt"
    "os"
    "regexp"
    "sort"
    "strings"

    "gopkg.in/alecthomas/kingpin.v2"
)

var (
    msg = kingpin.Flag("message", "message you want to attack. (in hex)").Short('m').Required().String()
)

type charAnalyseResult struct {
    value string
    count int
}

type msgAnalyseResult struct {
    value int
    count int
}

func check(err error) {
    if err != nil {
        panic(err)
    }
}

func cleanLine(s string) string {
    reg, err := regexp.Compile("[^a-zA-Z] ")
    check(err)
    s = strings.ToLower(s)
    s = reg.ReplaceAllString(s, "")

    return s
}

func processInputFile(path string) []string {
    file, err := os.Open(path)
    defer file.Close()
    check(err)

    scanner := bufio.NewScanner(file)
    res := []string{}

    for scanner.Scan() {
        res = append(res, cleanLine(scanner.Text()))
    }

    return res
}

func processContent(lines []string, bigramMap map[string]int, wordMap map[string]int) (map[string]int, map[string]int) {
    for _, line := range lines {
        words := strings.Split(line, " ")
        for _, word := range words {
            wordMap[word] = wordMap[word] + 1

            i := 0
            for i < len(word) {
                if (i + 1) < len(word) {
                    bigram := word[i:(i + 2)]
                    bigramMap[bigram] = bigramMap[bigram] + 1
                }
                i++
            }
        }
    }

    return bigramMap, wordMap
}

func getTopValuesFromMap(target map[string]int, top int) []charAnalyseResult {
    if top > len(target) {
        top = len(target)
    }
    values := make([]charAnalyseResult, 0, len(target))
    for key, value := range target {
        values = append(values, charAnalyseResult{key, value})
    }

    sort.SliceStable(values, func(i, j int) bool {
        return values[i].count > values[j].count
    })

    return values[:top]
}

func xor(a byte, b byte) byte {
    return a ^ b
}

func evaluateMessage(raw []byte, b byte) string {
    message := ""
    for _, rawB := range raw {
        c := string(xor(rawB, b))
        message += c
    }
    return message
}

func calcMatchScore(m string, topBigrams []charAnalyseResult, topWords []charAnalyseResult) int {
    mBigrams := make(map[string]int)
    mWords := make(map[string]int)
    score := 0

    m = cleanLine(m)

    processContent(strings.Split(m, " "), mBigrams, mWords)

    topMBigrams := getTopValuesFromMap(mBigrams, 10)
    topMWords := getTopValuesFromMap(mWords, 10)

    fmt.Println("--- evaluate message ---")

    for _, topBigram := range topBigrams {
        for _, topMBtopBigram := range topMBigrams {
            if topBigram.value == topMBtopBigram.value {
                score++
            }
        }
    }

    for _, topWord := range topWords {
        for _, topMWord := range topMWords {
            if topWord.value == topMWord.value {
                score++
            }
        }
    }
    fmt.Println("calculated score: ", score)
    return score
}

func main() {
    kingpin.Parse()

    bigrams := make(map[string]int)
    words := make(map[string]int)
    scoreList := []msgAnalyseResult{}

    refTextFiles := [2]string{"./data/the_adeventure_of_sherlock_holmes.txt", "./data/pride_and_prejudice.txt"}
    for _, file := range refTextFiles {
        fLines := processInputFile(file)
        processContent(fLines, bigrams, words)
    }

    topBigrams := getTopValuesFromMap(bigrams, 10)
    topWords := getTopValuesFromMap(words, 10)

    fmt.Println("--- generate reference values ---")
    fmt.Println(topBigrams)
    fmt.Println(topWords)
    fmt.Print("\n")

    fmt.Println("--- attack message ---")
    rawMessage, err := hex.DecodeString(*msg)
    check(err)

    for i := 0; i < 256; i++ {
        msg := evaluateMessage(rawMessage, byte(i))
        score := calcMatchScore(msg, topBigrams, topWords)
        scoreList = append(scoreList, msgAnalyseResult{i, score})
    }

    fmt.Println("top ten used keys -> try one of these")
    sort.SliceStable(scoreList, func(i, j int) bool {
        return scoreList[i].count > scoreList[j].count
    })

    for _, v := range scoreList[:10] {
        fmt.Printf("ascii int: %d \t %c \n", v.value, v.value)
    }
}

Leave a Reply

Your email address will not be published. Required fields are marked *