February 12, 2021

Crack XOR Encryption (repeating key)

Crack XOR Encryption

In one of my last articles I showed how to crack a single-byte XOR encryption quickly and easily. Click here for the article. And for some exchange on reddit about that stuff, please click here. And for the source code, here.

But now let’s look at how we can crack a message that has been xor encrypted with a repeating key. To do this, we need to perform the following steps:

  • find the potential length of key
  • divide the encrypted message into equal sized blocks according to the key length
  • bruteforce the potential characters that was used in the orignial key

Let’s start by finding the original key length

Find the potential length of key

There is a good way to infer the original key length from the Hamming distance. But first let’s clarify what the Hamming Distance is, and how we calculate it.

The Hamming Distance

In really short words: It calculates the bit distances two objects. For example two strings with same length, two bytes, two floats,… – you know what I mean.

Short example

What is the Hamming Distance from “a” and “b” ? So, “a” in ASCII is 97 and “b” is 98 (decimal). Let’s take a look into the bits.

You can quickly get the bit difference of these two bytes by XOR them. The last thing we need to do is count how many bits are true(1). In this case, the Hamming Distance from “a” and “b” is 2.

Implementation in GO

// Compute Hamming Distance from two bytes
func HammingByteDistance(a byte, b byte) int {
    c := 0
    r := a ^ b
    for i := 0; i < 8; i++ {
        if (r & 0x01) == 1 {
            c++
        }
        r = r >> 1
    }

    return c
}

// Compute Hamming Distance of two strings
func HammingStringDistance(a string, b string) int {
    if len(a) != len(b) {
        log.Fatal("error in calc hamming distance from string. length of strings are not equal")
    }

    d := 0
    for i := 0; i < len(a); i++ {
        d += HammingByteDistance(a[i], b[i])
    }

    return d
}

Guessing the right key-length

With our hamming distance implementation we are now able to guess, what could be the possible key-length. Let’s assume that our key-length is at least greater than then 1 (otherwise, you know, it’s single-byte-xor…) and less than e.g. 10.

The “only” thing we have to do now is to test in a loop all possible key lengths and choose the one where the hamming distance is the smallest in relation to the key length.

// Input the message and m and the key-length kl
func ProcessInputMessage(m string, kl int) float32 {
    mf := []string{}

    splitMessageIntoParts(m, kl, &mf)
    return discoverMessageParts(mf, kl)
}

// Split the message in to parts, depending on the key-length l
func splitMessageIntoParts(m string, l int, fragments *[]string) {
    for i := 0; i < len(m); i += l {
        if (i + l) < len(m) {
            *fragments = append(*fragments, m[i:i+l])
        }
    }
}

// calculate the hamming distance for each part
func discoverMessageParts(p []string, kl int) float32 {
    s := 0
    for i := 0; i < 6; i++ {
        if (i + 1) < len(p) {
            s += HammingStringDistance(p[i], p[i+1])
        }
    }

    return (float32(s) / float32(kl))
}

Small note: Why do we only charge up to a limit of 6 (in the “discoverMessageParts” function) ? It has been shown that a value between 4 and 7 gives the best results.

Why: Good question, if you have an idea, write me!

Anyway, we are able to calculate a score for each key length (kl) and store the return value in a map.

    keyMap := make(map[int]float32)

    ms := string(m)

    fmt.Println("process input message")
    for i := 2; i < *maxKeyLength; i++ {
        keyMap[i] = core.ProcessInputMessage(ms, i)
    }

And than, with a little helper implementation, we can extract the top 5 possible key-length’s from our map.

func GetTopValuesFromKeyMap(target map[int]float32, l int) []KeyScore {
    tv := make([]KeyScore, 0, len(target))

    for k, v := range target {
        tv = append(tv, KeyScore{k, v})
    }

    for i, v := range tv {
        for a, x := range tv {
            // IMPORTANT: smallest score!
            if v.Score < x.Score {
                c := tv[i]
                tv[i] = x
                tv[a] = c
            }
        }
    }

    return tv[:l]
}

Divide Message

Get relative message groups

The next step is to divide our message into equal sized blocks. This time, however, append the key length and its relative index. Here is an example:

Let’s say we have a key-length of 2 and the message “this is a secret!

If we input that into our “grouping” function:

func GetMessageGroups(m string, gl int) []string {
    p := []string{}

    for i := 0; i < gl; i++ {
        g := ""
        c := i
        for c < len(m) {
            g += string(m[c])
            c += gl
        }

        p = append(p, g)
    }

    return p
}

Then we get back the following to entries:

  • TIIAERT
  • HSSSCE!

Crack / Bruteforce the key

Now we are done with all the preparations and can get down to the real work and try to find the right key.

We now take each of our text blocks from the previous example and go through all the bytes from 0x00 to 0xFF and perform an XOR. After each XOR we calculate a matching score for Natural Language (in this case English). And choose the result, that has the highest score. With a very high probability, we have found a correct sign from the original key.

Yes, I know that sounds very abstract. But just look at the code:

package core

// GroupAnalyse represents the score of an analysed group
type GroupAnalyse struct {
    ascii int
    score int
}

// CrackKey try to crack the key
func CrackKey(mg []string, rb []AnalyseStringResult, rl []AnalyseRuneResult) [][]GroupAnalyse {

    pkg := [][]GroupAnalyse{}

    for _, v := range mg {
        gpr := processGroup(v, rb, rl)
        pkg = append(pkg, gpr)
    }

    return pkg
}

func processGroup(g string, rb []AnalyseStringResult, rl []AnalyseRuneResult) []GroupAnalyse {
    gr := []GroupAnalyse{}
    for i := 0x00; i < 0xFF; i++ {
        xg := xorGroup(g, i)
        gr = append(gr, GroupAnalyse{i, calcMatchScore(rb, rl, xg)})
    }

    sortGroupAnalyse(&gr)
    return gr[:2]
}

func sortGroupAnalyse(t *[]GroupAnalyse) {
    for i, v := range *t {
        for a, x := range *t {
            if v.score > x.score {
                c := (*t)[i]
                (*t)[i] = x
                (*t)[a] = c
            }
        }
    }
}

func xorGroup(g string, k int) string {
    xm := []byte{}
    for _, c := range g {
        xm = append(xm, byte(c)^byte(k))
    }

    return string(xm)
}

func calcMatchScore(rb []AnalyseStringResult, rl []AnalyseRuneResult, xg string) int {
    s := 0
    for i := 0; i < len(xg); i++ {
        if (i + 1) < len(xg) {
            for _, b := range rb {
                if string(xg[i:(i+2)]) == b.value {
                    s++
                }
            }
        }

        for _, l := range rl {
            if rune(xg[i]) == l.value {
                s++
            }
        }
    }

    return s
}

Let’s test a real example

Let’s work with the following example text (it’s from the golang faq):

Robert Griesemer, Rob Pike and Ken Thompson started sketching the goals for a new language on the white board on September 21, 2007. Within a few days the goals had settled into a plan to do something and a fair idea of what it would be. Design continued part-time in parallel with unrelated work. By January 2008, Ken had started work on a compiler with which to explore ideas; it generated C code as its output. By mid-year the language had become a full-time project and had settled enough to attempt a production compiler. In May 2008, Ian Taylor independently started on a GCC front end for Go using the draft specification. Russ Cox joined in late 2008 and helped move the language and libraries from prototype to reality.

Now, I encrypt that text with the key “golang” and encode the encryption output in hexadecimal. Here you can quickly build your own test message.

35000e041c1347281e080b1402020913424735000e413e0e0c0a4c0000034724090f4e330f0001111d08094f1f150f15130a08411d0c021b0f090709004f18090b4700000d0d1d4701001e410f47090a1b410206090819000902470002411a0f024f1b090713024f0e0e0f15034f030f4e34021f18040305021d4c535f4b475d5c51594947380515060e094f0d410802104f08001714471b04044e00080e00124e0f060b4c120b13130309054e0e091b03410f4717030d0f4e13084f080e4e1408020915060e09084c000003470e4c070f0e154f05050b0647000a41190f061b4c081a471000190d0a47050a42412a0214060b0f4e04080118080012020b4c110f15134218080302470602411e06150e000d0b0b47180515064712011e040206130a08411908150442412c1e47250d0f1b0615164c535e575f434c2a0b0947070d054e14130e1e150b0347180313054708014c004e0408021c080202154f1b081a0f471804080d0f471b03410b1f170303130b470e0b09001d5c470618410902090a1e001a02034f2f410d08030a4c001d470e1b1f410112131f1915404725164c0c07034a1609001c4713070941020609081900090247070d054e05020c030c0b47064f0a14020b4a1b050c0b47171d030b0b04134f0d0f0a470f0e08411d02131b00040a4702010314090f471b03410f13130a01111a47064f1c130103120c18080109470c030c1e0e0b0a1e4f4e2e094f21001747555f5c5942472e0e02413a061e0303134e0e090b09110b09030a0215021e471c18001c13020b4c0e0047064f2b222d47011d030f1a47020108410808154f2b0e4e12140602064e130f0a4c051c06011b4c121e0204060a080d061306030f4047351a1f124e2408174c0b010e090a0841070947030d150b47555f5c594e06090b4c090b0b170a08410308110a4c15060247030d0f0912060809410f09034f00080c15061d05041d47011d030c4e171500180e1a1e170a4c150147150a0d0d07131e41

I really recommend to play a little bit around with some parameters like the maximum key-length and the maximum hamming distance range in the “discoverMessageParts” function.

The whole project can be found on github.

I hope you had fun

Cheers

close

Leave a Reply

Your email address will not be published. Required fields are marked *