Implementing Historical Ciphers in Python - Part 1

2021-04-02

My New Year’s resolution for 2021 was to learn more about encryption, starting from the earliest documented forms of it in history. I bought myself Simon Singh’s The Code Book and started reading through it. About the same time, I was doing a lot of experimental stuff in Python and had the idea to merge the two.

Fast forward, and I started implementing each cipher in the book starting with the Caesar cipher.

Understanding the Caesar Cipher

This cipher is built on an extremely simple concept: each letter is substituted by another letter based on a predetermined “shift” amount. For example, if our shift is 3 then ‘a’ becomes ‘d’. The entire alphabet is shifted sequentially, so the order of the letters never changes: abcdefghijklmnopqrstuvwxyz becomes defghijklmnopqrstuvwxyzabc. Simple, right?
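Here's a quick sketch of that shift using string slicing. The name shift_alphabet is just an illustrative helper, not something from the book or the full script.

```python
import string

def shift_alphabet(shift):
    """Return the lowercase alphabet rotated left by `shift` positions."""
    alphabet = string.ascii_lowercase
    shift %= len(alphabet)  # keep the shift within 0..25
    return alphabet[shift:] + alphabet[:shift]

print(shift_alphabet(3))  # defghijklmnopqrstuvwxyzabc
```

Lining the result up under the plain alphabet gives the substitution table: the letter at each position is replaced by the letter directly below it.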

Obviously this cipher is trivial to break, but the intention was a quick way to hide a message from someone looking over your shoulder. I’m not going to get into the weaknesses of the cipher; instead I’ll talk through how we can implement it in Python.

Shifting the Alphabet!

Okay, so given some alphabet and some input message, I basically want to iterate through the whole message and substitute each character as I go along. I’ll append every character I find, but if it’s not an alphabetic character I don’t substitute it and just append it the way it is (punctuation, whitespace, etc). In my implementation, I’m passing in the message stored in a text file to a function that will handle both encryption and decryption depending on argument flags passed when the script is run.

for c in orig:
    cryptor(c,func,crypt,shift)

cryptor takes in the following arguments: a character, what function we want (encrypt or decrypt), a list we’ll use to store each character, and the shift amount.

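def cryptor(ch,func,msg,shamt):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if ch.strip() and ch in alphabet:
        # Letter: shift its index, wrapping around with modulo 26.
        msg.append(alphabet[(alphabet.index(ch) + shamt) % 26])
    else:
        # Punctuation, whitespace, etc. pass through unchanged.
        msg.append(ch)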

Basically, I’m using modulo 26 (length of the English alphabet) on the sum of the index position of the character passed in (within the alphabet) and the shift amount. This allows us to properly get the index of the encoded character, which we then append to the empty list that was passed in. Once that’s done, we use join to bring the list into our encoded string variable and output it. Easy-peasy.
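To see the whole flow in one place, here's a minimal sketch that wraps the same modulo trick in a single function and joins the result. The function name caesar and the decrypt flag are assumptions made for illustration; the full script wires this up through command-line flags instead.

```python
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def caesar(text, shift, decrypt=False):
    """Shift every uppercase letter in `text`; leave other characters alone."""
    if decrypt:
        shift = -shift  # undo the shift by moving backwards
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # punctuation, whitespace, etc.
    return ''.join(out)

secret = caesar('MEET AT NOON', 3)
print(secret)                            # PHHW DW QRRQ
print(caesar(secret, 3, decrypt=True))   # MEET AT NOON
```

Decryption is just encryption with the shift negated, because Python's % always returns a non-negative result for a positive modulus.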

Full code here.

Vigenere - a better Caesar

The next cipher presented in the book was the Vigenere. I took some time to think about this one, and realized that it’s really in essence just an expanded version of Caesar. We still substitute letters, but instead of a single shift we build a “key” that we use to pick out letters from a row/column matrix. The matrix is a 26 by 26 grid in which each row is the alphabet shifted one more position than the row above it.

Implementing it in Python:

BASE_ALPH = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
vigenere = []

for i in range(len(BASE_ALPH)):
    # Start a new row; row i is the alphabet shifted by i positions.
    vigenere.append([])
    for j in range(len(BASE_ALPH)):
        vigenere[i].append(BASE_ALPH[(BASE_ALPH.index(BASE_ALPH[j]) + i) % 26])

Does that last bit look familiar? It should: it’s exactly what we used to generate the Caesar cipher encoding! Reusing code feels good. If you do a print(vigenere) it will output a list of lists that is in essence our 2-d Vigenere encoding array.
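If you'd rather eyeball the table than a raw list of lists, here's a small sketch that builds it with a list comprehension and prints the first few rows. The name table is illustrative, and (j + i) % 26 is used as a shortcut because looking up the index of the j-th letter just gives back j.

```python
BASE_ALPH = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Row i is the alphabet rotated left by i positions.
table = [[BASE_ALPH[(j + i) % 26] for j in range(26)] for i in range(26)]

for row in table[:4]:
    print(''.join(row))
# ABCDEFGHIJKLMNOPQRSTUVWXYZ
# BCDEFGHIJKLMNOPQRSTUVWXYZA
# CDEFGHIJKLMNOPQRSTUVWXYZAB
# DEFGHIJKLMNOPQRSTUVWXYZABC
```

Row 0 is the plain alphabet and row 3 is the Caesar alphabet for a shift of 3, so every row of the table is its own Caesar cipher.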

Okay, so now you’re thinking… “what am I supposed to do?” Right? Well here’s where that key comes in. This key would only (hopefully 😅) be known by the sender and the intended recipient.

Expanding the Key

Let’s imagine that the chosen key is “DOG”. We then have to expand the key to match the length of the message we want to send. So if our message is “Meet at noon”, we need to get the length of the message (ignoring spaces and punctuation) and then repeat the key until we hit that length. “Meet at noon” has 10 letters once the spaces are removed, so we’d need our key to be “DOGDOGDOGD”. The code snippet below is how I accomplished this:

BASE_KEY = 'DOG'
PLAINTEXT = 'MEET AT NOON'
# Strip any punctuation or whitespace from the plaintext.
PLAINTEXT_STRIP = ''.join(c for c in PLAINTEXT if c.isalnum())
BASE_KEY_LEN = len(BASE_KEY)
KEY_LEN = len(PLAINTEXT_STRIP)
# Cycle through the base key until it is as long as the stripped message.
expandedKey = ''.join(BASE_KEY[i % BASE_KEY_LEN] for i in range(KEY_LEN))

Using the Key to Encode the Message

We have our key, we have the message we want to send. Now we need to encode it. What does that look like?

MEET AT NOON
DOGD OG DOGD

Let’s go back to the matrix above. The key character determines what row we use, and the message character determines the column. So matrix[key][message] will return the value of the encoded character.

Row D (the key character) and Column M (the message character) give us P, and we do this letter by letter to generate the encoded message. Keep going and eventually Meet at Noon becomes Pskw oz Qcuq.
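The row/column lookup can be sketched as below. The names MATRIX and vigenere_encode are illustrative rather than taken from the full script, and the sketch assumes the key contains only uppercase letters. It only advances through the key when it hits a letter, which is why spaces stay where they are.

```python
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# MATRIX[row][col]: row picked by the key letter, column by the message letter.
MATRIX = [[ALPHABET[(row + col) % 26] for col in range(26)] for row in range(26)]

def vigenere_encode(message, key):
    """Encode `message` with `key` (assumed to be uppercase A-Z only)."""
    out = []
    key_pos = 0  # counts letters encoded so far, so we know which key letter is next
    for ch in message.upper():
        if ch in ALPHABET:
            row = ALPHABET.index(key[key_pos % len(key)])
            col = ALPHABET.index(ch)
            out.append(MATRIX[row][col])
            key_pos += 1
        else:
            out.append(ch)  # keep spaces and punctuation as they are
    return ''.join(out)

print(vigenere_encode('MEET AT NOON', 'DOG'))  # PSKW OZ QCUQ
```

Indexing the key with key_pos % len(key) has the same effect as expanding the key up front, without building the longer string.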

What makes Vigenere better than Caesar?

If you notice above, our original message has two places where the same character appears twice in a row: Meet and Noon. In the encrypted form, though, those pairs don’t produce the same output letter. This is because the cipher is “rotating” through the key for each encoded character. In Caesar with a shift of 3, Noon becomes Qrrq, and with enough occurrences we might be able to infer the shifting behavior. Frequency analysis can be done on both of these ciphers; Vigenere just makes it a little harder and more time consuming.
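A rough way to see the difference is to count how often each letter shows up. The sketch below compares the plaintext with its Caesar encoding (shift of 3) and its Vigenere encoding (key DOG); letter_counts is a hypothetical helper, and a 10-letter message is far too short for real frequency analysis, so treat this as an illustration only.

```python
from collections import Counter

def letter_counts(text):
    """Letter frequencies sorted highest first, ignoring non-letters."""
    counts = Counter(c for c in text if c.isalpha())
    return sorted(counts.values(), reverse=True)

print(letter_counts('MEET AT NOON'))  # plaintext: [2, 2, 2, 2, 1, 1]
print(letter_counts('PHHW DW QRRQ'))  # Caesar:    [2, 2, 2, 2, 1, 1]
print(letter_counts('PSKW OZ QCUQ'))  # Vigenere:  [2, 1, 1, 1, 1, 1, 1, 1, 1]
```

Caesar keeps the shape of the plaintext's frequency distribution intact, while Vigenere flattens it out. The one repeat left in the Vigenere output comes from the two N's that both happened to line up with the key letter D.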

Full code here.

What next?

So I keep reading through the chapters and while there are a LOT of ciphers discussed and explored, I wanted to work on something slightly more advanced - enter Enigma (the next post I’ll make!). What’s cool is that I get to re-use code again, which really gives some insight into the evolution of encryption!