Reconstructing a Randomly Tokenized String in Python

2021-05-01 330 words 2 minutes


I’ve been grinding out a ton of HackTheBox recently, and have started building a small repository of tools that I’ve made or found to help with certain challenges or general situations. Right now, I’m going through the Intro to Blue Team track (70% done), and it’s a lot of forensics-related stuff. In a recent challenge I encountered a malicious Word doc with a macro that called a PowerShell script. Once I found the base64-encoded script, I noticed it was clearly calling a certain domain (I won’t go into the challenge details), but the URL had been split into tokens and the tokens shuffled into a random order. Right above the shuffled tokens, though, there was a list of integer values, and I figured the two had to be related. The task was then to match these index values to the corresponding string tokens, but there were a lot of them.

Lately, when I do these challenges and run into something like this, my first instinct is to automate the process with a script, and writing the script often ends up taking longer than just doing it by hand would have. In this case, though, I saw the value in making something that takes a list of indices and uses each one to look up a value in another array. The script is super simple, but I think it will be useful in the future.

Anyway, here’s the source, along with the challenge I used it for: Lure. Feel free to use it to attempt this challenge! You’ll have to modify it, though, since this is just a template and not the actual solution.

# given indices, reconstruct a string that may be tokenized and out of order.
# written for HTB Lure
indices = ["0", "3", "4", "2", "5", "1"]
obf_str = ["f", "}", "g", "l", "a", "{"]
reconstructed = ''
# match index to obfuscated string, append to a new string
for i in indices:
    reconstructed += obf_str[int(i)]
print(reconstructed)
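
If you want to reuse the idea across challenges, the same lookup can be wrapped in a function. This is only a sketch of one way to do it: the name `reconstruct` and the bounds check are my own additions for illustration, not part of the original script. It accepts indices as either integers or numeric strings, and it rejects negative or out-of-range values instead of letting Python quietly wrap a negative index around to the end of the list.

```python
def reconstruct(indices, tokens):
    """Rebuild a string by picking tokens in the order given by indices.

    Hypothetical helper: indices may be ints or numeric strings like "3".
    """
    parts = []
    for raw in indices:
        i = int(raw)
        # Refuse negative or too-large indices rather than guessing.
        if not 0 <= i < len(tokens):
            raise IndexError(f"index {i} is outside the token list")
        parts.append(tokens[i])
    # Joining once avoids building the string up piece by piece.
    return "".join(parts)


if __name__ == "__main__":
    # Same placeholder data as the template above.
    indices = ["0", "3", "4", "2", "5", "1"]
    obf_str = ["f", "}", "g", "l", "a", "{"]
    print(reconstruct(indices, obf_str))  # prints: flag{}
```

A bad index usually means the indices or tokens were copied out of the script incorrectly, so failing loudly is more helpful here than producing a plausible-looking but wrong string.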