HackTheBox | emo
Challenge Description:
WearRansom ransomware just got loose in our company. The SOC has traced the initial access to a phishing attack, a Word document with macros. Take a look at the document and see if you can find anything else about the malware and perhaps a flag.
(Personal Note: This one seems super relevant today! We have a word document that was used in a phishing attack. Ransomware and phishing are rampant so I was excited to attempt this challenge.)
Danger, Will Robinson!
So… this challenge deals with ransomware inside a Word doc, and they give us a Word doc to analyze :thinking:. Step 1 is to load up a completely fresh Windows image in VirtualBox with no network adapter, because there’s no way I’m running this on my main Kali install, my own Windows installation, or my network. That out of the way, let’s go.
Initial Forensic Analysis
When we load up our VM and extract the file, Windows Defender immediately hits us with this:
CISA has an alert page that goes into detail about this malware, how it propagates, and how it functions. The bottom of that page lists some attack techniques documented by MITRE. Several of them are relevant to how I’ll approach this challenge:
- Emotet has sent Microsoft Word documents with embedded macros that will invoke scripts to download additional payloads.
- Emotet has obfuscated macros within malicious documents to hide the URLs hosting the malware, CMD.exe arguments, and PowerShell scripts.
- Emotet has used cmd.exe to run a PowerShell script.
- Emotet has used PowerShell to retrieve the malicious payload and download additional resources like Mimikatz.
Researching Malicious Documents
I found a really awesome PDF on the SANS site that has a ton of useful information to analyze malicious documents. I learned a few other things while researching malware embedded into Office docs:
- Office documents can be unzipped and have a ton of XML and metadata included.
- There are tools that will analyze an Office doc to see if there are any macros; that SANS document lists a Python script called oledump that can extract macro data.
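Before picking a tool, it can help to check which container format a document actually uses. The following is a minimal sketch, assuming only the two well-known signatures: legacy OLE compound files (.doc) start with the bytes D0 CF 11 E0 A1 B1 1A E1, while the newer OOXML formats (.docx, .docm) are ZIP archives starting with PK\x03\x04. The function names are hypothetical and not part of any tool mentioned here.

```python
# Hypothetical helper names; only the magic-byte values are standard.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy compound file (.doc, .xls)
ZIP_MAGIC = b"PK\x03\x04"                       # OOXML (.docx, .docm) is a ZIP archive


def sniff_office_container(header: bytes) -> str:
    """Classify a file by its first bytes: 'ole', 'zip', or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_office_container(fh.read(8))
```

Calling `sniff_file("emo.doc")` only reads the header, so it never executes anything inside the document.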
Dumping Office Macros in Linux
I’m going to go through this next part in a live-boot VM of Kali; again, I don’t want to mess with this file on any actual systems I use. After downloading the script, I ran it with emo.doc to see what happens.
From the documentation, I know that a letter ‘M’ designates a macro stream (upper and lower case mean different things: an uppercase M marks a stream that contains VBA code, a lowercase m one that only holds attribute statements). We can use the -s <#> flag to select the macro stream we want to dump, so I’m going to go one by one and see what I can find (11, 12, 13). I decided to dump each one to a text file so I could look for any flag references or possible hashes that would lead to a flag.
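The dump-each-stream step can be scripted. This is a minimal sketch, assuming oledump.py sits in the current directory and that the -v option (decompress the VBA source) is wanted alongside -s; the write-up only mentions -s, so -v, the script path, and the output file names are assumptions.

```python
import subprocess
import sys


def oledump_command(doc, stream, script="oledump.py"):
    """Build the command line for dumping one stream.

    -s selects the stream, -v asks oledump to decompress the VBA source.
    The script location is an assumption.
    """
    return [sys.executable, script, "-s", str(stream), "-v", doc]


def dump_streams(doc, streams, script="oledump.py"):
    """Run oledump once per stream and save each result as <stream>.txt."""
    for stream in streams:
        result = subprocess.run(
            oledump_command(doc, stream, script),
            capture_output=True,
            text=True,
            check=True,
        )
        with open(f"{stream}.txt", "w", encoding="utf-8") as fh:
            fh.write(result.stdout)
```

Calling `dump_streams("emo.doc", [11, 12, 13])` would produce 11.txt, 12.txt, and 13.txt, matching the files used later in this write-up.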
Once I dumped all the macro streams, I opened one up and was greeted with a wall of obfuscated code but… no flag hints anywhere, so it looks like we’re going to need to do some deobfuscation at some point. Upon inspection, we can clearly tell that there’s some VBA scripting (Dim <var> As <type> is a pretty big giveaway). E.g.:
Dim AJfXCG(5 + 6 + 1 + 5) As String
Set dVZiWDFGB = (eQyofECdH)
Dim wmHOBFDQ(5 + 8 + 1 + 6) As String
I’m familiar with the syntax because I wrote an Excel add-on last year.
More Information Gathering
Before I go further down the rabbit hole of deobfuscation, I’m going to do some more simple analysis of this file. In particular, I’m interested in what type of metadata is stored within it. Earlier, I mentioned that office files can be unzipped. Doing so reveals a ton of XML files that essentially contain the styling for the document itself (themes, fonts, etc) - but nothing really useful for us to solve this problem.
File Metadata
In Windows, you can right-click any file and view its properties. The “Details” tab shows us some of the file’s metadata. Before we can look at this file, we’re going to need to add an exclusion to Windows Defender so it doesn’t delete the extracted malware immediately. (Or you can just disable Defender completely, since we’re working in a sandbox without internet access.)
Once we do that, we can look at the file properties and see this:
- Subject: Handmade Argentina Handcrafted Frozen Towels yellow Frozen virtual Guatemala array Lesotho JBOD
- This actually made me laugh by the way.
Sidenote: you can also use a command like file in Linux to get some of the same data.
Other than that, there’s nothing really useful so we’ll move on to our next step.
Deobfuscation
I know I need to deobfuscate this mess, and we know that the malware executes some code as soon as the user opens the file. Leaning on my background in programming, that means somewhere in this mess of text there is an entry point from which the payload is called. So we need to find hints of functions being called.
If we go back to those txt files we made from the oledump outputs, we can examine them in the order the streams appear (11, 12, 13). Starting with our 11.txt we see the following:
Private Sub Document_open()
    Get4ipjzmjfvp.X8twf_cydt6
End Sub
Function X4al3i4glox(Gav481k8nh28)
    X4al3i4glox = Replace(Gav481k8nh28, "][(s)]w", Sxybmdt069cn)
End Function
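The X4al3i4glox function above looks like a junk-string stripper: it replaces every occurrence of the marker "][(s)]w" with the value of Sxybmdt069cn. Assuming that variable is never assigned anywhere in the macro (an assumption, since only this snippet is shown here), VBA would treat it as an empty string, so the marker simply disappears. A minimal Python sketch of the same idea, using a made-up sample string rather than anything from the real document:

```python
JUNK_MARKER = "][(s)]w"  # the marker string taken from the VBA Replace() call


def strip_junk(obfuscated: str, replacement: str = "") -> str:
    """Mimic Replace(text, "][(s)]w", <unassigned variable>).

    The empty-string replacement assumes the VBA variable is never assigned.
    """
    return obfuscated.replace(JUNK_MARKER, replacement)


# Hypothetical input, built only to illustrate the technique.
sample = "po][(s)]wwer][(s)]wshe][(s)]wll"
print(strip_junk(sample))
```

Splitting strings with a marker like this is a common way to hide keywords from simple signature matching.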
I assumed that Document_open() was a function that is called when the document is… opened… but I wanted to confirm it, so I checked Microsoft’s documentation. So this Get4ipjzmjfvp.X8twf_cydt6 is potentially our entry point to the payload. However, this is supposed to be an EASY challenge, so I doubt the author wanted us to get into code deobfuscation. So after sitting back for a while and trying to see the big picture, I decided to do the unthinkable… and run this malware by opening the document.
The Trojans would be pleased.
Okay so, my clean and network segregated VM is up - when we open the doc we see this:
Anyone with even a little knowledge of Word can see that this is just an image pasted into the text area. But… let’s assume we’re George or Susan from Payroll, and we really want those Handmade Argentina Handcrafted Frozen Towels. We’re going to hit that Enable button… and again Microsoft is telling us, “hey… there’s stuff in here, are you sure you want to enable it?” Yes… I need those towels.
After ignoring all the warnings and enabling the content, nothing seemed to happen - but we know better, so let’s check our running processes to see if something’s running in the background. I want to pinpoint a timeframe from when I first opened the doc and examine any logs or events that happened after that point - I opened the doc at 1:38 P.M. So what I’m going to do is open up the Windows Event Viewer and see what happened around that time. We see some VERY interesting stuff here.
System Events
A new logon is created - I’m not sure if this is the Malware or just a coincidence:
There’s also an event around the same time to read local credentials, again maybe coincidence?
Security Events
This one is interesting: it’s attempting to access my system settings using an account with the SID S-1-5-21-495180511-3817591229-102401400-500, however there is only one account in this VM and it’s a user with the SID S-1-5-21-495180511-3817591229-102401400-1001. I know a little about Windows SIDs, and 500 is the relative identifier (the last part of the SID) of the built-in Administrator account. However, that account isn’t enabled on my VM. My read is that the malware is trying to change system settings by going after accounts with elevated privileges, possibly by brute forcing credentials on the machine.
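To make the SID comparison concrete, here is a small sketch that splits a SID into its machine/domain part and its relative identifier (RID). The RID table only contains the two well-known values I’m sure of (500 for the built-in Administrator, 501 for the built-in Guest); the function and table names are hypothetical.

```python
# Well-known relative identifiers; deliberately not an exhaustive list.
WELL_KNOWN_RIDS = {500: "built-in Administrator", 501: "built-in Guest"}


def split_sid(sid: str):
    """Return (domain_part, rid) for a SID such as S-1-5-21-...-500."""
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    return "-".join(parts[:-1]), int(parts[-1])


event_sid = "S-1-5-21-495180511-3817591229-102401400-500"
user_sid = "S-1-5-21-495180511-3817591229-102401400-1001"

event_domain, event_rid = split_sid(event_sid)
user_domain, user_rid = split_sid(user_sid)

# Same machine identifier, different accounts.
print(event_domain == user_domain)
print(WELL_KNOWN_RIDS.get(event_rid, "regular account"))
print(WELL_KNOWN_RIDS.get(user_rid, "regular account"))
```

Both SIDs share the same prefix, so they belong to the same machine; only the trailing RID distinguishes the Administrator account from the regular user.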
Application Settings
Okay, this is the good stuff. In here, we can see an event that has some interesting characteristics, namely this 0ff1ce15-a989-479d-af46-f275c6370663 which looks like some leet-speak for “Office”. This had me interested in investigating further. However, upon searching, it’s actually just related to Office failing to activate properly… man, I really thought I was onto something there.
Wireshark Time
One thing we know about Emotet is that it needs to connect to a C2 server to download its payload. So… I’m going to enable a network adapter on this VM, start Wireshark, and see what happens. I close out the Word doc and kill every running process. Then I load up Wireshark, start a new capture, open the malicious doc, wait a few seconds, and stop the capture (oh, and then I disable the network adapter again!). Now we just need to go through this pcap and see if we can find any packets that look nasty or are trying to reach out to some server. I managed to locate some packets performing DNS queries for several .htb domains, and this is how I know I’m on the right track.
da-industrial.htb
daprofesional.htb
www.outspokenvisions.htb
dagranitegiare.htb
mobsouk.htb
biglaughs.htb
ngllogistics.htb
The Payload
Now I know that the malware attempted to reach out to these hosts, likely to download the payload, which means it’s time to go back to Event Viewer to see if we find anything useful. Sure enough, we see a PowerShell command running a super long encoded string.
I decoded the string using base64 (HTB loves it) and got this as a result:
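One thing worth knowing here: when a string is handed to PowerShell through its -EncodedCommand parameter, it is Base64 over UTF-16LE text, so a plain Base64 decode shows every ASCII character followed by a NUL byte. Whether this particular string was passed that way is an assumption (the full command line isn’t shown here). A minimal sketch with a made-up command as the sample:

```python
import base64


def decode_encoded_command(b64: str) -> str:
    """Decode Base64 that wraps UTF-16LE text (the -EncodedCommand convention)."""
    return base64.b64decode(b64).decode("utf-16-le")


# Hypothetical sample, encoded the same way so the round trip can be checked.
original = "Write-Output 'hello'"
sample = base64.b64encode(original.encode("utf-16-le")).decode("ascii")

raw = base64.b64decode(sample)
print(raw[:8])                         # interleaved NUL bytes are visible here
print(decode_encoded_command(sample))  # readable text after UTF-16LE decoding
```

If the decoded bytes look like text with gaps between every character, trying UTF-16LE before anything else is usually the cheapest next step.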
Now, at first glance this looks useless HOWEVER… if you look closely, you can see that there are some ASCII characters in there mixed with the garbage. So… ENTER REGEX!
Taking out the trash
There’s a super useful RegEx pattern we can use to match non-ASCII characters: [^\x00-\x7F]+. Using this, I can replace every run of non-ASCII characters with an empty string and see what happens to that block of garbage. I’m going to do this in Notepad++:
After running RegEx matching:
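The same cleanup can be sketched in Python with the standard re module and the exact pattern from above. The sample string is made up for illustration; it is not the real decoded payload.

```python
import re

# Same pattern as used in Notepad++: one or more characters outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Delete every run of non-ASCII characters."""
    return NON_ASCII.sub("", text)


# Hypothetical noisy input.
sample = "$FN5\u00e9\u4e2dggmsH \u2603+= (186,141)"
print(strip_non_ascii(sample))
```

Note that the pattern keeps control characters such as NUL (0x00), since they fall inside the ASCII range.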
There’s something that immediately sticks out to me, and that’s all the numbers in this format:
$FN5ggmsH += (186,141,228,182,177,171,229,236,239,239,239,228,181,182,171,229,234,239,239,228)
$FN5ggmsH += (185,179,190,184,229,151,139,157,164,235,177,239,171,183,236,141,128,187,235,134,128,158,177,176,139)
$FN5ggmsH += (183,154,173,128,175,151,238,140,183,162,228,170,173,179,229)
I think I’m finally onto something here. This isn’t base64; these look like plain base-10 integers. But we do have another hint on how to decode this further. Within that decoded PowerShell script, we can see things like [ChAR]53 which leads me to believe these integers represent characters - and that means we potentially have our flag. I know it’s not plain ASCII because the decimal values are too high (ASCII stops at 127). However, the Latin blocks of Unicode go all the way to around 600, so I’m going to try converting the integers from Unicode code points to their text equivalent.
Using Logic
We can use a tool like CyberChef from GCHQ to throw some different functions at this array of integers. Like I said, I wanted to turn these potential Unicode code points back into their original text. This can be accomplished with the From Charcode recipe, with the Base set to 10 and the delimiter set to a comma.
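For anyone who prefers a script over CyberChef, the conversion is a one-liner in Python. This is a rough equivalent of the recipe settings described above (base 10, comma delimiter), not CyberChef’s actual implementation; the input is the first integer line from the payload.

```python
def from_charcode(text: str, delimiter: str = ",", base: int = 10) -> str:
    """Turn a delimited list of code points into a string."""
    tokens = (tok.strip() for tok in text.split(delimiter))
    return "".join(chr(int(tok, base)) for tok in tokens if tok)


line = "186,141,228,182,177,171,229,236,239,239,239,228,181,182,171,229,234,239,239,228"
decoded = from_charcode(line)
print(repr(decoded))
```

Every value in that line is above 127, so the result is all Latin-1 range characters rather than readable ASCII, which fits the idea that another encoding layer is still in place.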
XOR…again?
We get an output, but it looks like it’s further encoded somehow. Still, I’m 99% sure this is our flag. Thinking back to my xorxorxor writeup, I remember that we know for sure the flag WILL contain HTB{ in that specific order. I’m thinking of trying some XORs: we know part of the plaintext and we have the ciphertext, so XORing the two should reveal a possible key (in the event it IS XOR… again, this is just a hunch).
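One way to test that hunch is a known-plaintext search: slide HTB{ across the integers, XOR each 4-value window with it, and see whether the implied key bytes are all identical. This sketch assumes a single-byte XOR key, which is only one of several possibilities at this point in the analysis; the function name is hypothetical.

```python
CRIB = b"HTB{"  # known plaintext that the flag must contain

# Integers extracted from the obfuscated PowerShell.
psh = [186, 141, 228, 182, 177, 171, 229, 236, 239, 239, 239, 228, 181, 182, 171,
       229, 234, 239, 239, 228, 185, 179, 190, 184, 229, 151, 139, 157, 164, 235,
       177, 239, 171, 183, 236, 141, 128, 187, 235, 134, 128, 158, 177, 176, 139,
       183, 154, 173, 128, 175, 151, 238, 140, 183, 162, 228, 170, 173, 179, 229]


def single_byte_key_candidates(cipher, crib=CRIB):
    """Return (offset, key) pairs where cipher XOR crib yields one repeated byte."""
    hits = []
    for offset in range(len(cipher) - len(crib) + 1):
        # zip() stops at the end of the crib, so this compares one window.
        keys = {c ^ p for c, p in zip(cipher[offset:], crib)}
        if len(keys) == 1:
            hits.append((offset, keys.pop()))
    return hits


for offset, key in single_byte_key_candidates(psh):
    print(f"offset {offset}: key {key:#04x}")
```

The search is only 57 windows of 4 XORs each, so it is far cheaper than brute forcing keys and then checking the output by eye.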
Solved
I started writing a Python script to split the list of potentially encoded characters into chunks of 4 (to test against HTB{ in case there’s a repeating key), but it started to become really tedious and I got lost in the logic of what I was trying to do (iterating through 600 Unicode characters to brute force the key seemed unreasonable).
# ints we extracted from the obfuscated PowerShell
psh = [186, 141, 228, 182, 177, 171, 229, 236, 239, 239, 239, 228, 181, 182, 171, 229, 234, 239, 239, 228, 185, 179, 190, 184, 229, 151, 139, 157, 164,
       235, 177, 239, 171, 183, 236, 141, 128, 187, 235, 134, 128, 158, 177, 176, 139, 183, 154, 173, 128, 175, 151, 238, 140, 183, 162, 228, 170, 173, 179, 229]
# I want to break the array into equal chunks of 4 integers.
# We are testing a 4-character key, so we need len(psh) // 4 rows.
psh_split = []
arr_len = len(psh)
keylen = 4
n = arr_len // keylen
# Our base list: 15 rows of 4 elements
for r in range(0, n):
    psh_split.append([0 for c in range(0, keylen)])
# now populate the 2D array with the values from psh
x = 0
for i in range(0, n):
    for j in range(0, keylen):
        psh_split[i][j] = psh[x]
        x += 1
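For what it’s worth, the chunking itself can be done in one line with slicing. A minimal sketch with a hypothetical helper name and a small stand-in list:

```python
def chunk(values, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


sample = list(range(10))
print(chunk(sample, 4))
```

Unlike the nested-loop version, this keeps a shorter final chunk when the length is not a multiple of the key length instead of silently dropping the leftover values.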
I wanted to stick with the XOR idea, so I started looking for 4-character keys that might be hinted at in the challenge. I was beating my head against the wall for a while on this one and then decided to do a little more deobfuscation of the original PowerShell. Looking back at the deobfuscated PowerShell, I finally found this incredibly important snippet in the payload decryption loop: += ([byte][char]${_} -bxor 0xdf ) which tells us what we need to know: every byte is XORed with the single-byte key 0xdf, not a 4-character key. Thanks to 0xdf for the great challenge :wink:. I’d like to deep dive the obfuscated VBA at some point, maybe play with PSDecode. For now, I feel good getting to this point.
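With the key known, the final decode is a few lines of Python. This sketch mirrors the -bxor 0xdf step from the PowerShell snippet, applied to the integers extracted earlier; the function name is hypothetical.

```python
KEY = 0xDF  # from the "-bxor 0xdf" in the PowerShell decryption loop

# Integers extracted from the obfuscated PowerShell.
psh = [186, 141, 228, 182, 177, 171, 229, 236, 239, 239, 239, 228, 181, 182, 171,
       229, 234, 239, 239, 228, 185, 179, 190, 184, 229, 151, 139, 157, 164, 235,
       177, 239, 171, 183, 236, 141, 128, 187, 235, 134, 128, 158, 177, 176, 139,
       183, 154, 173, 128, 175, 151, 238, 140, 183, 162, 228, 170, 173, 179, 229]


def xor_decode(values, key=KEY):
    """XOR every value with the single-byte key and decode the result as ASCII."""
    return bytes(v ^ key for v in values).decode("ascii")


decoded = xor_decode(psh)
print(decoded)
```

The output is a semicolon-separated string of key:value pairs, and one of those fields carries the flag.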
Lesson Learned
I didn’t need to go through so much effort for this one, running it in a VM and seeing what logs are generated in the Event Viewer is really more than enough.