HackTheBox | exatlon
Environment
When doing any type of reverse engineering, work in a fresh image that is isolated from every other machine and has networking disabled. Even though these binaries come from trusted sources and exist for educational purposes, we should still treat them as if they were malicious. My lab environment is a Kali box behind a pfSense firewall, both running as VMs.
File Information
running the program
Here’s what happens when we run this binary:
kali@kali:~$ ./exatlon_v1
███████╗██╗ ██╗ █████╗ ████████╗██╗ ██████╗ ███╗ ██╗ ██╗ ██╗ ██╗
██╔════╝╚██╗██╔╝██╔══██╗╚══██╔══╝██║ ██╔═══██╗████╗ ██║ ██║ ██║███║
█████╗ ╚███╔╝ ███████║ ██║ ██║ ██║ ██║██╔██╗ ██║ ██║ ██║╚██║
██╔══╝ ██╔██╗ ██╔══██║ ██║ ██║ ██║ ██║██║╚██╗██║ ╚██╗ ██╔╝ ██║
███████╗██╔╝ ██╗██║ ██║ ██║ ███████╗╚██████╔╝██║ ╚████║███████╗╚████╔╝ ██║
╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═══╝╚══════╝ ╚═══╝ ╚═╝
[+] Enter Exatlon Password : adad
[-] ;(
It takes in our input and outputs ;( if we don’t put in the right password. Pretty standard RE binary. Let’s dig into this program to see if we can get some useful information to point us in the right direction.
file
Using the file command, we can get a generic overview of the file type:
kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, no section header
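To make that output a little less magical, here is a rough Python sketch of how the first few fields could be read straight from the ELF header. The offsets and constants come from the ELF specification; the function name describe_elf and the sample bytes are made up for illustration (the sample is NOT copied from exatlon_v1), and the real file command checks far more than this.

```python
import struct

# Lookup tables for the handful of values that matter here (ELF specification).
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB", 2: "MSB"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Summarize the first 20 bytes of an ELF file, roughly like `file` does."""
    if header[:4] != b"\x7fELF":
        return "not an ELF file"
    ei_class, ei_data = header[4], header[5]
    # Byte order of the multi-byte fields depends on EI_DATA.
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields starting at offset 16.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        ELF_CLASS.get(ei_class, "unknown class"),
        ELF_DATA.get(ei_data, "unknown encoding"),
        ELF_TYPE.get(e_type, "unknown type"),
        ELF_MACHINE.get(e_machine, "unknown machine"),
    )


# Synthetic header for illustration only:
# magic, 64-bit, little-endian, version 1, OS/ABI 3 (GNU/Linux), padding,
# then e_type = 2 (executable) and e_machine = 62 (x86-64).
sample = b"\x7fELF" + bytes([2, 1, 1, 3]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

Against the real file you would pass in something like open("exatlon_v1", "rb").read(20) instead of the synthetic sample.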
strings
Another piece of low-hanging fruit is to check whether any flag references are stored as plaintext within this binary. And YES, we CAN see variable names, strings, function names, etc. as plaintext even though this program is compiled. There are a lot of searches we can do by combining grep and strings.
We could do a search to figure out how this program was compiled, and what system it was compiled on (ag is essentially just a faster grep, and -i just means case-insensitive):
kali@kali:~$ strings ./exatlon_v1 | ag -i gcc
GCC: (Debian 9.2.1-21)
We can also check for libraries that may indicate what programming language was used to write the program:
kali@kali:~$ strings ./exatlon_v1 | ag -i libc
glibc-ldI3
You get the picture, right? This particular file has a LOT of strings to go through, so I’d only do this to look for low-hanging fruit and specific, targeted info…
kali@kali:~$ strings ./exatlon_v1 | wc -l
10943 <------ 10k lines of strings, YIKES!
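If you’re curious what the strings-plus-grep combo boils down to, here is a minimal Python approximation. It assumes "a string" means a run of at least 4 printable ASCII bytes (the default minimum length in GNU strings); the function names and the sample bytes are my own inventions for illustration, not anything taken from the binary.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def search(data: bytes, needle: str):
    """Case-insensitive filter, like piping into `ag -i` or `grep -i`."""
    return [s for s in extract_strings(data) if needle.lower() in s.lower()]


# Synthetic bytes for illustration (not the real binary). The two-byte run
# "ab" is too short to count as a string and gets dropped.
blob = b"\x00\x01GCC: (Debian 9.2.1-21)\x00\xff\x7fab\x00glibc-ldI3\x00"
print(search(blob, "gcc"))
```

Having the strings in a Python list makes it easy to chain more targeted filters than a single grep.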
Disassembly in Binary Ninja
There are a lot of ways we can analyze this program. We could run it through GDB, analyze it in IDA Pro, dive into rabbit holes with Ghidra, etc. For this one I’m going to use Binary Ninja. Once we load the program into Binja, a look at the strings view turns up something really interesting right off the bat:
It seems like this program was packed using a …packer… specifically something called UPX, and this packer apparently likes to brag and let people know it was used which is great for us. So, let’s research UPX.
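The same check can be scripted. The sketch below assumes the packer left its usual UPX! marker in the file, which is what an unmodified UPX does; looks_upx_packed is a hypothetical helper name, and the byte strings are synthetic stand-ins rather than real file contents.

```python
def looks_upx_packed(data: bytes) -> bool:
    """Heuristic only: an unmodified UPX leaves the marker b'UPX!' behind."""
    return b"UPX!" in data


# Synthetic examples for illustration (not real binaries):
packed_like = b"\x7fELF" + bytes(32) + b"UPX!" + bytes(16)
plain_like = b"\x7fELF" + bytes(64)

print(looks_upx_packed(packed_like))
print(looks_upx_packed(plain_like))
```

Treat this as a hint, not proof: the marker can be stripped or altered by someone who wants to hide the packer, and any file that merely contains those four bytes will trigger a false positive.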
UPX
Running man upx (you may need to apt install upx if it’s not installed on your system):
UPX is a portable, extendable, high-performance executable packer for several different executable formats. It achieves an excellent compression
...
All UPX supported file formats can be unpacked using the -d switch, eg. upx -d yourfile.exe will uncompress the file you've just compressed.
decompressing the packed ELF
kali@kali:~$ upx -d ./exatlon_v1
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX 3.96 Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020
File size Ratio Format Name
-------------------- ------ ----------- -----------
2202568 <- 709524 32.21% linux/amd64 exatlon_v1
Unpacked 1 file.
Running that command unpacked the original binary in place, overwriting the packed file. You can see from the output that the packed file was about 32% of the size of the original (709524 vs. 2202568 bytes). If we run file again, we’ll see some new info.
UPX ELF:
kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, no section header
Unpacked ELF:
kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=99364060f1420e00a780745abcfa419af0b8b0d8, for GNU/Linux 3.2.0, not stripped
Let’s re-open this file in Binja again.
Understanding the Program Logic
Now that we’ve unpacked this binary, we’re able to see a lot more detail during the disassembly and debugging process. First, let’s look at the main() function of our program:
main()
Here, we can clearly see the point in this program where we’re being prompted for our input. Everything prior to that is essentially useless code that is printing out that EXATLONV1 ASCII banner. However, if we dig deeper we’re actually able to locate the point at which the program will branch depending on our input. In this visual representation, I’ve highlighted some key areas.
branching behavior
test bl,bl
je 0x404d83
Without going too deep into assembly: test bl,bl ANDs the lowest byte of rbx with itself and sets the zero flag if the result is 0, and je jumps to memory address 0x404d83 when the zero flag is set. In other words, we take the jump when bl is 0. In the above figure, je 0x404d83 will bring us to that left path, which results in the condition we want to avoid: [-] ;(
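If the flag logic feels abstract, this tiny Python model of just those two instructions might help. It is a simplification (test also updates other flags that je doesn’t care about), and the function name is made up for illustration. The sample values 0xffffff00 and 0x200 are the RBX contents that show up in the debugger output later in this writeup.

```python
def je_taken_after_test_bl(rbx: int) -> bool:
    """Model `test bl, bl` followed by `je`: is the jump taken?"""
    bl = rbx & 0xFF              # bl is the lowest 8 bits of rbx
    zero_flag = (bl & bl) == 0   # test ANDs its operands and only sets flags
    return zero_flag             # je jumps when the zero flag is set


print(je_taken_after_test_bl(0xFFFFFF00))  # low byte is 0x00
print(je_taken_after_test_bl(0x200))       # low byte is 0x00
print(je_taken_after_test_bl(0x1))         # low byte is 0x01
```

So only the low byte matters: whatever else is sitting in rbx, a bl of 0 sends us down the [-] ;( path.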
those numbers are interesting…
In yellow, I’ve highlighted some numbers:
1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000
We can actually see (from disassembly) that these numbers have been hard coded into the program which means they have extreme significance to us (i.e. this is likely our flag in some form):
Debugging the Binary
what’s a flag anyways
There is one thing we know about ALL HackTheBox challenges… our flag will always be in the form HTB{foo}, right? So it’s likely that we can match a little of what we know to what the program gives us. To do this, I’m going to run this program under GDB (an open source debugger). While the program is running, I’m going to watch the behavior of the registers.
Watching Registers with a Debugger
In order to debug this program, we need to set breakpoints that halt the program when it reaches a particular point. In this case, I’m very interested in what happens to our input to determine that branching behavior. To do this, we need to pinpoint the addresses of key instructions. Since we know this is a C++ program, we’d likely want to find where cin or operator>> is used. Remember, compilation is not meant to be a reversible process, so whatever we find is not going to be an exact replica of the source code.
disassemble main
We know execution goes through that main() function, so we can use GDB (specifically pwndbg, which is GDB on steroids) to disassemble that function by name and dump the addresses of all its instructions. Doing so results in an output like this:
pwndbg> disass main
Dump of assembler code for function main:
0x0000000000404c2c <+0>: push rbp
0x0000000000404c2d <+1>: mov rbp,rsp
0x0000000000404c30 <+4>: push r12
0x0000000000404c32 <+6>: push rbx
0x0000000000404c33 <+7>: sub rsp,0x40
0x0000000000404c37 <+11>: lea rsi,[rip+0x1463d1] # 0x54b00f
0x0000000000404c3e <+18>: lea rdi,[rip+0x1a693b] # 0x5ab580 <_ZSt4cout>
0x0000000000404c45 <+25>: call 0x468450 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
0x0000000000404c4a <+30>: lea rsi,[rip+0x1463c7] # 0x54b018
0x0000000000404c51 <+37>: lea rdi,[rip+0x1a6928] # 0x5ab580 <_ZSt4cout>
0x0000000000404c58 <+44>: call 0x468450 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
...
Depending on how much code is in main(), this output could be long or short, but the important thing is that we look through it to find what matters. There’s a LOT of junk here, but scrolling through it we come across this:
0x0000000000404d0a <+222>: lea rdi,[rip+0x1a698f] # 0x5ab6a0 <_ZSt3cin>
0x0000000000404d11 <+229>: call 0x406d90 <_ZStrsIcSt11char_traitsIcESaIcEERSt13basic_istreamIT_T0_ES7_RNSt7__cxx1112basic_stringIS4_S5_T1_EE>
0x0000000000404d16 <+234>: lea rax,[rbp-0x30]
0x0000000000404d1a <+238>: lea rdx,[rbp-0x50]
0x0000000000404d1e <+242>: mov rsi,rdx
0x0000000000404d21 <+245>: mov rdi,rax
0x0000000000404d24 <+248>: call 0x404aad <_Z7exatlonRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
0x0000000000404d29 <+253>: lea rax,[rbp-0x30]
0x0000000000404d2d <+257>: lea rsi,[rip+0x1467bc] # 0x54b4f0
0x0000000000404d34 <+264>: mov rdi,rax
0x0000000000404d37 <+267>: call 0x4050fa <_ZSteqIcSt11char_traitsIcESaIcEEbRKNSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_>
0x0000000000404d3c <+272>: mov ebx,eax
0x0000000000404d3e <+274>: lea rax,[rbp-0x30]
0x0000000000404d42 <+278>: mov rdi,rax
0x0000000000404d45 <+281>: call 0x46d330 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED2Ev>
0x0000000000404d4a <+286>: test bl,bl
0x0000000000404d4c <+288>: je 0x404d83 <main+343>
If you look at the above code, you can quickly recognize the last two lines as the conditional check before the program branches (refer to the screenshots above if you forgot). We can also see that the very first line loads <_ZSt3cin> into rdi for the operator>> call that follows, aka our input prompt. With this information, we know we should set a breakpoint at that address so we can observe the register behavior before and after inputting some text. We should also set a breakpoint right at that test bl, bl instruction to see what is being “tested” (my hunch is that this is the result of some kind of strcmp-style string comparison, and the call to the mangled operator== a few lines above it supports that). To do this in GDB, all we need to do is:
- b *0x0000000000404d0a will halt the program right before we input our text
- b *0x0000000000404d4a will halt the program right after the string comparison, just before its result is tested
pwndbg> b *0x0000000000404d0a
Breakpoint 1 at 0x404d0a
pwndbg> b *0x0000000000404d4a
Breakpoint 2 at 0x404d4a
breakpoints
With our breakpoints set, and with the knowledge that we’re after a flag that will always start with HTB{, let’s do some experimentation. I’m going to use this knowledge and simply input an H into the program to see how that H is represented in the registers. I’m going to leave a HUGE caveat here: this part might seem super dense/technical, but you really don’t need advanced knowledge of assembly for this… just try to think about what’s going on in terms of the program’s logic.
We reach our first breakpoint and are prompted:
pwndbg>
[+] Enter Exatlon Password :
The current values in our registers are as follows (again, don’t worry about understanding exactly what this stuff means):
RAX 0x7fffffffdf20 —▸ 0x7fffffffdf30 —▸ 0x59e900 —▸ 0x5a8540 (_nl_global_locale) —▸ 0x59fb00 (_nl_C_LC_CTYPE) ◂— ...
RBX 0x400548 ◂— 0x0
RCX 0x6f7773736150206e ('n Passwo')
RDX 0x5a5e40 —▸ 0x466f50 ◂— mov rax, 0x5a5e28
*RDI 0x5ab6a0 (std::cin) —▸ 0x5a53a0 —▸ 0x44cd40 ◂— mov rax, 0x5a5388
RSI 0x7fffffffdf20 —▸ 0x7fffffffdf30 —▸ 0x59e900 —▸ 0x5a8540 (_nl_global_locale) —▸ 0x59fb00 (_nl_C_LC_CTYPE) ◂— ...
R8 0x5af800 ◂— 0x0
R9 0x4db420 (__memcpy_ssse3+9600) ◂— mov r10, qword ptr [rsi - 0x1e]
R10 0x65746e45205d2b5b ('[+] Ente')
R11 0x6f6c746178452072 ('r Exatlo')
R12 0x49ebe0 (__libc_csu_fini) ◂— push rbp
R13 0x0
R14 0x5a8018 (_GLOBAL_OFFSET_TABLE_+24) —▸ 0x4d6a40 (__rawmemchr_sse2) ◂— movq xmm1, rsi
R15 0x0
Remembering that our goal is HTB{foo}, I’m going to input a single H and see what happens to the registers. After typing in an H as our input, we see some very interesting behavior:
[+] Enter Exatlon Password : H
Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
RAX 0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */
RBX 0xffffff00
RCX 0x32353131
RDX 0x20
RDI 0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */
RSI 0x54b4f5 ◂— '1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000 '
R8 0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */
The value 1152 was loaded into one of the registers. Why is that interesting? Well… remember that list of numbers we found during the course of the disassembly process? The first value in that list of numbers was, you guessed it, 1152.
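In case it isn’t obvious how 0x2032353131 turns into '1152 ': x86-64 is little-endian, so the lowest byte of the register is the first character in memory. Here is a small Python sketch of the decoding pwndbg is doing for us in those annotations; reg_to_ascii is a hypothetical helper name, and it assumes the value contains only ASCII bytes.

```python
def reg_to_ascii(value: int) -> str:
    """Interpret an integer as little-endian ASCII bytes."""
    length = max(1, (value.bit_length() + 7) // 8)  # number of bytes needed
    return value.to_bytes(length, "little").decode("ascii")


# Values copied from the register dumps in this writeup:
print(repr(reg_to_ascii(0x2032353131)))          # RAX target after typing H
print(repr(reg_to_ascii(0x6F7773736150206E)))    # RCX at the first breakpoint
print(repr(reg_to_ascii(0x65746E45205D2B5B)))    # R10 at the first breakpoint
```

Note the trailing space (0x20) after 1152, which matches the space-separated format of the hard-coded list.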
testing a theory
If HTB{foo} is our goal, and inputting an H resulted in the value 1152 being loaded into a register just before that branch… then if we input a T and 1344 gets loaded into a register, we can almost certainly assume those numbers represent the individual characters of our flag. So let’s test it:
[+] Enter Exatlon Password : T
Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
RAX 0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */
RBX 0x200
RCX 0x3131
RDX 0x5
RDI 0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */
RSI 0x54b4f5 ◂— '1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000 '
R8 0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */
Okay, maybe we just got lucky… so let’s test that third character, B, to see if 1056 gets loaded… if it does, then we know this is our flag.
[+] Enter Exatlon Password : B
Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
RAX 0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2036353031 /* '1056 ' */
BINGO.
Decoding the Flag
So at this point, it’s pretty safe to assume that “1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000” is our flag. However, these numbers don’t mean much on their own. The ASCII table only has decimal values up to 125 for characters that would be in our flag… so we need to figure out how these numbers relate to ASCII values.
Fun with ASCII
From our handy-dandy ASCII table, we know that H can be represented as the decimal value 72 OR the hexadecimal value 48. My first instinct is to try dividing that big number by these values to see what results:
- 1152/72 = 16
- 1152/48 = 24
Whole numbers are promising, but I’m not entirely convinced yet… let’s do each of the first four characters to try and see if there’s a pattern:
T (Decimal 84; Hex 54)
- 1344/84 = 16
- 1344/54 = 24.8888888889
B (Decimal 66; Hex 42)
- 1056/66 = 16
- 1056/42 = 25.1428571429
{ (Decimal 123; Hex 7B)
- 1968/123 = 16
- 1968/7B? Yeah… I think that rules this one out, we’re not going to be dividing two different bases.
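The same sanity check can be done in a few lines of Python, using only the four character/value pairs we observed in the debugger (the dictionary name observed is mine). As a side note, multiplying by 16 gives the same result as shifting left by 4 bits, so the sketch checks that too; that is just arithmetic, and I have not confirmed which instruction the binary actually uses.

```python
# Character -> encoded value, as observed for the known prefix HTB{
observed = {"H": 1152, "T": 1344, "B": 1056, "{": 1968}

# (quotient, remainder) of encoded value divided by the decimal ASCII code
ratios = {ch: divmod(enc, ord(ch)) for ch, enc in observed.items()}
print(ratios)

# Every pair divides evenly with a quotient of 16 ...
consistent = all(q == 16 and r == 0 for q, r in ratios.values())
# ... which is the same as a left shift by 4 bits.
shift_matches = all((ord(ch) << 4) == enc for ch, enc in observed.items())
print(consistent, shift_matches)
```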
Multiples of 16
Going by the pattern we’ve established, let’s write a simple Python script that will divide all these values by 16, then convert the resulting decimals to ASCII characters with the chr() function.
Something like this:
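A minimal sketch of that script, assuming the space-separated values recovered from the disassembly are the complete encoded flag. The names ENCODED and decode are my own picks for illustration, and the remainder check is just a guard against typos in the list:

```python
# Hard-coded values recovered from the binary (copied from the disassembly).
ENCODED = (
    "1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 "
    "1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 "
    "1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000"
)


def decode(encoded: str, factor: int = 16) -> str:
    """Divide each number by factor and map the result to an ASCII character."""
    chars = []
    for token in encoded.split():
        value, remainder = divmod(int(token), factor)
        if remainder:
            raise ValueError(f"{token} is not a multiple of {factor}")
        chars.append(chr(value))
    return "".join(chars)


flag = decode(ENCODED)
print(flag)
```

The output starts with HTB{ and ends with }, which is exactly the shape we expected.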
And there we have it.
Analyzing our Process
So let’s break down what we did here to try and define the methodology a little more concretely:
- Run the program to get a general idea of its purpose
- Gather information about the file (metadata, compilers, libraries, etc)
- Try and figure out the program’s logical behavior (conditional branching, loops, etc.) - You can do this visually in a GUI environment like IDA Pro, Ghidra, or Binary Ninja… or via commandline with something like GDB.
- Make a note of “important” instruction memory addresses and memory addresses of important variables
- Set breakpoints at key functions, and monitor register behavior.
- Tie what you see happening in the registers to what you understand about the program’s logical behavior (in this case, we were able to figure out that the list of numbers corresponded to individual characters)