HackTheBox | exatlon

Environment

When doing any type of reverse engineering, do it in a fresh image that is isolated from any other machine and does not have networking enabled. Even though these binaries come from trusted sources and exist for educational purposes, we should still approach them as if they were malicious. My lab environment is a Kali box behind a pfSense firewall, running in VMs.

File Information

running the program

Here’s what happens when we run this binary:

kali@kali:~$ ./exatlon_v1

███████╗██╗  ██╗ █████╗ ████████╗██╗      ██████╗ ███╗   ██╗       ██╗   ██╗ ██╗
██╔════╝╚██╗██╔╝██╔══██╗╚══██╔══╝██║     ██╔═══██╗████╗  ██║       ██║   ██║███║
█████╗   ╚███╔╝ ███████║   ██║   ██║     ██║   ██║██╔██╗ ██║       ██║   ██║╚██║
██╔══╝   ██╔██╗ ██╔══██║   ██║   ██║     ██║   ██║██║╚██╗██║       ╚██╗ ██╔╝ ██║
███████╗██╔╝ ██╗██║  ██║   ██║   ███████╗╚██████╔╝██║ ╚████║███████╗╚████╔╝  ██║
╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚══════╝ ╚═════╝ ╚═╝  ╚═══╝╚══════╝ ╚═══╝   ╚═╝


[+] Enter Exatlon Password  : adad
[-] ;(

It takes in our input and outputs ;( if we don’t put in the right password. Pretty standard RE binary. Let’s dig into this program to see if we can get some useful information to point us in the right direction.

file

Using file, we can get a generic overview of the file type:

kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, no section header

strings

Another piece of low-hanging fruit is checking whether any flag references are stored as plaintext within this binary. And YES, we CAN see variables, strings, function calls, etc. as plaintext even though this program is compiled. There are a lot of searches we can do by combining grep and strings.

We could do a search to figure out how this program was compiled, and what system it was compiled on (ag is essentially just a faster grep and -i just means case-insensitive):

kali@kali:~$ strings ./exatlon_v1 | ag -i gcc
GCC: (Debian 9.2.1-21)

We can also check for libraries that may indicate what programming language was used to write the program:

kali@kali:~$ strings ./exatlon_v1 | ag -i libc
glibc-ldI3

You get the picture right? This particular file has a LOT of lines of strings to go through, so I’d only do this to try for low-hanging fruit and specific targeted info…

kali@kali:~$ strings ./exatlon_v1 | wc -l
10943 <------ 10k lines of strings, YIKES!
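If you're curious what strings is doing under the hood, here is a minimal Python sketch of the same idea: scan the raw bytes for runs of printable ASCII and then filter them, much like piping into grep or ag. The function name `extract_strings` and the sample bytes are made up for illustration, and the real tool has more options (encodings, offsets, section handling) than this covers.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

# Hypothetical sample bytes standing in for a real binary.
sample = b"\x7fELF\x02\x01\x00GCC: (Debian 9.2.1-21)\x00\x01ab\x00[-] ;(\x00"

found = extract_strings(sample)

# Case-insensitive filter, similar in spirit to `| ag -i gcc`.
gcc_hits = [s for s in found if "gcc" in s.lower()]
print(gcc_hits)
```

To try it on the real file you would read it with `open("exatlon_v1", "rb").read()` and pass the result in.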

Disassembly in Binary Ninja

There are a lot of ways we can analyze this program. We could run it through GDB, analyze it in IDA Pro, dive into rabbit holes with Ghidra, etc. For this one I’m going to use Binary Ninja. Once we load the program into Binja, a quick look at the strings view turns up something really interesting right off the bat:

/images/htb/challenges/reversing/exatlon/upx.png

It seems like this program was packed using a …packer… specifically something called UPX, and this packer apparently likes to brag and let people know it was used which is great for us. So, let’s research UPX.
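The same check can be done without a disassembler. Below is a rough heuristic in Python that looks for the `UPX!` marker UPX normally leaves in packed files. The helper name `looks_upx_packed` and the sample byte strings are hypothetical, and a missing marker does not prove a file is unpacked, since the marker can be stripped or altered.

```python
def looks_upx_packed(data: bytes) -> bool:
    """Heuristic only: report whether the b"UPX!" marker appears in the bytes."""
    return b"UPX!" in data

# Made-up byte strings standing in for a packed and an unpacked file.
packed_like = b"\x7fELF" + b"\x00" * 16 + b"UPX!" + b"\x00" * 8
plain_like = b"\x7fELF" + b"\x00" * 32

print(looks_upx_packed(packed_like), looks_upx_packed(plain_like))
```

Against the real challenge file, you would pass in the contents of `exatlon_v1` read in binary mode.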

UPX

Running man upx: (you may need to apt install upx if it’s not installed on your system)

UPX is a portable, extendable, high-performance executable packer for several different executable formats. It achieves an excellent compression
...
All UPX supported file formats can be unpacked using the -d switch, eg.  upx -d yourfile.exe will uncompress the file you've just compressed.

decompressing the packed ELF

kali@kali:~$ upx -d ./exatlon_v1
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2020
UPX 3.96        Markus Oberhumer, Laszlo Molnar & John Reiser   Jan 23rd 2020

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
   2202568 <-    709524   32.21%   linux/amd64   exatlon_v1

Unpacked 1 file.

Running that command unpacked the original binary in place, overwriting the packed file. You can see from the output that the packed file was ~32% of the unpacked size (709524 bytes versus 2202568 bytes). If we run file again, we’ll see some new info.

UPX ELF:

kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, no section header

Unpacked ELF:

kali@kali:~$ file exatlon_v1
exatlon_v1: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=99364060f1420e00a780745abcfa419af0b8b0d8, for GNU/Linux 3.2.0, not stripped

Let’s re-open this file in Binja again.

Understanding the Program Logic

Now that we’ve unpacked this binary, we’re able to see a lot more detail during disassembly and debugging. First, let’s look at the main() function of our program:

main()

/images/htb/challenges/reversing/exatlon/main.png

Here, we can clearly see the point in this program where we’re being prompted for our input. Everything prior to that is essentially useless code that is printing out that EXATLONV1 ASCII banner. However, if we dig deeper we’re actually able to locate the point at which the program will branch depending on our input. In this visual representation, I’ve highlighted some key areas.

branching behavior

/images/htb/challenges/reversing/exatlon/branches.png

test  bl,bl
je    0x404d83

Without going too deep into assembly: test bl,bl sets the zero flag when bl is 0, and je jumps when the zero flag is set. So if bl is 0 (the check failed), execution jumps to the instruction at memory address 0x404d83; otherwise it simply falls through to the next instruction. In the above figure, je 0x404d83 will bring us to that left path which results in the condition we want to avoid: [-] ;(
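If the flag logic is hard to keep straight, here is a tiny Python model of those two instructions. It is only a sketch of the semantics (the function name is made up): test ANDs its operands and sets the zero flag on a zero result, and je jumps when the zero flag is set.

```python
def je_taken_after_test(bl: int) -> bool:
    """Model `test bl, bl` followed by `je`: True means the jump is taken."""
    zero_flag = ((bl & bl) & 0xFF) == 0  # test sets ZF when the AND result is zero
    return zero_flag                      # je jumps when ZF is set

print(je_taken_after_test(0))  # bl == 0: jump to 0x404d83, the ";(" path
print(je_taken_after_test(1))  # bl != 0: fall through to the path we want
```

In other words, to stay off the failure path we need whatever ends up in bl to be non-zero.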

those numbers are interesting…

In yellow, I’ve highlighted some numbers:

  • 1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000

We can actually see (from disassembly) that these numbers have been hard coded into the program which means they have extreme significance to us (i.e. this is likely our flag in some form):

/images/htb/challenges/reversing/exatlon/flag.png

Debugging the Binary

what’s a flag anyways

There is one thing we know about ALL HackTheBox challenges… our flag will always be in the form HTB{foo}, right? So it’s likely that we will be able to match a little of what we know to what’s been given to us in the program. To do this, I’m going to be running this program through GDB (an open source debugger). While the program is running, I’m going to be watching the behavior of the registers.

Watching Registers with a Debugger

In order to debug this program, we need to set breakpoints that allow us to halt the program when it reaches a particular point. In this particular case, I’m very interested to know what happens to our input to determine that branching behavior. To do this, we need to pinpoint the memory addresses of key functions. Since this looks like a C++ program, we’d likely want to find where std::cin or operator>> is called. Remember, program compilation is not meant to be a reversible process so whatever we find is not going to be an exact replication of the code.

disassemble main

We know the logic we care about lives in that main() function, so we can actually use GDB (specifically pwndbg which is gdb on steroids) to disassemble that particular function by name and dump all the memory addresses. Doing so results in an output like this:

pwndbg> disass main
Dump of assembler code for function main:
   0x0000000000404c2c <+0>:     push   rbp
   0x0000000000404c2d <+1>:     mov    rbp,rsp
   0x0000000000404c30 <+4>:     push   r12
   0x0000000000404c32 <+6>:     push   rbx
   0x0000000000404c33 <+7>:     sub    rsp,0x40
   0x0000000000404c37 <+11>:    lea    rsi,[rip+0x1463d1]        # 0x54b00f
   0x0000000000404c3e <+18>:    lea    rdi,[rip+0x1a693b]        # 0x5ab580 <_ZSt4cout>
   0x0000000000404c45 <+25>:    call   0x468450 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
   0x0000000000404c4a <+30>:    lea    rsi,[rip+0x1463c7]        # 0x54b018
   0x0000000000404c51 <+37>:    lea    rdi,[rip+0x1a6928]        # 0x5ab580 <_ZSt4cout>
   0x0000000000404c58 <+44>:    call   0x468450 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
...

Depending on how much code is in main(), this output could be long or short but the important thing is that we look through the output to find what’s important. There’s a LOT of junk here, but scrolling through it we come across this:

   0x0000000000404d0a <+222>:   lea    rdi,[rip+0x1a698f]        # 0x5ab6a0 <_ZSt3cin>
   0x0000000000404d11 <+229>:   call   0x406d90 <_ZStrsIcSt11char_traitsIcESaIcEERSt13basic_istreamIT_T0_ES7_RNSt7__cxx1112basic_stringIS4_S5_T1_EE>
   0x0000000000404d16 <+234>:   lea    rax,[rbp-0x30]
   0x0000000000404d1a <+238>:   lea    rdx,[rbp-0x50]
   0x0000000000404d1e <+242>:   mov    rsi,rdx
   0x0000000000404d21 <+245>:   mov    rdi,rax
   0x0000000000404d24 <+248>:   call   0x404aad <_Z7exatlonRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
   0x0000000000404d29 <+253>:   lea    rax,[rbp-0x30]
   0x0000000000404d2d <+257>:   lea    rsi,[rip+0x1467bc]        # 0x54b4f0
   0x0000000000404d34 <+264>:   mov    rdi,rax
   0x0000000000404d37 <+267>:   call   0x4050fa <_ZSteqIcSt11char_traitsIcESaIcEEbRKNSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_>
   0x0000000000404d3c <+272>:   mov    ebx,eax
   0x0000000000404d3e <+274>:   lea    rax,[rbp-0x30]
   0x0000000000404d42 <+278>:   mov    rdi,rax
   0x0000000000404d45 <+281>:   call   0x46d330 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED2Ev>
   0x0000000000404d4a <+286>:   test   bl,bl
   0x0000000000404d4c <+288>:   je     0x404d83 <main+343>

If you look at the above code, you can quickly recognize the last two lines as the conditional check before the program branches (refer to the screenshots above if you forgot). But also, we can see that the very first line loads <_ZSt3cin> (std::cin) and the line after it calls operator>>, aka where our input is read. With this information, we know we should set a breakpoint at that memory address so we can observe the register behavior prior to and after inputting some text. We should also set a breakpoint right at that test bl, bl instruction to see what is being “tested” (my hunch is this is some kind of strcmp - string comparison, and the call to <_ZSteq...> a few lines earlier, which demangles to std::operator== on a std::string and a const char*, backs that up). To do this in GDB all we need to do is:

  • b *0x0000000000404d0a - will halt the program right before we input our text
  • b *0x0000000000404d4a - will halt the program right after the string comparison, just before its result is tested and the branch is taken.

pwndbg> b *0x0000000000404d0a
Breakpoint 1 at 0x404d0a
pwndbg> b *0x0000000000404d4a
Breakpoint 2 at 0x404d4a

breakpoints

With our breakpoints set, and with the knowledge that we’re after a flag that will always start with HTB{, let’s do some experimentation. I’m going to use this knowledge and simply input an H into the program to see how that H is being represented in the registers. One HUGE caveat here: this part might seem super dense/technical, but really you don’t need to have advanced knowledge of assembly for this… just try and think about what’s going on in terms of the program’s logic.

We reach our first breakpoint and are prompted:

pwndbg>
[+] Enter Exatlon Password  :

The current values in our registers are as follows (again, don’t worry about understanding exactly what this stuff means):

 RAX  0x7fffffffdf20 —▸ 0x7fffffffdf30 —▸ 0x59e900 —▸ 0x5a8540 (_nl_global_locale) —▸ 0x59fb00 (_nl_C_LC_CTYPE) ◂— ...
 RBX  0x400548 ◂— 0x0
 RCX  0x6f7773736150206e ('n Passwo')
 RDX  0x5a5e40 —▸ 0x466f50 ◂— mov    rax, 0x5a5e28
*RDI  0x5ab6a0 (std::cin) —▸ 0x5a53a0 —▸ 0x44cd40 ◂— mov    rax, 0x5a5388
 RSI  0x7fffffffdf20 —▸ 0x7fffffffdf30 —▸ 0x59e900 —▸ 0x5a8540 (_nl_global_locale) —▸ 0x59fb00 (_nl_C_LC_CTYPE) ◂— ...
 R8   0x5af800 ◂— 0x0
 R9   0x4db420 (__memcpy_ssse3+9600) ◂— mov    r10, qword ptr [rsi - 0x1e]
 R10  0x65746e45205d2b5b ('[+] Ente')
 R11  0x6f6c746178452072 ('r Exatlo')
 R12  0x49ebe0 (__libc_csu_fini) ◂— push   rbp
 R13  0x0
 R14  0x5a8018 (_GLOBAL_OFFSET_TABLE_+24) —▸ 0x4d6a40 (__rawmemchr_sse2) ◂— movq   xmm1, rsi
 R15  0x0

Remembering that our goal is HTB{foo}, I’m going to input a single H and see what happens to the registers. After typing in an H as our input, we see some very interesting behavior:

[+] Enter Exatlon Password  : H

Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 RAX  0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */
 RBX  0xffffff00
 RCX  0x32353131
 RDX  0x20
 RDI  0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */
 RSI  0x54b4f5 ◂— '1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000 '
 R8   0x7fffffffdf50 ◂— 0x2032353131 /* '1152 ' */

The string '1152 ' now shows up in the memory that RAX, RDI and R8 point to. Why is that interesting? Well… remember that list of numbers we found during the course of the disassembly process? The first value in that list of numbers was, you guessed it, 1152.
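If the raw value 0x2032353131 in that dump looks unrelated to 1152, keep in mind that x86-64 is little-endian, so the bytes read back in reverse order. A small Python sketch (the helper name `qword_to_text` is made up for illustration) shows how the debugger gets from the hex value to the text in the comment:

```python
def qword_to_text(value: int) -> str:
    """Interpret an integer read from memory as little-endian ASCII bytes."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "little").decode("ascii")

# The value pwndbg showed behind RAX/RDI/R8 after entering "H".
print(repr(qword_to_text(0x2032353131)))
```

The trailing 0x20 byte is a space, which is why the string reads as the number followed by a separator.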

testing a theory

If HTB{foo} is our goal, and inputting an H resulted in the value 1152 being thrown into a register just before that branch… then if we input a T and get 1344 to load in a register we can almost certainly assume those numbers represent individual characters for our flag. So let’s test it:

[+] Enter Exatlon Password  : T

Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 RAX  0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */
 RBX  0x200
 RCX  0x3131
 RDX  0x5
 RDI  0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */
 RSI  0x54b4f5 ◂— '1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000 '
 R8   0x7fffffffdf50 ◂— 0x2034343331 /* '1344 ' */

Okay, maybe we just got lucky… so let’s test that third character, B, to see if 1056 gets loaded… if it does then we know this is our flag.

[+] Enter Exatlon Password  : B

Breakpoint 1, 0x0000000000404d4a in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 RAX  0x7fffffffdf40 —▸ 0x7fffffffdf50 ◂— 0x2036353031 /* '1056 ' */

BINGO.

Decoding the Flag

So at this point, it’s pretty safe to assume that “1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000” is our flag. However, these numbers don’t really mean much on their own. Printable ASCII tops out at decimal 126 (and the closing } of our flag is 125), so values like 1152 can’t be raw character codes… we need to figure out what relationship these numbers have to the characters.

Fun with ASCII

From our handy-dandy ASCII Table, we know that H can be represented as the decimal value 72 OR the hexadecimal value 48. My first instinct is to try dividing that big number by these values to see what results:

  • 1152/72 = 16
  • 1152/48 = 24

Whole numbers are promising, but I’m not entirely convinced yet… let’s do each of the first four characters to try and see if there’s a pattern:

T (Decimal 84; Hex 54)

  • 1344/84 = 16
  • 1344/54 = 24.8888888889

B (Decimal 66; Hex 42)

  • 1056/66 = 16
  • 1056/42 = 25.1428571429

{ (Decimal 123; Hex 7B)

  • 1968/123 = 16
  • 1968/7B? Yeah… I think that rules this one out, we’re not going to be dividing two different bases.
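The decimal side of that arithmetic is easy to double-check in Python. This is just a sanity check of the divisions above, assuming the first four characters are the usual HTB{ prefix; the variable names are made up for illustration.

```python
# First four encoded values paired with the characters we expect them to be.
known_prefix = {"H": 1152, "T": 1344, "B": 1056, "{": 1968}

for ch, encoded in known_prefix.items():
    quotient, remainder = divmod(encoded, ord(ch))
    print(ch, ord(ch), quotient, remainder)

# Collect every distinct factor that divides cleanly.
factors = {
    encoded // ord(ch)
    for ch, encoded in known_prefix.items()
    if encoded % ord(ch) == 0
}
print(factors)
```

If the set ends up with a single value, every character in the prefix is scaled by the same constant.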

Multiples of 16

Going by the pattern we’ve established, let’s write a simple Python script that will divide all these values by 16, then convert the resulting decimals to ASCII characters with the chr() function.

Something like this:

/images/htb/challenges/reversing/exatlon/rev.png
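In case the screenshot doesn't load, here is a minimal sketch of the same idea (not necessarily identical to the script pictured). The names `ENCODED`, `decode` and `encode` are my own, and the `encode` helper is only a guess at what the binary does, based on what we saw in the debugger: each character's code times 16, followed by a space. Multiplying by 16 is the same as shifting left by 4 bits.

```python
# The hard-coded numbers recovered from the binary.
ENCODED = (
    "1152 1344 1056 1968 1728 816 1648 784 1584 816 1728 1520 "
    "1840 1664 784 1632 1856 1520 1728 816 1632 1856 1520 784 "
    "1760 1840 1824 816 1584 1856 784 1776 1760 528 528 2000"
)

def decode(encoded: str, factor: int = 16) -> str:
    """Divide each number by the factor and map the result back to a character."""
    return "".join(chr(int(token) // factor) for token in encoded.split())

def encode(plain: str, factor: int = 16) -> str:
    """Assumed inverse: character code times factor, each followed by a space."""
    return "".join(f"{ord(ch) * factor} " for ch in plain)

flag = decode(ENCODED)
print(flag)
```

A quick way to gain confidence in the result is to round-trip it: `encode(flag)` should reproduce the number list, and `encode("H")` should give the same '1152 ' we saw in the registers.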

And there we have it.

Analyzing our Process

So let’s break down what we did here to try and define the methodology a little more concretely:

  1. Run the program to get a general idea of its purpose
  2. Gather information about the file (metadata, compilers, libraries, etc)
  3. Try and figure out the program’s logical behavior (conditional branching, loops, etc.) - You can do this visually in a GUI environment like IDA Pro, Ghidra, or Binary Ninja… or via commandline with something like GDB.
  4. Make a note of “important” instruction memory addresses and memory addresses of important variables
  5. Set breakpoints at key functions, and monitor register behavior.
  6. Tie what you see happening in the registers to what you understand about the program’s logical behavior
    • in this case, we were able to figure out that the list of numbers corresponded to individual characters.