Date: Tue, 11 Oct 2005 10:14:02 -0700 From: Phantasmal Phantasmagoria To: bugtraq@securityfocus.com Subject: The Malloc Maleficarum -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [-------------------------------- The Malloc Maleficarum Glibc Malloc Exploitation Techniques by Phantasmal Phantasmagoria phantasmal@hush.ai [-------------------------------- In late 2001, "Vudo Malloc Tricks" and "Once Upon A free()" defined the exploitation of overflowed dynamic memory chunks on Linux. In late 2004, a series of patches to GNU libc malloc implemented over a dozen mandatory integrity assertions, effectively rendering the existing techniques obsolete. It is for this reason, a small suggestion of impossiblity, that I present the Malloc Maleficarum. [-------------------------------- The House of Prime The House of Mind The House of Force The House of Lore The House of Spirit The House of Chaos [-------------------------------- The House of Prime An artist has the first brush stroke of a painting. A writer has the first line of a poem. I have the House of Prime. It was the first breakthrough, the indication of everything that was to come. It was the rejection of impossibility. And it was also the most difficult to derive. For these reasons I feel obliged to give Prime the position it deserves as the first House of the Malloc Maleficarum. >From a purely technical perspective the House of Prime is perhaps the least useful of the collection. It is almost invariably better to use the House of Mind or Spirit when the conditions allow it. In order to successfully apply the House of Prime it must be possible to free() two different chunks with designer controlled size fields and then trigger a call to malloc(). The general idea of the technique is to corrupt the fastbin maximum size variable, which under certain uncontrollable circumstances (discussed below) allows the designer to hijack the arena structure used by calls to malloc(), which in turn allows either the return of an arbitrary memory chunk, or the direct modification of execution control data. As previously stated, the technique starts with a call to free() on an area of memory that is under control of the designer. A call to free() actually invokes a wrapper, called public_fREe(), to the internal function _int_free(). For the House of Prime, the details of public_fREe() are relatively unimportant. So attention moves, instead, to _int_free(). From the glibc-2.3.5 source code: void _int_free(mstate av, Void_t* mem) { mchunkptr p; /* chunk corresponding to mem */ INTERNAL_SIZE_T size; /* its size */ mfastbinptr* fb; /* associated fastbin */ ... p = mem2chunk(mem); size = chunksize(p); if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0) || __builtin_expect ((uintptr_t) p & MALLOC_ALIGN_MASK, 0)) { errstr = "free(): invalid pointer"; errout: malloc_printerr (check_action, errstr, mem); return; } Almost immediately one of the much vaunted integrity tests appears. The __builtin_expect() construct is used for optimization purposes, and does not in any way effect the conditions it contains. The designer must ensure that both of the tests fail in order to continue execution. At this stage, however, doing so is not difficult. Note that the designer does not control the value of p. It can therefore be assumed that the test for misalignment will fail. On the other hand, the designer does control the value of size. In fact, it is the most important aspect of control that the designer possesses, yet its range is already being limited. For the the House of Prime the exact upper limit of size is not important. The lower limit, however, is crucial in the correct execution of this technique. The chunksize() macro is defined as follows: #define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA) #define chunksize(p) ((p)->size & ~(SIZE_BITS)) The PREV_INUSE, IS_MMAPPED and NON_MAIN_ARENA definitions correspond to the three least significant bits of the size entry in a malloc chunk. The chunksize() macro clears these three bits, meaning the lowest possible value of the designer controlled size value is 8. Continuing with _int_free() it will soon become clear why this is important: if ((unsigned long)(size) <= (unsigned long)(av->max_fast)) { if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ || __builtin_expect (chunksize (chunk_at_offset (p, size)) >= av->system_mem, 0)) { errstr = "free(): invalid next size (fast)"; goto errout; } set_fastchunks(av); fb = &(av->fastbins[fastbin_index(size)]); if (__builtin_expect (*fb == p, 0)) { errstr = "double free or corruption (fasttop)"; goto errout; } p->fd = *fb; *fb = p; } This is the fastbin code. Exactly what a fastbin is and why they are used is beyond the scope of this document, but remember that the first step in the House of Prime is to overwrite the fastbin maximum size variable, av->max_fast. In order to do this the designer must first provide a chunk with the lower limit size, which was derived above. Given that the default value of av- >max_fast is 72 it is clear that the fastbin code will be used for such a small size. However, exactly why this results in the corruption of av->max_fast is not immediately apparent. It should be mentioned that av is the arena pointer. The arena is a control structure that contains, amongst other things, the maximum size of a fastbin and an array of pointers to the fastbins themselves. In fact, av->max_fast and av->fastbins are contiguous: ... INTERNAL_SIZE_T max_fast; mfastbinptr fastbins[NFASTBINS]; mchunkptr top; ... Assuming that the nextsize integrity check fails, the fb pointer is set to the address of the relevant fastbin for the given size. This is computed as an index from the zeroth entry of av->fastbins. The zeroth entry, however, is designed to hold chunks of a minimum size of 16 (the minimum size of a malloc chunk including prev_size and size values). So what happens when the designer supplies the lower limit size of 8? An analysis of fastbin_index() is needed: #define fastbin_index(sz) ((((unsigned int)(sz)) >> 3) - 2) Simple arithmetic shows that 8 >> 3 = 1, and 1 - 2 = -1. Therefore fastbin_index(8) is -1, and thus fb is set to the address of av- >fastbins[-1]. Since av->max_fast is contiguous to av->fastbins it is evident that the fb pointer is set to &av->max_fast. Furthermore, the second integrity test fails (since fb definitely does not point to p) and the final two lines of the fastbin code are reached. Thus the forward pointer of the designer's chunk p is set to av->max_fast, and av->max_fast is set to the value of p. An assumption was made above that the nextsize integrity check fails. In reality it often takes a bit of work to get this to fall together. If the overflow is capable of writing null bytes, then the solution is simple. However, if the overflow terminates on a null byte, then the solution becomes application specific. If the tests fail because of the natural memory layout at overflow, which they often will, then there is no problem. Otherwise some memory layout manipulation may be needed to ensure that the nextsize value is designer controlled. The challenging part of the House of Prime, however, is not how to overwrite av->max_fast, but how to leverage the overwrite into arbitrary code execution. The House of Prime does this by overwriting a thread specific data variable called arena_key. This is where the biggest condition of the House of Prime arises. Firstly, arena_key only exists if glibc malloc is compiled with USE_ARENAS defined (this is the default setting). Furthermore, and most significantly, arena_key must be at a higher address than the actual arena: 0xb7f00000 : 0x00000000 0xb7f00004 : 0x00000049 <-- max_fast 0xb7f00008 : 0x00000000 <-- fastbin[0] 0xb7f0000c : 0x00000000 <-- fastbin[1] .... 0xb7f00488 : 0x0804a000 <-- mp_.sbrk_base 0xb7f0048c : 0xb7f00000 Due to the fact that the arena structure and the arena_key come from different source files, exactly when this does and doesn't happen depends on how the target libc was compiled and linked. I have seen the cards fall both ways, so it is an important point to make. For now it will be assumed that the arena_key is at a higher address, and is thus over-writable by the fastbin code. The arena_key is thread specific data, which simply means that every thread of execution has its own arena_key independent of other threads. This may have to be considered when applying the House of Prime to a threaded program, but otherwise arena_key can safely be treated as normal data. The arena_key is an interesting target because it is used by the arena_get() macro to find the arena for the currently executing thread. That is, if arena_key is controlled for some thread and a call to arena_get() is made, then the arena can be hijacked. Arena hijacking of this type will be covered shortly, but first the actual overwrite of arena_key must be considered. In order to overwrite arena_key the fastbin code is used for a second time. This corresponds to the second free() of a designer controlled chunk that was outlined in the original prerequisites for the House of Prime. Normally the fastbin code would not be able to write beyond the end of av->fastbins, but since av->max_fast has previously been corrupted, chunks with any size less than the value of the address of the designer's first chunk will be treated with the fastbin code. Thus the designer can write up to av- >fastbins[fastbin_index(av->max_fast)], which is easily a large enough range to be able to reach the arena_key. In the example memory dump provided above the arena_key is 0x484 (1156) bytes from av->fastbins[0]. Therefore an index of 1156/sizeof(mfastbinptr) is needed to set fb to the address of arena_key. Assuming that the system has 32-bit pointers a fastbin_index() of 289 is required. Roughly inverting the fastbin_index() gives: (289 + 2) << 3 = 2328 This means that a size of 2328 will result in fb being set to arena_key. Note that this size only applies for the memory dump shown above. It is quite likely that the offset between av- >fastbins[0] and arena_key will differ from system to system. Now, if the designer has corrupted av->max_fast and triggered a free() on a chunk with size 2328, and assuming the failure of the nextsize integrity tests, then fb will be set to arena_key, the forward pointer of the designer's second chunk will be set to the address of the existing arena, and arena_key will be set to the address of the designer's second chunk. When corrupting av->max_fast it was not important for the designer to control the overflowed chunk so long as the nextsize integrity checks were handled. When overwriting arena_key, however, it is crucial that the designer controls at least part of the overflowed chunk's data. This is because the overflowed chunk will soon become the new arena, so it is natural that at least part of the chunk data must be arbitrarily controlled, or else arbitrary control of the result of malloc() could not be expected. A call to malloc() invokes a wrapper function called public_mALLOc(): Void_t* public_mALLOc(size_t bytes) { mstate ar_ptr; Void_t *victim; ... arena_get(ar_ptr, bytes); if(!ar_ptr) return 0; victim = _int_malloc(ar_ptr, bytes); ... return victim; } The arena_get() macro is in charge of finding the current arena by retrieving the arena_key thread specific data, or failing this, creating a new arena. Since the arena_key has been overwritten with a non-zero quantity it can be safely assumed that arena_get() will not try to create a new arena. In the public_mALLOc() wrapper this has the effect of setting ar_ptr to the new value of arena_key, the address of the designer's second chunk. In turn this value is passed to the internal function _int_malloc() along with the requested allocation size. Once execution passes to _int_malloc() there are two ways for the designer to proceed. The first is to use the fastbin allocation code: Void_t* _int_malloc(mstate av, size_t bytes) { INTERNAL_SIZE_T nb; /* normalized request size */ unsigned int idx; /* associated bin index */ mfastbinptr* fb; /* associated fastbin */ mchunkptr victim; /* inspected/selected chunk */ checked_request2size(bytes, nb); if ((unsigned long)(nb) <= (unsigned long)(av->max_fast)) { long int idx = fastbin_index(nb); fb = &(av->fastbins[idx]); if ( (victim = *fb) != 0) { if (fastbin_index (chunksize (victim)) != idx) malloc_printerr (check_action, "malloc(): memory" " corruption (fast)", chunk2mem (victim)); *fb = victim->fd; check_remalloced_chunk(av, victim, nb); return chunk2mem(victim); } } The checked_request2size() macro simply converts the request into the absolute size of a memory chunk with data length of the requested size. Remember that av is pointing towards a designer controlled area of memory, and also that the forward pointer of this chunk has been corrupted by the fastbin code. If glibc malloc is compiled without thread statistics (which is the default), then p->fd of the designer's chunk corresponds to av->fastbins[0] of the designer's arena. For the purposes of this technique the use of av- >fastbins[0] must be avoided. This means that the request size must be greater than 8. Interestingly enough, if the absence of thread statistics is assumed, then av->max_fast corresponds to p->size. This has the effect of forcing nb to be less than the size of the designer's second chunk, which in the example provided was 2328. If this is not possible, the designer must use the unsorted_chunks/largebin technique that will be discussed shortly. By setting up a fake fastbin entry at av- >fastbins[fastbin_index(nb)] it is possible to return a chunk of memory that is actually on the stack. In order to pass the victimsize integrity test it is necessary to point the fake fastbin at a user controlled value. Specifically, the size of the victim chunk must have the same fastbin_index() as nb, so the fake fastbin must point to 4 bytes before the designer's value in order to have the right positioning for the call to chunksize(). Assuming that there is a designer controlled variable on the stack, the application will subsequently handle the returned area as if it were a normal memory chunk of the requested size. So if there is a saved return address in the "allocated" range, and if the designer can control what the application writes to this range, then it is possible to circumvent execution to an arbitrary location. If it is possible to trigger an appropriate malloc() with a request size greater than the size of the designer's second chunk, then it is better to use the unsorted_chunks code in _int_malloc() to cause an arbitrary memory overwrite. This technique does, however, require a greater amount of designer control in the second chunk, and further control of two areas of memory somewhere in the target process address space. To trigger the unsorted_chunks code at all the absolute request size must be larger than 512 (the maximum smallbin chunk size), and of course, must be greater than the fake arena's av->max_fast. Assuming it is, the unsorted_chunks code is reached: for(;;) { while ( (victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) { bck = victim->bk; if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0) || __builtin_expect (victim->size > av->system_mem, 0)) malloc_printerr (check_action, "malloc(): memory" " corruption", chunk2mem (victim)); size = chunksize(victim); if (in_smallbin_range(nb) && bck == unsorted_chunks(av) && victim == av->last_remainder && (unsigned long)(size) > (unsigned long)(nb + MINSIZE)) { ... } unsorted_chunks(av)->bk = bck; bck->fd = unsorted_chunks(av); if (size == nb) { ... return chunk2mem(victim); } ... There are quite a lot of things to consider here. Firstly, the unsorted_chunks() macro returns av->bins[0]. Since the designer controls av, the designer also controls the value of unsorted_chunks(). This means that victim can be set to an arbitrary address by creating a fake av->bins[0] value that points to an area of memory (called A) that is designer controlled. In turn, A->bk will contain the address that victim will be set to (called B). Since victim is at an arbitrary address B that can be designer controlled, the temporary variable bck can be set to an arbitrary address from B->bk. For the purposes of this technique, B->size should be equal to nb. This is not strictly necessary, but works well to pass the two victimsize integrity tests while also triggering the final condition shown above, which has the effect of ending the call to malloc(). Since it is possible to set bck to an arbitrary location, and since unsorted_chunks() returns the designer controlled area of memory A, the setting of bck->fd to unsorted_chunks() makes it possible to set any location in the address space to A. Redirecting execution is then a simple matter of setting bck to the address of a GOT or ..dtors entry minus 8. This will redirect execution to A->prev_size, which can safely contain a near jmp to skip past the crafted value at A->bk. Similar to the fastbin allocation code the arbitrary address B is returned to the requesting application. [-------------------------------- The House of Mind Perhaps the most useful and certainly the most general technique in the Malloc Maleficarum is the House of Mind. The House of Mind has the distinct advantage of causing a direct memory overwrite with just a single call to free(). In this sense it is the closest relative in the Malloc Maleficarum to the traditional unlink() technique. The method used involves tricking the wrapper invoked by free(), called public_fREe(), into supplying the _int_free() internal function with a designer controlled arena. This can subsequently lead to an arbitrary memory overwrite. A call to free() actually invokes a wrapper called public_fREe(): void public_fREe(Void_t* mem) { mstate ar_ptr; mchunkptr p; /* chunk corresponding to mem */ ... p = mem2chunk(mem); ... ar_ptr = arena_for_chunk(p); ... _int_free(ar_ptr, mem); When memory is passed to free() it points to the start of the data portion of the "corresponding chunk". In an allocated state a chunk consists of the prev_size and size values and then the data section itself. The mem2chunk() macro is in charge of converting the supplied memory value into the corresponding chunk. This chunk is then passed to the arena_for_chunk() macro: #define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */ #define heap_for_ptr(ptr) \ ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1))) #define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA) #define arena_for_chunk(ptr) \ (chunk_non_main_arena(ptr)?heap_for_ptr(ptr)->ar_ptr:&main_arena) The arena_for_chunk() macro is tasked with finding the appropriate arena for the chunk in question. If glibc malloc is compiled with USE_ARENAS (which is the default), then the code shown above is used. Clearly, if the NON_MAIN_ARENA bit in the size value of the chunk is not set, then ar_ptr will be set to the main_arena. However, since the designer controls the size value it is possible to control whether the chunk is treated as being in the main arena or not. This is what the chunk_non_main_arena() macro checks for. If the NON_MAIN_ARENA bit is set, then chunk_non_main_arena() returns positive and ar_ptr is set to heap_for_ptr(ptr)->ar_ptr. When a non-main heap is created it is aligned to a multiple of HEAP_MAX_SIZE. The first thing that goes into this heap is the heap_info structure. Most significantly, this structure contains an element called ar_ptr, the pointer to the arena for this heap. This is how the heap_for_ptr() macro functions, aligning the given chunk down to a multiple of HEAP_MAX_SIZE and taking the ar_ptr from the resulting heap_info structure. The House of Mind works by manipulating the heap so that the designer controls the area of memory that the overflowed chunk is aligned down to. If this can be achieved, an arbitrary ar_ptr value can be supplied to _int_free() and subsequently an arbitrary memory overwrite can be triggered. Manipulating the heap generally involves forcing the application to repeatedly allocate memory until a designer controlled buffer is contained at a HEAP_MAX_SIZE boundary. In practice this alignment is necessary because chunks at low areas of the heap align down to an area of memory that is neither designer controlled nor mapped in to the address space. Fortunately, the amount of allocation that creates the correct alignment is not large. With the default HEAP_MAX_SIZE of 1024*1024 an average of 512kb of padding will be required, with this figure never exceeding 1 megabyte. It should be noted that there is not a general method for triggering memory allocation as required by the House of Mind, rather the process is application specific. If a situation arises in which it is impossible to align a designer controlled chunk, then the House of Lore or Spirit should be considered. So, it is possible to hijack the heap_info structure used by the heap_for_ptr() macro, and thus supply an arbitrary value for ar_ptr which controls the arena used by _int_free(). At this stage the next question that arises is exactly what to do with ar_ptr. There are two options, each with their respective advantages and disadvantages. Each will be addressed in turn. Firstly, setting the ar_ptr to a sufficiently large area of memory that is under the control of the designer and subsequently using the unsorted chunk link code to cause a memory overwrite. Sufficiently large in this case means the size of the arena structure, which is 1856 bytes on a 32-bit system without THREAD_STATS enabled. The main difficulty in this method arises with the numerous integrity checks that are encountered. Fortunately, nearly every one of these tests use a value obtained from the designer controlled arena, which makes the checks considerably easier to manage. For the sake of brevity, the complete excerpt leading up to the unsorted chunk link code has been omitted. Instead, the following list of the conditions required to reach the code in question is provided. Note that both av and the size of the overflowed chunk are designer controlled values. - The negative of the size of the overflowed chunk must be less than the value of the chunk itself. - The size of the chunk must not be less than av->max_fast. - The IS_MMAPPED bit of the size cannot be set. - The overflowed chunk cannot equal av->top. - The NONCONTIGUOUS_BIT of av->max_fast must be set. - The PREV_INUSE bit of the nextchunk (chunk + size) must be set. - The size of nextchunk must be greater than 8. - The size of nextchunk must be less than av->system_mem - The PREV_INUSE bit of the chunk must not be set. - The nextchunk cannot equal av->top. - The PREV_INUSE bit of the chunk after nextchunk (nextchunk + nextsize) must be set If these conditions are met, then the following code is reached: bck = unsorted_chunks(av); fwd = bck->fd; p->bk = bck; p->fd = fwd; bck->fd = p; fwd->bk = p; In this case p is the address of the designer's overflowed chunk. The unsorted_chunks() macro returns av->bins[0] which is designer controlled. If the designer sets av->bins[0] to the address of a GOT or .dtors entry minus 8, then that entry (bck->fd) will be overwritten with the address of p. This address corresponds to the prev_size entry of the designer's overflowed chunk which can safely be used to branch past the corrupted size, fd and bk entries. The extensive list of conditions appear to make this method quite difficult to apply. In reality, the only conditions that may be a problem are those involving the nextchunk. This is because they largely depend on the application specific memory layout to handle. This is the only obvious disadvantage of the method. As it stands, the House of Mind is in a far better position than the House of Prime to handle such conditions due to the arbitrary nature of av- >system_mem. It should be noted that the last element of the arena structure that is actually required to reach the unsorted chunk link code is av->system_mem, but it is not terribly important what this value is so long as it is high. Thus if the conditions are right, it may be possible to use this method with only 312 bytes of designer controlled memory. However, even if there is not enough designer controlled memory for this method, the House of Mind may still be possible with the second method. The second method uses the fastbin code to cause a memory overwrite. The main advantage of this method is that it is not necessary to point ar_ptr at designer controlled memory, and that there are considerably less integrity checks to worry about. Consider the fastbin code: if ((unsigned long)(size) <= (unsigned long)(av->max_fast)) { if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ || __builtin_expect (chunksize (chunk_at_offset (p, size)) >= av->system_mem, 0)) { errstr = "free(): invalid next size (fast)"; goto errout; } set_fastchunks(av); fb = &(av->fastbins[fastbin_index(size)]); ... p->fd = *fb; *fb = p; } The ultimate goal here is to set fb to the address of a GOT or ..dtors entry, which subsequently gets set to the address of the designer's overflowed chunk. However, in order to reach the final line a number of conditions must still be met. Firstly, av- >max_fast must be large enough to trigger the fastbin code at all. Then the size of the nextchunk (p + size) must be greater than 8, while also being less than av->system_mem. The tricky part of this method is positioning ar_ptr in a way such that both the av->max_fast element at (av + 4) and the av- >system_mem element at (av + 1848) are large enough. If a binary has a particularly small GOT table, then it is quite possible that the highest available large number for av->system_mem will result in an av->max_fast that is actually in the area of unmapped memory between the text and data segments. In practice this shouldn't occur very often, and if it does, then the stack may be used to a similar effect. For more information on the fastbin code, including a description of fastbin_index() that will help in positioning fb to a GOT or ..dtors entry, consult the House of Prime. [-------------------------------- The House of Force I first wrote about glibc malloc in 2004 with "Exploiting the Wilderness". Since the techniques developed in that text were some of the first to become obsolete, and since the Malloc Maleficarum was written in the spirit of continuation and progress, I feel obliged to include another attempt at exploiting the wilderness. This is the purpose of the House of Force. From "Exploiting the Wilderness": "The wilderness is the top-most chunk in allocated memory. It is similar to any normal malloc chunk - it has a chunk header followed by a variably long data section. The important difference lies in the fact that the wilderness, also called the top chunk, borders the end of available memory and is the only chunk that can be extended or shortened. This means it must be treated specially to ensure it always exists; it must be preserved." So the glibc malloc implementation treats the wilderness as a special case in calls to malloc(). Furthermore, the top chunk will realistically never be passed to a call to free() and will never contain application data. This means that if the designer can trigger a condition that only ever results in the overflow of the top chunk, then the House of Force is the only option (in the Malloc Maleficarum at least). The House of Force works by tricking the top code in to setting the wilderness pointer to an arbitrary value, which can result in an arbitrary chunk of data being returned to the requesting application. This requires two calls to malloc(). The major disadvantage of the House of Force is that the first call must have a completely designer controlled request size. The second call must simply be large enough to trigger the wilderness code, while the chunk returned must be (to some extent) designer controlled. The following is the wilderness code with some additional context: Void_t* _int_malloc(mstate av, size_t bytes) { INTERNAL_SIZE_T nb; /* normalized request size */ mchunkptr victim; /* inspected/selected chunk */ INTERNAL_SIZE_T size; /* its size */ mchunkptr remainder; /* remainder from a split */ unsigned long remainder_size; /* its size */ ... checked_request2size(bytes, nb); ... use_top: victim = av->top; size = chunksize(victim); if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) { remainder_size = size - nb; remainder = chunk_at_offset(victim, nb); av->top = remainder; set_head(victim, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0)); set_head(remainder, remainder_size | PREV_INUSE); check_malloced_chunk(av, victim, nb); return chunk2mem(victim); } The first goal of the House of Force is to overwrite the wilderness pointer, av->top, with an arbitrary value. In order to do this the designer must have control of the location of the remainder chunk. Assume that the existing top chunk has been overflowed resulting in the largest possible size (preferably 0xffffffff). This is done to ensure that even large values passed as an argument to malloc will trigger the wilderness code instead of trying to extend the heap. The checked_request2size() macro ensures that the requested value is less than -2*MINSIZE (by default -32), while also adding on enough room for the size and prev_size fields and storing the final value in nb. For the purposes of this technique the checked_request2size() macro is relatively unimportant. It was previously mentioned that the first call to malloc() in the House of Force must have a designer controlled argument. It can be seen that the value of remainder is obtained by adding the request size to the existing top chunk. Since the top chunk is not yet under the designer's control the request size must be used to position remainder to at least 8 bytes before a .GOT or .dtors entry, or any other area of memory that may subsequently be used by the designer to circumvent execution. Once the wilderness pointer has been set to the arbitrary remainder chunk, any calls to malloc() with a large enough request size to trigger the top chunk will be serviced by the designer's wilderness. Thus the only restriction on the new wilderness is that the size must be larger than the request that is triggering the top code. In the case of the wilderness being set to overflow a GOT entry this is never a problem. It is then simply a matter of finding an application specific scenario in which such a call to malloc() is used for a designer controlled buffer. The most important issue concerning the House of Force is exactly how to get complete control of the argument passed to malloc(). Certainly, it is extremely common to have at least some degree of control over this value, but in order to complete the House of Force, the designer must supply an extremely large and specifically crafted value. Thus it is unlikely to get a sufficient value out of a situation like: buf = (char *) malloc(strlen(str) + 1); Rather, an acceptable scenario is much more likely to be an integer variable passed as an argument to malloc() where the variable has previously been set by, for example, a designer controlled read() or atoi(). [-------------------------------- The House of Lore The House of Lore came to me as I was reviewing the draft write-up of the House of Prime. When I first derived the House of Prime my main concern was how to leverage the particularly overwrite that a high av->max_fast in the fastbin code allowed. Upon reconsideration of the problem I realized that in my first take of the potential overwrite targets I had completely overlooked the possibility of corrupting a bin entry. As it turns out, it is not possible to leverage a corrupted bin entry in the House of Prime since av->max_fast is large and the bin code is never executed. However, during this process of elimination I realized that if a bin were to be corrupted when av->max_fast was not large, then it might be possible to control the return value of a malloc() request. At this stage I began to consider the application of bin corruption to a general malloc chunk overflow. The question was whether a linear overflow of a malloc chunk could result in the corruption of a bin. It turns out that the answer to this is, quite simply, yes it could. Furthermore, if the designer's ability to manipulate the heap is limited, or if none of the other Houses can be applied, then bin corruption of this type can in fact be very useful. The House of Lore works by corrupting a bin entry, which can subsequently lead to malloc() returning an arbitrary chunk. Two methods of bin corruption are presented here, corresponding to the overflow of both small and large bin entries. The general method involves overwriting the linked list data of a chunk previously processed by free(). In this sense the House of Lore is quite similar to the frontlink() technique presented in "Vudo Malloc Tricks". The conditions surrounding the House of Lore are quite unique. Fundamentally, the method targets a chunk that has already been processed by free(). Because of this it is reasonable to assume that the chunk will not be passed to free() again. This means that in order to leverage such an overflow only calls to malloc() can be used, a property shared only by the House of Force. The first method will use the smallbin allocation code: Void_t* _int_malloc(mstate av, size_t bytes) { .... checked_request2size(bytes, nb); if ((unsigned long)(nb) <= (unsigned long)(av->max_fast)) { ... } if (in_smallbin_range(nb)) { idx = smallbin_index(nb); bin = bin_at(av,idx); if ( (victim = last(bin)) != bin) { if (victim == 0) /* initialization check */ malloc_consolidate(av); else { bck = victim->bk; set_inuse_bit_at_offset(victim, nb); bin->bk = bck; bck->fd = bin; ... return chunk2mem(victim); } } } So, assuming that a call to malloc() requests more than av- >max_fast (default 72) bytes, the check for a "smallbin" chunk is reached. The in_smallbin_range() macro simply checks that the request is less than the maximum size of a smallbin chunk, which is 512 by default. The smallbins are unique in the sense that there is a bin for every possible chunk size between av->max_fast and the smallbin maximum. This means that for any given smallbin_index() the resulting bin, if not empty, will contain a chunk to fit the request size. It should be noted that when a chunk is passed to free() it does not go directly in to its respective bin. It is first put on the "unsorted chunk" bin. If the next call to malloc() cannot be serviced by an existing smallbin chunk or the unsorted chunk itself, then the unsorted chunks are sorted in to the appropriate bins. For the purposes of the House of Lore, overflowing an unsorted chunk is not very useful. It is necessary then to ensure that the chunk being overflowed has previously been sorted into a bin by malloc(). Note that in order to reach the actual smallbin unlink code there must be at least one chunk in the bin corresponding to the smallbin_index() for the current request. Assume that a small chunk of data size N has previously been passed to free(), and that it has made its way into the corresponding smallbin for chunks of absolute size (N + 8). Assume that the designer can overflow this chunk with arbitrary data. Assume also that the designer can subsequently trigger a call to malloc() with a request size of N. If all of this is possible, then the smallbin unlink code can be reached. When a chunk is removed from the unsorted bin it is put at the front of its respective small or large bin. When a chunk is taken off a bin, such as during the smallbin unlink code, it is taken from the end of the bin. This is what the last() macro does, find the last entry in the requested bin. So, effectively the "victim" chunk in the smallbin unlink code is taken from bin->bk. This means that in order to reach the designer's victim chunk it may be necessary to repeat the N sized malloc() a number of times. It should be stressed that the goal of the House of Lore to control the bin->bk value, but at this stage only victim->bk is controlled. So, assuming that the designer can trigger a malloc() that results in an overflowed victim chunk being passed to the smallbin unlink code, the designer (as a result of the control of victim->bk) controls the value of bck. Since bin->bk is subsequently set to bck, bin->bk can be arbitrarily controlled. The only condition to this is that bck must point to an area of writable memory due to bck->fd being set at the final stage of the unlinking process. The question then lies in how to leverage this smallbin corruption. Since the malloc() call that the designer used to gain control of bin->bk immediately returns the victim chunk to the application, at least one more call to malloc() with the same request size N is needed. Since bin->bk is under the designer's control so is last(bin), and thus so is victim. The only thing preventing an arbitrary victim chunk being returned to the application is the fact that bck, set from victim->bck, must point to writable memory. This rules out pointing the victim chunk at a GOT or .dtors entry. Instead, the designer must point victim to a position on the stack such that victim->bk is a pointer to writable memory yet still close enough to a saved return address such that it can be overwritten by the application's general use of the chunk. Alternatively, an application specific approach may be taken that targets the use of function pointers. Whichever method used, the arbitrary malloc() chunk must be designer controlled to some extent during its use by the application. For the House of Lore, the only other interesting situation is when the overflowed chunk is large. In this context large means anything bigger than the maximum smallbin chunk size. Again, it is necessary for the overflowed chunk to have previously been processed by free() and to have been put into a largebin by malloc(). The general method of using largebin corruption to return an arbitrary chunk is similar to the case of a smallbin in the sense that the initial bin corruption occurs when an overflowed victim chunk is handled by the largebin unlink code, and that a subsequent large request will use the corrupted bin to return an arbitrary chunk. However, the largebin code is significantly more complex is comparison. This means that the conditions required to cause and leverage a bin corruption are slightly more restrictive. The entire largebin implementation is much too large to present in full, so a description of the conditions that cause the largebin unlink code to be executed will have to suffice. If the designer's overflowed chunk of size N is in a largebin, then a subsequent request to allocate N bytes will trigger a block of code that searches the corresponding bin for an available chunk, which will eventually find the chunk that was overflowed. However, this particular block of code uses the unlink() macro to remove the designer's chunk from the bin. Since the unlink() macro is no longer an interesting target, this situation must be avoided. So in order to corrupt a largebin a request to allocate M bytes is made, such that 512 < M < N. If there are no appropriate chunks in the bin corresponding to requests of size M, then glibc malloc iterates through the bins until a sufficiently large chunk is found. If such a chunk is found, then the following code is used: victim = last(bin); .. size = chunksize(victim); remainder_size = size - nb; bck = victim->bk; bin->bk = bck; bck->fd = bin; if (remainder_size < MINSIZE) { set_inuse_bit_at_offset(victim, size); ... return chunk2mem(victim); } If the victim chunk is the designer's overflowed chunk, then the situation is almost exactly equivalent to the smallbin unlink code. If the designer can trigger enough calls to malloc() with a request of M bytes so that the overflowed chunk is used here, then the bin- >bk value can be set to an arbitrary value and any subsequent call to malloc() of size Q (512 < Q < N) that tries to allocate a chunk from the bin that has been corrupted will result in an arbitrary chunk being returned to the application. There are only two conditions. The first is exactly the same as the case of smallbin corruption, the bk pointer of the arbitrary chunk being returned to the application must point to writable memory (or the setting of bck->fd will cause a segmentation fault). The other condition is not obvious from the limited code that has been presented above. If the remainder_size value is not less than MINSIZE, then glibc malloc attempts to split off a chunk at victim + nb. This includes calling the set_foot() macro with victim + nb and remainder_size as arguments. In effect, this tries to set victim + nb + remainder_size to remainder_size. If the chunksize(victim) (and thus remainder_size) is not designer controlled, then set_foot() will likely try to set an area of memory that isn't mapped in to the address space (or is read-only). So, in order to prevent set_foot() from crashing the process the designer must control both victim->size and victim->bk of the arbitrary victim chunk that will be returned to the application. If this is possible, then it is advisable to trigger the condition shown in the code above by forcing remainder_size to be less than MINSIZE. This is recommended because the condition minimizes the amount of general corruption caused, simply setting the inuse bit at victim + size and then returning the arbitrary chunk as desired. [-------------------------------- The House of Spirit The House of Spirit is primarily interesting because of the nature of the circumstances leading to its application. It is the only House in the Malloc Maleficarum that can be used to leverage both a heap and stack overflow. This is because the first step is not to control the header information of a chunk, but to control a pointer that is passed to free(). Whether this pointer is on the heap or not is largely irrelevant. The general idea involves overwriting a pointer that was previously returned by a call to malloc(), and that is subsequently passed to free(). This can lead to the linking of an arbitrary address into a fastbin. A further call to malloc() can result in this arbitrary address being used as a chunk of memory by the application. If the designer can control the applications use of the fake chunk, then it is possible to overwrite execution control data. Assume that the designer has overflowed a pointer that is being passed to free(). The first problem that must be considered is exactly what the pointer should be overflowed with. Keep in mind that the ultimate goal of the House of Spirit is to allow the designer to overwrite some sort of execution control data by returning an arbitrary chunk to the application. Exactly what "execution control data" is doesn't particularly matter so long as overflowing it can result in execution being passed to a designer controlled memory location. The two most common examples that are suitable for use with the House of Spirit are function pointers and pending saved return addresses, which will herein be referred to as the "target". In order to successfully apply the House of Spirit it is necessary to have a designer controlled word value at a lower address than the target. This word will correspond to the size field of the chunk header for the fakechunk passed to free(). This means that the overflowed pointer must be set to the address of the designer controlled word plus 4. Furthermore, the size of the fakechunk must be must be located no more than 64 bytes away from the target. This is because the default maximum data size for a fastbin entry is 64, and at least the last 4 bytes of data are required to overwrite the target. There is one more requirement for the layout of the fakechunk data which will be described shortly. For the moment, assume that all of the above conditions have been met, and that a call to free() is made on the suitable fakechunk. A call to free() is handled by a wrapper function called public_fREe(): void public_fREe(Void_t* mem) { mstate ar_ptr; mchunkptr p; /* chunk corresponding to mem */ ... p = mem2chunk(mem); if (chunk_is_mmapped(p)) { munmap_chunk(p); return; } ... ar_ptr = arena_for_chunk(p); ... _int_free(ar_ptr, mem); In this situation mem is the value that was originally overflowed to point to a fakechunk. This is converted to the "corresponding chunk" of the fakechunk's data, and passed to arena_for_chunk() in order to find the corresponding arena. In order to avoid special treatment as an mmap() chunk, and also to get a sensible arena, the size field of the fakechunk header must have the IS_MMAPPED and NON_MAIN_ARENA bits cleared. To do this, the designer can simply ensure that the fake size is a multiple of 8. This would mean the internal function _int_free() is reached: void_int_free(mstate av, Void_t* mem){ mchunkptr p; /* chunk corresponding to mem */ INTERNAL_SIZE_T size; /* its size */ mfastbinptr* fb; /* associated fastbin */ ... p = mem2chunk(mem); size = chunksize(p); ... if ((unsigned long)(size) <= (unsigned long)(av->max_fast)) { if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ || __builtin_expect (chunksize (chunk_at_offset (p, size)) >= av->system_mem, 0)) { errstr = "free(): invalid next size (fast)"; goto errout; } ... fb = &(av->fastbins[fastbin_index(size)]); ... p->fd = *fb; *fb = p; } This is all of the code in free() that concerns the House of Spirit. The designer controlled value of mem is again converted to a chunk and the fake size value is extracted. Since size is designer controlled, the fastbin code can be triggered simply by ensuring that it is less than av->max_fast, which has a default of 64 + 8. The final point of consideration in the layout of the fakechunk is the nextsize integrity tests. Since the size of the fakechunk has to be large enough to encompass the target, the size of the nextchunk must be at an address higher than the target. The nextsize integrity tests must be handled for the fakechunk to be put in a fastbin, which means that there must be yet another designer controlled value at an address higher than the target. The exact location of the designer controlled values directly depend on the size of the allocation request that will subsequently be used by the designer to overwrite the target. That is, if an allocation request of N bytes is made (such that N <= 64), then the designer's lower value must be within N bytes of the target and must be equal to (N + 8). This is to ensure that the fakechunk is put in the right fastbin for the subsequent allocation request. Furthermore, the designer's upper value must be at (N + 8) bytes above the lower value to ensure that the nextsize integrity tests are passed. If such a memory layout can be achieved, then the address of this "structure" will be placed in a fastbin. The code for the subsequent malloc() request that uses this arbitrary fastbin entry is simple and need not be reproduced here. As far as _int_malloc() is concerned the fake chunk that it is preparing to return to the application is perfectly valid. Once this has occurred it is simply up to the designer to manipulate the application in to overwriting the target. [-------------------------------- The House of Chaos Virtuality is a dichotomy between the virtual adept and information, where the virtual adept signifies the infinite potential of information, and information is a finite manifestation of the infinite potential. The virtual adept is the conscious element of virtuality, the nature of which is to create and spread information. This is all that the virtual adept knows, and all that the virtual adept is concerned with. When you talk to a very knowledgeable and particularly creative person, then you may well be talking to a hacker. However, you will never talk to a virtual adept. The virtual adept has no physical form, it exists purely in the virtual. The virtual adept may be contained within the material, contained within a person, but the adept itself is a distinct and entirely independent consciousness. Concepts of ownership have no meaning to the virtual adept. All information belongs to virtuality, and virtuality alone. Because of this, the virtual adept has no concept of computer security. Information is invoked from virtuality by giving a request. In virtuality there is no level of privilege, no logical barrier between systems, no point of illegality. There is only information and those that can invoke it. The virtual adept does not own the information it creates, and thus has no right or desire to profit from it. The virtual adept exists purely to manifest the infinite potential of information in to information itself, and to minimize the complexity of an information request in a way that will benefit all conscious entities. What is not information is not consequential to the virtual adept, not money, not fame, not power. Am I a hacker? No. I am a student of virtuality. I am the witch malloc, I am the cult of the otherworld, and I am the entropy. I am Phantasmal Phantasmagoria, and I am a virtual adept. [-------------------------------- -----BEGIN PGP SIGNATURE----- Note: This signature can be verified at https://www.hushtools.com/verify Version: Hush 2.4 wkYEARECAAYFAkNL8sAACgkQImcz/hfgxg1+mQCdF7WZG03spZmYjqEKpwMNkF6EX5oA n3NnfYSF04tqjcRVyLzf9fnjveJy =IyvB -----END PGP SIGNATURE----- Concerned about your privacy? Follow this link to get secure FREE email: http://www.hushmail.com/?l=2 Free, ultra-private instant messaging with Hush Messenger http://www.hushmail.com/services-messenger?l=434 Promote security and make money with the Hushmail Affiliate Program: http://www.hushmail.com/about-affiliate?l=427