[math-fun] HAKMEM's missing entry for "ROP"
"Tails You Lose" What I'm going to describe is about 10 years old, but may be considerably older (please let me know!). Consider the plight of a virus in a biological system. The virus has to make copies of itself, but it can't afford to carry around the complete set of instruction bits necessary to describe to the host organism how to do this. In particular, the virus has to tell the host organism how to make various proteins in various sequences to force the host to build copies of the virus, and the virus doesn't want to have to describe every single needed protein within the virus's own DNA/RNA. To reduce the number of bits that the virus has to carry, the virus relies on intimate knowledge of the existing host's own DNA & RNA, so that it can "repurpose" various DNA & RNA subsequences for its own uses. In many cases, it won't even matter whether the subsequences within the host's DNA & RNA are "aligned" properly or not; the virus will only care that the subsequences it needs already exist within the host's DNA & RNA. The virus is thus performing a kind of "rsync" algorithm, which was originally developed to save bandwidth during the transmission of information from one computer to another by taking advantage of subsequences of bits that already existed within the receiving computer. Rsynch only transmits the "instructions" for constructing the intended sequence from bits and subsequence pieces already present on the receiving computer. http://en.wikipedia.org/wiki/Rsync Thus, if most of the bits the virus already requires are already present (perhaps in pieces) as subsequences of the host's DNA & RNA, the virus only has to contain the missing bits, together with enough information to assemble the complete sequences from these subsequence bits and pieces. It should be clear that the virus has traded off ease of transmission for specificity of hosts, since only those hosts that have its "missing" bits will be capable of reproducing the virus. A similar situation exists for computer viruses today. While memory capacity and transmission bandwidth of today's computers are very generous, there are severe limitations on the types of sequences that are allowed to execute. In particular, modern (and not-so-modern!) computers typically have read/write/execute permissions on pages in memory. If a page is designated "rx" (read/execute), it can be executed, while if a page is designated as "rw", it can be read and written but not executed. Modern computers (for a huge number of reasons), do not allow write access to executable pages, and it is quite difficult to change the permission status of a page from writable to executable. Similarly, (and for many of the same reasons) these computers do not allow "instructions" on writable pages to be executed. This page permission structure would seem to eliminate the possibility of executing arbitrary tasks not already present in the non-writable executable pages, but this is incorrect! Modern computer viruses can do the same thing that biological viruses can do: find _existing_ bit sequences & repurpose them for the virus's own computation. While these existing bit sequences may not be the most efficient or elegant means to achieve the virus's ends, it doesn't matter, because the virus's life & progeny depend upon these possibly-clumsy mechanisms. Step 1. Find useful subsequences already extant on the host. These subsequences will be called "widgets". In the case of computer programs, we need to find usable sequences of bits already extant _within executable pages_ that can be used to construct Turing-complete computation. This is due to the limitation that only instruction sequences on executable pages are allowed to be executed. Given the multi-gigabyte size and complexity of modern operating systems, this is actually a quite easy task, because we're approximating the "infinite monkey theorem", wherein enough (Seattle-based) monkeys with computer consoles can construct every sequence of instructions. http://en.wikipedia.org/wiki/Infinite_monkey_theorem For our usable widgets, we're going to be looking for sequences of instructions that end in a "return" instruction, because we will need this "return" instruction to link these subsequence widgets together in the fashion of "threaded code interpreters". http://en.wikipedia.org/wiki/Threaded_code Some of the best sources of "return-rich" code sequences exist in "shared libraries". Because these are libraries, they tend to be collections of subroutines, each of which terminates in a "return" instruction, and since these are _shared_ they reside on read-only, _executable_ memory pages. Furthermore, since these are libraries, they tend to be quite constant for many, many years, and hence can be relied upon to contain constant sequences. Note that _complete_ library subroutines are less interesting for the virus's purposes, than are _tails_ of library subroutines. The main entry point of a subroutine expects arguments in standard locations, but by entering this instruction sequence further down near the end, the subroutine is more likely to be utilizing values in non-standard locations (including memory locations). Step 2. Use a Turing-complete set of widgets as a new "instruction set" for a "threaded code interpreter". The typical widget consists of a number of instructions which manipulate memory and/or registers, followed by a return instruction which pops an address from the stack and jumps to this address to resume execution. A return instruction that returns to an address stored in a register is just as useful. Due to the restriction that execution is only possible when instructions are fetched from "executable" memory pages, our virus must guarantee that all such "return" addresses fall within these executable memory pages. There are several ways to implement if-then-else with these widgets. One can adjust the stack pointer by a variable amount to achieve the effect of a "skip" instruction. One can also execute both arms of a conditional and arrange to use only the result of the correct branch. Thus, our virus fashions a Forth/Postscript-like stack machine out of these widgets which loads arguments into registers or onto the stack, loads the address of the next widget onto the stack, loads the address of the current widget onto the stack, and then "returns" to this address just pushed onto the stack. http://en.wikipedia.org/wiki/Forth_(programming_language) http://en.wikipedia.org/wiki/PostScript (Forth-type stack languages themselves are considerably older than computers; they are a form of "combinators". http://en.wikipedia.org/wiki/Combinatory_logic ) In essence, this "return-oriented programming" (ROP) dispenses with the existing "call" instruction, and relies exclusively on the "return" instruction for all calls, returns and jumps. Unfortunately, return-oriented programming is not just a theoretical construct, but a very real mechanism by which modern computer viruses and worms can take over a machine. If there is even _one_ "buffer overflow" programming error in the code, a malicious programmer can construct a long string of bytes which encodes his/her malicious program in this "widget" stack machine, and causes it to start running when the subroutine receiving this long string attempts to return. At that point, this widget stack machine program starts executing, and performs enough computation to eventually constructs a system call which changes a page from writable to executable, and the machine is now completely compromised. Note that return-oriented programming might be stopped by making serious use of a "COMEFROM" (instead of GOTO) instruction. http://en.wikipedia.org/wiki/COMEFROM Alternately, it might be possible to use an analogy of the "XOR linked list" hack, only using instruction lists as linked lists, in order to defeat enetering code at the wrong places and/or from the wrong places. (Wasn't the list XOR hack part of original HAKMEM, or perhaps in KnuthV1 ?) http://en.wikipedia.org/wiki/XOR_linked_list http://en.wikipedia.org/wiki/Return-oriented_programming http://cseweb.ucsd.edu/~hovav/dist/sparc.pdf
Address space layout randomization effectively prevents this sort of attack on most(?) modern operating systems. See: http://en.wikipedia.org/wiki/Address_space_layout_randomization Biological entities have a harder time, although there are some instances of science fiction that posit defense mechanisms. Regards, Jon On Jan 26, 2014, at 10:52 AM, Henry Baker <hbaker1@pipeline.com> wrote:
Consider the plight of a virus in a biological system....
...To reduce the number of bits that the virus has to carry, the virus relies on intimate knowledge of the existing host's own DNA & RNA, so that it can "repurpose" various DNA & RNA subsequences for its own uses....
...Modern computer viruses can do the same thing that biological viruses can do: find _existing_ bit sequences & repurpose them for the virus's own computation....
...Unfortunately, return-oriented programming is not just a theoretical construct, but a very real mechanism by which modern computer viruses and worms can take over a machine....
Address space layout randomization helps, but only slows down the attacker. If the attacker can do "memory scraping", which is normally quite easy if the attacker can gain read access, then the attacker need only search memory to find all of his widgets, and then use a "relocatable loader" for his malware. Apparently, the most recent attack on Target, Neiman-Marcus, etc., credit cards utilized "memory scraping". In any case, I found it very cool that hackers had inadvertently discovered the techniques of "threaded code interpreters" (utilized in Forth implementations) in order to implement "return-oriented programming". Protecting instruction memory from reading may help a lot, but this may require new memory layouts which separate read-only constants (which must live in "read-only" but not "executable" pages) from instructions (which must live in "execute-only" pages). Also, instructions could be tagged in such a way that only instructions with certain tags can be "jumped to", and only instructions with other tags can be "called". This would eliminate the ability to utilize subroutine "tails" for widgets. Unfortunately, even with these constraints, any sufficiently large set of subroutine libraries will still have a Turing-complete set of widgets. In particular, a Forth implementation will certainly contain such a set, since that's the way Forth is implemented already! It's clear that the only way to stop such "return-oriented programming" is to severely limit the access of most programs to most libraries. At 12:54 PM 1/26/2014, Jon Ziegler wrote:
Address space layout randomization effectively prevents this sort of attack on most(?) modern operating systems. See: http://en.wikipedia.org/wiki/Address_space_layout_randomization Biological entities have a harder time, although there are some instances of science fiction that posit defense mechanisms.
Regards, Jon
On Jan 26, 2014, at 10:52 AM, Henry Baker <hbaker1@pipeline.com> wrote:
Consider the plight of a virus in a biological system....
...To reduce the number of bits that the virus has to carry, the virus relies on intimate knowledge of the existing host's own DNA & RNA, so that it can "repurpose" various DNA & RNA subsequences for its own uses....
...Modern computer viruses can do the same thing that biological viruses can do: find _existing_ bit sequences & repurpose them for the virus's own computation....
...Unfortunately, return-oriented programming is not just a theoretical construct, but a very real mechanism by which modern computer viruses and worms can take over a machine....
A later entry in DEC's PDP6/10/20 series had a feature aimed at protecting proprietary software components. A jump to an execute-only page had to go to a special instruction, I think "JRST 1,". This let the proprietary software mark the allowed entry points. Rich ------ Quoting Henry Baker <hbaker1@pipeline.com>:
Address space layout randomization helps, but only slows down the attacker. If the attacker can do "memory scraping", which is normally quite easy if the attacker can gain read access, then the attacker need only search memory to find all of his widgets, and then use a "relocatable loader" for his malware.
Apparently, the most recent attack on Target, Neiman-Marcus, etc., credit cards utilized "memory scraping".
In any case, I found it very cool that hackers had inadvertently discovered the techniques of "threaded code interpreters" (utilized in Forth implementations) in order to implement "return-oriented programming".
Protecting instruction memory from reading may help a lot, but this may require new memory layouts which separate read-only constants (which must live in "read-only" but not "executable" pages) from instructions (which must live in "execute-only" pages).
Also, instructions could be tagged in such a way that only instructions with certain tags can be "jumped to", and only instructions with other tags can be "called". This would eliminate the ability to utilize subroutine "tails" for widgets.
Unfortunately, even with these constraints, any sufficiently large set of subroutine libraries will still have a Turing-complete set of widgets. In particular, a Forth implementation will certainly contain such a set, since that's the way Forth is implemented already!
It's clear that the only way to stop such "return-oriented programming" is to severely limit the access of most programs to most libraries.
At 12:54 PM 1/26/2014, Jon Ziegler wrote:
Address space layout randomization effectively prevents this sort of attack on most(?) modern operating systems. See: http://en.wikipedia.org/wiki/Address_space_layout_randomization Biological entities have a harder time, although there are some instances of science fiction that posit defense mechanisms.
Regards, Jon
On Jan 26, 2014, at 10:52 AM, Henry Baker <hbaker1@pipeline.com> wrote:
Consider the plight of a virus in a biological system....
...To reduce the number of bits that the virus has to carry, the virus relies on intimate knowledge of the existing host's own DNA & RNA, so that it can "repurpose" various DNA & RNA subsequences for its own uses....
...Modern computer viruses can do the same thing that biological viruses can do: find _existing_ bit sequences & repurpose them for the virus's own computation....
...Unfortunately, return-oriented programming is not just a theoretical construct, but a very real mechanism by which modern computer viruses and worms can take over a machine....
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
participants (3)
-
Henry Baker -
Jon Ziegler -
rcs@xmission.com