[math-fun] HAKMEM's missing entry for "ROP"

26 Jan 2014

      "Tails You Lose"

What I'm going to describe is about 10 years old, but may be
considerably older (please let me know!).

Consider the plight of a virus in a biological system.

The virus has to make copies of itself, but it can't afford to carry
around the complete set of instruction bits necessary to describe to
the host organism how to do this.  In particular, the virus has to
tell the host organism how to make various proteins in various
sequences to force the host to build copies of the virus, and the
virus doesn't want to have to describe every single needed protein
within the virus's own DNA/RNA.

To reduce the number of bits that the virus has to carry, the virus
relies on intimate knowledge of the existing host's own DNA & RNA, so
that it can "repurpose" various DNA & RNA subsequences for its own
uses.  In many cases, it won't even matter whether the subsequences
within the host's DNA & RNA are "aligned" properly or not; the virus
will only care that the subsequences it needs already exist within the
host's DNA & RNA.

The virus is thus performing a kind of "rsync" algorithm, which was
originally developed to save bandwidth during the transmission of
information from one computer to another by taking advantage of
subsequences of bits that already existed within the receiving
computer.  Rsynch only transmits the "instructions" for constructing
the intended sequence from bits and subsequence pieces already present
on the receiving computer.

http://en.wikipedia.org/wiki/Rsync

Thus, if most of the bits the virus already requires are already
present (perhaps in pieces) as subsequences of the host's DNA & RNA,
the virus only has to contain the missing bits, together with enough
information to assemble the complete sequences from these subsequence
bits and pieces.

It should be clear that the virus has traded off ease of transmission
for specificity of hosts, since only those hosts that have its
"missing" bits will be capable of reproducing the virus.

A similar situation exists for computer viruses today.  While memory
capacity and transmission bandwidth of today's computers are very
generous, there are severe limitations on the types of sequences that
are allowed to execute.  In particular, modern (and not-so-modern!)
computers typically have read/write/execute permissions on pages in
memory.  If a page is designated "rx" (read/execute), it can be
executed, while if a page is designated as "rw", it can be read and
written but not executed.

Modern computers (for a huge number of reasons), do not allow write
access to executable pages, and it is quite difficult to change the
permission status of a page from writable to executable.  Similarly,
(and for many of the same reasons) these computers do not allow
"instructions" on writable pages to be executed.

This page permission structure would seem to eliminate the possibility
of executing arbitrary tasks not already present in the non-writable
executable pages, but this is incorrect!

Modern computer viruses can do the same thing that biological viruses
can do: find _existing_ bit sequences & repurpose them for the virus's
own computation.  While these existing bit sequences may not be the
most efficient or elegant means to achieve the virus's ends, it
doesn't matter, because the virus's life & progeny depend upon these
possibly-clumsy mechanisms.

Step 1.  Find useful subsequences already extant on the host.  These
subsequences will be called "widgets".

In the case of computer programs, we need to find usable sequences of
bits already extant _within executable pages_ that can be used to
construct Turing-complete computation.  This is due to the limitation
that only instruction sequences on executable pages are allowed to be
executed.

Given the multi-gigabyte size and complexity of modern operating
systems, this is actually a quite easy task, because we're
approximating the "infinite monkey theorem", wherein enough
(Seattle-based) monkeys with computer consoles can construct every
sequence of instructions.

http://en.wikipedia.org/wiki/Infinite_monkey_theorem

For our usable widgets, we're going to be looking for sequences of
instructions that end in a "return" instruction, because we will need
this "return" instruction to link these subsequence widgets together
in the fashion of "threaded code interpreters".

http://en.wikipedia.org/wiki/Threaded_code

Some of the best sources of "return-rich" code sequences exist in
"shared libraries".  Because these are libraries, they tend to be
collections of subroutines, each of which terminates in a "return"
instruction, and since these are _shared_ they reside on read-only,
_executable_ memory pages.  Furthermore, since these are libraries,
they tend to be quite constant for many, many years, and hence can be
relied upon to contain constant sequences.

Note that _complete_ library subroutines are less interesting for the
virus's purposes, than are _tails_ of library subroutines.  The main
entry point of a subroutine expects arguments in standard locations,
but by entering this instruction sequence further down near the end,
the subroutine is more likely to be utilizing values in non-standard
locations (including memory locations).

Step 2.  Use a Turing-complete set of widgets as a new "instruction
set" for a "threaded code interpreter".

The typical widget consists of a number of instructions which
manipulate memory and/or registers, followed by a return instruction
which pops an address from the stack and jumps to this address to
resume execution.  A return instruction that returns to an address
stored in a register is just as useful.  Due to the restriction that
execution is only possible when instructions are fetched from
"executable" memory pages, our virus must guarantee that all such
"return" addresses fall within these executable memory pages.

There are several ways to implement if-then-else with these widgets.
One can adjust the stack pointer by a variable amount to achieve the
effect of a "skip" instruction.  One can also execute both arms of a
conditional and arrange to use only the result of the correct branch.

Thus, our virus fashions a Forth/Postscript-like stack machine out
of these widgets which loads arguments into registers or onto the
stack, loads the address of the next widget onto the stack, loads the
address of the current widget onto the stack, and then "returns" to
this address just pushed onto the stack.

http://en.wikipedia.org/wiki/Forth_(programming_language)

http://en.wikipedia.org/wiki/PostScript

(Forth-type stack languages themselves are considerably older than
computers; they are a form of "combinators".

http://en.wikipedia.org/wiki/Combinatory_logic )

In essence, this "return-oriented programming" (ROP) dispenses with
the existing "call" instruction, and relies exclusively on the
"return" instruction for all calls, returns and jumps.

Unfortunately, return-oriented programming is not just a theoretical
construct, but a very real mechanism by which modern computer viruses
and worms can take over a machine.  If there is even _one_ "buffer
overflow" programming error in the code, a malicious programmer can
construct a long string of bytes which encodes his/her malicious
program in this "widget" stack machine, and causes it to start running
when the subroutine receiving this long string attempts to return.  At
that point, this widget stack machine program starts executing, and
performs enough computation to eventually constructs a system call
which changes a page from writable to executable, and the machine is
now completely compromised.

Note that return-oriented programming might be stopped by making
serious use of a "COMEFROM" (instead of GOTO) instruction.

http://en.wikipedia.org/wiki/COMEFROM

Alternately, it might be possible to use an analogy of the "XOR linked
list" hack, only using instruction lists as linked lists, in order to
defeat enetering code at the wrong places and/or from the wrong places.

(Wasn't the list XOR hack part of original HAKMEM, or perhaps in
KnuthV1 ?)

http://en.wikipedia.org/wiki/XOR_linked_list

http://en.wikipedia.org/wiki/Return-oriented_programming

http://cseweb.ucsd.edu/~hovav/dist/sparc.pdf

Henry Baker

Jon Ziegler

Henry Baker

rcs＠xmission.com

tags

participants (3)