Warren: You probably already know this, but deeply pipelined units -- e.g., GPUs -- sometimes use _sorting networks_, which are more-or-less oblivious compare-and-swap straight-line instruction sequences. Since there isn't any branching, there's no branch prediction. Graphics operations often need to sort things, so this matters. However, such oblivious sorting networks can cost an additional factor of log N, at least in terms of hardware, but this may be acceptable in today's environment. Branch prediction reached absurd levels prior to modern multicore CPUs, and the emphasis has now shifted away from heavy branch prediction in favor of more cores. Branch prediction basically builds several hypothetical states in parallel (requiring lots of additional registers), and only when the results of the decisions finally become available does the CPU choose the correct state to "instantiate" as the actual state. In this sense, a branch-predicting CPU is very much like a two-phase database transaction: a branch opens a new transaction, a number of uncommitted results are computed, and when the branch decision becomes evident, the choice of the actual state is the "commit" part of the transaction.

At 04:58 PM 5/20/2014, Warren D Smith wrote:
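[As an editorial aside before the quoted message: the oblivious compare-exchange primitive such networks are built from can be sketched in C. This is my own illustration, not code from the thread; the ternaries are expected to compile to conditional moves on mainstream compilers, so nothing branches on the data.]

```c
/* Branchless compare-exchange: afterwards *a <= *b.
   Compilers typically lower these ternaries to conditional
   moves, so there is no data-dependent branch to mispredict. */
void cswap(int *a, int *b) {
    int lo = *a < *b ? *a : *b;
    int hi = *a < *b ? *b : *a;
    *a = lo;
    *b = hi;
}

/* A 5-comparator sorting network for 4 elements: the same fixed
   sequence of compare-exchanges sorts every input, obliviously. */
void sort4(int v[4]) {
    cswap(&v[0], &v[1]);
    cswap(&v[2], &v[3]);
    cswap(&v[0], &v[2]);
    cswap(&v[1], &v[3]);
    cswap(&v[1], &v[2]);
}
```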
Suppose you are doing quicksort. Zoing, zoing, zoing, you do compares, and suddenly you decide "aha! this last comparison tells me I need to swap item[K]!" And then you branch off someplace else in your program to do it. Big lose. Mispredicted branch.
But instead: when your comparison sees a bad item, you conditional-load that item's ID number into a stash (no branching needed) and continue on. Eventually your stash gets full and you have to do a real branch to deal with that, but with a large enough stash size, this can be made arbitrarily rare. When the stash fills up (or we reach the end of the scan), we do all the swaps of the stashed bad items (emptying the stash), then branch back to the main code.
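[A minimal sketch of this stash idea in C, assuming a Lomuto-style partition around the last element; the names `stash_partition` and `STASH` are mine, not Warren's. The conditional increment `count += (a[i] < pivot)` is expected to compile branch-free on mainstream compilers.]

```c
#include <stddef.h>

#define STASH 64  /* stash capacity; larger makes the flush branch rarer */

/* Partition a[0..n-1] around pivot a[n-1]; return the pivot's final
   position. Instead of branching on every comparison, indices of
   "small" elements are conditionally recorded in a stash, and the
   deferred swaps are done in batches. */
size_t stash_partition(int *a, size_t n) {
    int pivot = a[n - 1];
    size_t store = 0;           /* next destination for a small element */
    size_t stash[STASH];        /* deferred indices of small elements  */
    size_t count = 0;

    for (size_t i = 0; i + 1 < n; i++) {
        /* Branch-free per element: the store always happens, and
           count only advances when a[i] < pivot.                   */
        stash[count] = i;
        count += (a[i] < pivot);

        if (count == STASH) {   /* rare branch: flush the stash */
            for (size_t k = 0; k < count; k++, store++) {
                int t = a[store]; a[store] = a[stash[k]]; a[stash[k]] = t;
            }
            count = 0;
        }
    }
    for (size_t k = 0; k < count; k++, store++) {  /* flush remainder */
        int t = a[store]; a[store] = a[stash[k]]; a[stash[k]] = t;
    }
    /* move the pivot into place */
    int t = a[store]; a[store] = a[n - 1]; a[n - 1] = t;
    return store;
}
```

The deferred swaps are executed in the same order, on the same index pairs, as an immediate Lomuto partition would perform them, so the result is identical; only the branching is batched.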
This stashing technique -- deferring all the branches, then later doing all the work those branches were going to do, almost without any branching -- is presumably of pretty general applicability as a high-performance programming technique. Certainly it is applicable to quicksort. I don't know what other problems it applies to, but probably many. It should yield substantial speedups for the right problems. (Any suggestions?)
Meanwhile, in preliminary work, I invented a different new way to substantially speed up quicksort, which finally beat my heapsort implementation (which in turn had beaten both of the currently most-ballyhooed quicksort codes, by Bentley+McIlroy and by Kushagra et al, in my speed tests). Some version of this stashing idea might now increase that lead, but I haven't tried it yet. Anyhow, at least in principle, quicksort now has excellent caching behavior, excellent branch-misprediction rates, and arbitrarily near-optimal comparison counts (with good pivot choices), all in one algorithm. Suckily large move counts, though.
-- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)