[math-fun] Arbitrary bit permuting in O(log(wordsize)) steps; & demonstration this is optimal