At 01:16 PM 3/26/2017, David Wilson wrote:
Unless I am missing some magic simplification or algorithm, asinh addition and multiplication require at least an evaluation of a power series, which seems much more costly than a floating-point addition or multiplication.
Chip HW cost & chip HW speed are more dependent upon bus sizes and speeds than upon functional units. You can have immensely complicated functional units, but they don't require much more power or area than relatively simple functional units *of the same word size*. Thus, even complex functional units that have to do case analysis on 50-100 different cases can still be quite doable, so long as you have tools to mechanically design & check all of these cases. I admit that the standard unary & binary ops for asinh arithmetic might be substantially more complicated than for IEEE arithmetic, but once you've designed such a unit, and if you can deeply pipeline it (e.g., for GPUs), you can get phenomenal throughput.
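
To make the cost question concrete, here is a naive software reference for the two binary ops. It is only a sketch, and it assumes that "asinh arithmetic" means storing each real x as the code a = asinh(x), so that adding or multiplying codes means decoding with sinh, doing the ordinary operation, and re-encoding with asinh. The function names are made up for illustration. This is not how a hardware unit would be built; it only states what such a unit would have to compute.

```python
import math

# Assumption: a value x is represented by its code a = asinh(x).
def encode(x: float) -> float:
    return math.asinh(x)

def decode(a: float) -> float:
    return math.sinh(a)

def asinh_add(a: float, b: float) -> float:
    # Code of (x + y), given the codes a and b of x and y.
    return math.asinh(math.sinh(a) + math.sinh(b))

def asinh_mul(a: float, b: float) -> float:
    # Code of (x * y), given the codes a and b of x and y.
    return math.asinh(math.sinh(a) * math.sinh(b))

if __name__ == "__main__":
    a, b = encode(2.0), encode(3.0)
    print(decode(asinh_add(a, b)))  # approximately 5
    print(decode(asinh_mul(a, b)))  # approximately 6
```

Each op here costs several transcendental evaluations, which is the expense David is pointing at. The argument above is that a dedicated, deeply pipelined unit could absorb that complexity without much extra area or power at the same word size.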