One has to keep in mind the _context_ of these various codes. The fast invsqrt hack was intended to _approximate_ the invsqrt very quickly without worrying about the last few bits of precision. Most library routines have a higher goal -- attempting to squeeze out every bit of precision at the expense of some speed. This is why the sqrt/invsqrt from the _vision/graphics_ library at MIT or Stanford or CMU or Utah would be more interesting: these customers of code were more interested in raw speed than in the last few bits of precision. It is also important to keep in mind that the tradeoffs of different machines are different. In today's "RISC" architectures, most operations attempt to complete within one cycle, but they still "stall" on a load that misses in the cache. So on modern architectures, _table lookups_ can be extremely expensive, unless you have good reason to believe that the table remains in the cache. In the architectures of the 60's & 70's, there could be great differences between the execution times of different instructions. On some machines, even a simple _shift_ could take time proportional to the amount of the shift. On some machines, a multiply might take 30 times longer than an ADD or an AND. So without having a timing card handy when analyzing the code, it would be difficult to assess whether it is faster to shift an entire word, or to pick it apart. In particular, given the slow speed on some machines of FP operations, one would be willing to spend quite a number of fixed point operations to save one Newton iteration. The acid test would be whether someone _today_, given all of the accumulated knowledge, could program one of these machines of yesterday to produce a faster sqrt/invsqrt than was done in 1970 (say). At 04:21 AM 4/17/2011, Robert Munafo wrote:
Henry,
Even without a list of PDP-10 opcodes, this clearly doesn't fulfill your original question because it is far too many instructions. Also, it seems to address a data (constant) area differently based on the low bit of the (pre-shifted) exponent.
The whole point of your original question is that everything is done by shifting the exponent and mantissa together as a monolithic object. You wrote:
When making an initial guess of sqrt(x) for a Newton iteration, where x is a positive floating point number, a good initial guess is the bits of the entire number (both exponent & mantissa) shifted right by 1. [...]
Given the "0x5f3759df" a.k.a. "Fast InvSqrt()" example, it makes sense to allow one constant to both correct the exponent offset and bias the range of the linear function (to bring its average closer to the parabola), but this code has 4 constants!
Similar comments would apply to that other PDP examples too: way too many constants.
Relevant parts with Fortran ".LT." changed to < for clarity:
http://pdp-10.trailing-edge.com/decuslib10-10/01/43,50516/basxct.mac.html ;VERSION 17E 2-OCT-74/NA [...] ;VERSION 13 15-SEP-1969 [...] ;SQRT(F) IS CALCULATED BY N LINEAR APPROXIMATION, THE NATURE ;OF WHICH DEPENDS ON WHETHER 1/4 < F < 1/2 OR 1/2 < F < 1, ;FOLLOWED BY TWO ITERATIONS OF NEWTON'S METHOD. [...] SQRTB0: ASHC T, -33 ;PUT EXPONENT IN T, FRACTION IN T1 SUBI T, 201 ;SUBTRACT 201 FROM EXPONENT ROT T, -1 ;CUT EXP IN HALF, SAVE ODD BIT HRRM T,EX1 ;SAVE FOR FUTURE SCALING OF ANS ;IN FSC N,. INSTRUCTION LSH T, -43 ;GET BIT SAVED BY PREVIOUS INST. ASH T1, -10 ;PUT FRACTION IN PROPER POSITION FSC T1, 177(T) ;PUT EXPONENT OF FRACT TO -1 OR 0 MOVEM T1, N ;SAVE IT. 1/4 < F < 1 FMP T1, SQCON1(T) ;LINEAR FIRST APPROX,DEPENDS ON FAD T1, SQCON2(T) ;WHETHER 1/4 < F < 1/2 OR 1/2 < F < 1. MOVE T, N ;START NEWTONS METHOD WITH FRAC FDV T, T1 ;CALCULATE X(0)/X(1) FAD T1, T ;X(1) + X(0)/X(1) FSC T1, -1 ;1/2(X(1) + X(0)/X(1)) FDV N, T1 ;X(0)/X(2) FADR N, T1 ;X(2) + X(0)/X(2) XCT EX1 SQRT1: POPJ P, ;EXIT SQCON1: 0.8125 ;CONSTANT, USED IF 1/4 < FRAC < 1/2 0.578125 ;CONSTANT, USED IF 1/2 < FRAC < 1 SQCON2: 0.302734 ;CONSTANT, USED IF 1/4 < FRAC < 1/2 0.421875 ;CONSTANT, USED IF 1/2 < FRAC < 1
-- Robert Munafo -- mrob.com Follow me at: fb.com/mrob27 - twitter.com/mrob_27 - mrob27.wordpress.com- youtube.com/user/mrob143 - rilybot.blogspot.com