Re: [math-fun] Origin of SQRT hack?
Ok, here's the _extremely short_ version, which I'll expand upon later.

On a "hidden bit" floating point system, the positive number 2^e*(1+m) (0<=m<1) is represented by the pair [e,m]. If we have a number z=2^(2e)*(1+2x) (0<=x<.5), represented by [2e,2x], then a good initial approximation to sqrt(z) is [e,x]. Why? Consider squaring s0=[e,x]=2^e*(1+x):

[e,x]^2 = (2^e*(1+x))^2 = 2^(2e)*(1+2*x+x^2) ~~ 2^(2e)*(1+2*x) = [2e,2x].

So you get the idea. The initial guess isn't a lot of precision, but every little "bit" helps...

(I don't have time to expand upon this right this second, but the oddp(e) case looks more like 2^(e-1)*(1-m), so the low bit shifted out of the exponent becomes the "sign" bit for the mantissa.)

At 02:30 PM 4/17/2011, Robert Munafo wrote:
On Sun, Apr 17, 2011 at 11:59, Henry Baker <hbaker1@pipeline.com> wrote:
I think we've established that shifting the exponent right by 1 for the initial approximation goes way far back. I suspect that it may go back much further -- perhaps to Von Neumann's original binary floating point proposal.
The only question now is what are the contents of the mantissa. I may need to do some experiments with Maxima to determine how important various guesses at the mantissa are.
[...]
No, you asked "[...] the bits of the entire number (both exponent & mantissa) shifted right by 1."
I still want the answer to that question, not any of the others.
I wrote assembly code for the PDP-11, and apparently it used a floating point format in which an integer test "A>B" was suitable for comparing two positive floating point numbers, so the monolithic shift would have worked there. Maybe other PDPs were similar. But we're looking for actual evidence of it.
- Robert
-- Robert Munafo -- mrob.com
Follow me at: fb.com/mrob27 - twitter.com/mrob_27 - mrob27.wordpress.com - youtube.com/user/mrob143 - rilybot.blogspot.com
I should point out that "denormalized" numbers completely blow out the "SQRT hack", regardless of the FP format.

The function asinh(x) acts like exp(x)/2 for large x, and like x for small x -- i.e., like a "normalized" FP number for large x (albeit with a radix of e), and like a "denormalized" number for small x. So asinh(x) is the "smoothest" floating point format that includes denormalized numbers.

So I tested Newton given a first approximation sinh(asinh(N)/2) to sqrt(N). In order to compute a full precision (in GNU Common Lisp) IEEE sqrt for N in the range 2^1 up to 2^512, exactly _five_ Newton iterations sufficed.

However, when N started falling in the "denormalized" range -- i.e., N<1 -- the number of Newton iterations skyrocketed, basically requiring:

N      Iters reqd
2^0    6
2^-10  11
2^-20  16
2^-30  21
2^-40  26
2^-50  31
etc.

So, if you care about the square roots of denormalized numbers, you will have to treat them as a special case.
Minor typo ... sinh(x) ~ (e^x)/2 for large x; and asinh(x) ~ log(2x) for large x.

---Rich

-----Original Message-----
From: math-fun-bounces@mailman.xmission.com On Behalf Of Henry Baker
Sent: Monday, April 18, 2011 9:27 AM
To: math-fun
Subject: Re: [math-fun] Origin of SQRT hack?

[...]
Here's the longer version of the interpretation of the SQRT hack.

A positive floating point number in a "hidden bit" system with 8 bits of exponent e and 23 bits of mantissa m is represented by a pair [e,m], which is _usually_ interpreted to mean the number 2^e*(1+m), 0<=m<1. (We ignore the exponent bias, for now.)

While that is the usual interpretation, we will instead use a _base-4_ interpretation for our SQRT hack: the high-order _seven_ bits of the exponent become our base-4 exponent, and the low-order bit of the exponent now becomes the high-order bit of the mantissa. The reason for this is that the mantissa is now considered as a 2's complement number, with the _low-order bit of the (binary) exponent becoming the "sign" bit for the mantissa_!

[2*e'+0,+m] = 4^e'*(1+m)      e' integer, 0<=m<1
[2*e'+1,-n] = 4^(e'+1)*(1-n)  e' integer, 0<n<=0.5

Here are the representations & interpretations for the first several integers in floating point format:

 1 = [00,.0]    = [2*0+0,+.0]    = 4^0*(1+.0)    = 1*1.0
 2 = [01,.0]    = [2*0+1,-.10]   = 4^1*(1-.10)   = 4*0.5
 3 = [01,.1]    = [2*0+1,-.01]   = 4^1*(1-.01)   = 4*0.75
 4 = [10,.00]   = [2*1+0,+.00]   = 4^1*(1+.00)   = 4*1.0
 5 = [10,.01]   = [2*1+0,+.01]   = 4^1*(1+.01)   = 4*1.25
 6 = [10,.10]   = [2*1+0,+.10]   = 4^1*(1+.10)   = 4*1.5
 7 = [10,.11]   = [2*1+0,+.11]   = 4^1*(1+.11)   = 4*1.75
 8 = [101,.000] = [2*1+1,-.1000] = 4^2*(1-.1000) = 16*0.5
 9 = [101,.001] = [2*1+1,-.0111] = 4^2*(1-.0111) = 16*0.5625
10 = [101,.010] = [2*1+1,-.0110] = 4^2*(1-.0110) = 16*0.625
11 = [101,.011] = [2*1+1,-.0101] = 4^2*(1-.0101) = 16*0.6875
12 = [101,.100] = [2*1+1,-.0100] = 4^2*(1-.0100) = 16*0.75
13 = [101,.101] = [2*1+1,-.0011] = 4^2*(1-.0011) = 16*0.8125
14 = [101,.110] = [2*1+1,-.0010] = 4^2*(1-.0010) = 16*0.875
15 = [101,.111] = [2*1+1,-.0001] = 4^2*(1-.0001) = 16*0.9375
...

Let us now shift the entire word right by one bit, which shifts the low-order bit of the original exponent into the high-order bit of the mantissa. This accomplishes an "arithmetic" right shift of the mantissa, since the "sign" bit of the mantissa is propagated. Thus,

sqrt([2*e'+0,+m]) ~ [e',+m/2] = 2^e'*(1+m/2)
sqrt([2*e'+1,-n]) ~ [e',-n/2] = 2^(e'+1)*(1-n/2)

In both cases, we have the first two terms of the Taylor series approximation sqrt(1+x) = 1+x/2-x^2/8+x^3/16-...

We must now deal with the exponent "bias". We usually don't store the exponent e itself, but (e+bias), so that 0<=e+bias<2^8. But if we shift (e+bias) to the right by 1, we compute (e+bias)/2 = e/2+bias/2. This means that we need to add bias/2 back in order to get e/2+bias/2+bias/2 = e/2+bias. This also works when bias is odd, since we are working with a binary number system in which bias/2 can be represented exactly.

(Note that since all we are doing is a right shift of the entire floating point number, the process of adding back the bias "commutes" with the shift, provided that if we add the bias ahead of the shift, we add "bias" rather than "bias/2".)
Oops, should be:

 8 = [11,.000] = [2*1+1,-.1000] = 4^2*(1-.1000) = 16*0.5
 9 = [11,.001] = [2*1+1,-.0111] = 4^2*(1-.0111) = 16*0.5625
10 = [11,.010] = [2*1+1,-.0110] = 4^2*(1-.0110) = 16*0.625
11 = [11,.011] = [2*1+1,-.0101] = 4^2*(1-.0101) = 16*0.6875
12 = [11,.100] = [2*1+1,-.0100] = 4^2*(1-.0100) = 16*0.75
13 = [11,.101] = [2*1+1,-.0011] = 4^2*(1-.0011) = 16*0.8125
14 = [11,.110] = [2*1+1,-.0010] = 4^2*(1-.0010) = 16*0.875
15 = [11,.111] = [2*1+1,-.0001] = 4^2*(1-.0001) = 16*0.9375

At 09:03 AM 4/19/2011, Henry Baker wrote:
participants (2):
- Henry Baker
- Schroeppel, Richard