Re: [math-fun] What kind of structure are IEEE 754 doubles?
At 04:23 PM 4/14/2015, Mike Stay wrote:
Is addition commutative?
Probably not, depending upon which computer you use.

Leaving aside NaNs, signaling, and exponent issues, some computers have larger mantissas for "accumulation", so that very long iterated sums still "work". E.g., Intel x86 "extended precision" floats with a 64-bit mantissa.

http://en.wikipedia.org/wiki/Extended_precision

Since you're now adding numbers of two different mantissa sizes, the additions are highly unlikely to be commutative.

"satanism" is an anagram for "mantissa"; now do you see why floating point is so hard?

http://en.wiktionary.org/wiki/mantissa
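A short C illustration of the two effects mentioned here, assuming long double is the x86 80-bit extended format (on other targets it may be identical to double, in which case the second difference disappears); exact outputs can vary with compiler and flags, which is rather the point:

/* Illustrative sketch: ordering and accumulator precision both matter.
 * Assumes long double is x87 80-bit extended; elsewhere the second
 * comparison may show no difference. */
#include <stdio.h>

int main(void) {
    /* Order matters even within one format: a single addition commutes,
     * but addition is not associative. */
    double a = 0.1, b = 0.2, c = 0.3;
    printf("(a+b)+c = %.17g\n", (a + b) + c);   /* typically 0.60000000000000009 */
    printf("a+(b+c) = %.17g\n", a + (b + c));   /* typically 0.59999999999999998 */

    /* Accumulator precision matters too: many small terms added to a big one.
     * In a 53-bit double, 1e16 + 1 rounds back to 1e16 every time; a 64-bit
     * extended significand can hold the intermediate sums exactly. */
    double      dsum = 1.0e16;
    long double xsum = 1.0e16L;
    for (int i = 0; i < 1000; i++) {
        dsum += 1.0;
        xsum += 1.0L;
    }
    printf("double accumulator:      %.17g\n", dsum);
    printf("long double accumulator: %.17Lg\n", xsum);
    return 0;
}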
That's a very good point. I have experienced that problem before, where seemingly irrelevant changes resulted in slight differences in floating-point results. Eventually, I decided that determinism was more important than a little unpredictable extra precision, so I started compiling that particular app with the following gcc option:

-ffloat-store

Do not store floating-point variables in registers, and inhibit other options that might change whether a floating-point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.

That solved the problem, but I found it unsettling. Floating point can be extremely annoying. (A sketch of the register-versus-memory effect appears after the quoted message below.)

Tom

Henry Baker writes:
At 04:23 PM 4/14/2015, Mike Stay wrote:
Is addition commutative?
Probably not, depending upon which computer you use.
Leaving aside issues of NaN's, signaling and exponent issues, some computers have larger mantissas for "accumulation", so that very long iterated sums still "work". E.g., Intel x86 "extended precision" floats with 64-bit mantissa.
http://en.wikipedia.org/wiki/Extended_precision
Since you're now adding numbers of two different mantissa sizes, they are highly unlikely to be commutative.
"satanism" is an anagram for "mantissa"; now do you see why floating point is so hard?
http://en.wiktionary.org/wiki/mantissa
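For concreteness, a hedged C sketch of the register-versus-memory effect described above, assuming an x87 target such as gcc -m32 -mfpmath=387; it is illustrative rather than a guaranteed reproduction, since the whole problem is that the outcome depends on choices the compiler makes:

/* Compare two builds (assumes an x87 target; with SSE2 math, the default
 * on x86-64, doubles are computed directly in the 64-bit format and both
 * lines below should print 2):
 *
 *   gcc -O2 -m32 -mfpmath=387 excess.c && ./a.out
 *   gcc -O2 -m32 -mfpmath=387 -ffloat-store excess.c && ./a.out
 */
#include <stdio.h>

int main(void) {
    /* volatile keeps the compiler from folding the sums at compile time. */
    volatile double x = 1.0e16, y = 1.5;

    /* The exact sum 1e16 + 1.5 needs 55 significant bits: it fits in an
     * 80-bit x87 register but not in a 64-bit double, so the subtractions
     * below can see either 10000000000000001.5 (register) or
     * 10000000000000002 (memory), giving 1.5 or 2. */
    double stored = x + y;          /* a named variable, per the gcc manual's
                                       advice above; rounded to 53 bits once
                                       it actually lands in memory */
    printf("via variable:   %.17g\n", stored - 1.0e16);
    printf("via expression: %.17g\n", (x + y) - 1.0e16);
    return 0;
}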
"Pure" IEEE floating point addition is most emphatically commutative. Same for multiplication. What I mean is the fundamental operation as a manipulation of 64-tuples of bits. It is also completely deterministic, notwithstanding the huge amount of mistaken folklore to the contrary. The algorithm is simple to describe: compute the *infinitely precise* result, and round (binary bankers' rounding!) to 64 bits. Simplicity itself! There are also hairy rules about denorms and such. But it's still deterministic. You are designing hardware and you find that specification difficult to support? Get out of the kitchen. All modern (that is, last many years) hardware does this. Now, hardware often supports a more precise significand--64 bits instead of 53, for internal reasons, and to allow "hidden" operations to make library functions more precise. These operations are often called "multiply-accumulate" or "fused multiply add". The actual registers are typically 80 bits. Nothing wrong with that. Compilers often "help" to make your code "more accurate" by using these operations, so that intermediate results of a long computation are stored in registers, at 80 bits. Plenty wrong with that. It makes your code nondeterministic. The method outlined by Tom Karzes will fix that, by storing all intermediate results in (64 bit!) memory words. I have given talks on IEEE floating point, and how it is deterministic and not satanic at all, many times. I sometime refer to the two camps as "fundamentalists" and "secular humanists". I am blue in the face. On Wed, Apr 15, 2015 at 1:20 PM, Tom Karzes <karzes@sonic.net> wrote:
That's a very good point. I have experienced that problem before, where seemingly irrelevant changes resulted in slight differences in floating point results. Eventually, I decided that determinism was more important than a little unpredictable extra precision, so I started compiling that particular app with the following gcc option:
-ffloat-store
Do not store floating-point variables in registers, and inhibit other options that might change whether a floating-point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
That solved the problem, but I found it unsettling. Floating point can be extremely annoying.
Tom
Henry Baker writes:
At 04:23 PM 4/14/2015, Mike Stay wrote:
Is addition commutative?
Probably not, depending upon which computer you use.
Leaving aside issues of NaN's, signaling and exponent issues, some computers have larger mantissas for "accumulation", so that very long iterated sums still "work". E.g., Intel x86 "extended precision" floats with 64-bit mantissa.
http://en.wikipedia.org/wiki/Extended_precision
Since you're now adding numbers of two different mantissa sizes, they are highly unlikely to be commutative.
"satanism" is an anagram for "mantissa"; now do you see why floating point is so hard?
http://en.wiktionary.org/wiki/mantissa
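A small C check of the points above, assuming a C99 compiler and a libm that provides fma() (link with -lm); the specific inputs are chosen only for illustration:

/* A single IEEE operation is commutative bit for bit; a fused multiply-add
 * rounds once and so can differ from the same expression written as two
 * separately rounded operations. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 0.1, b = 0.3;

    /* One operation at a time, addition and multiplication commute exactly. */
    printf("a+b == b+a: %d\n", (a + b) == (b + a));   /* prints 1 */
    printf("a*b == b*a: %d\n", (a * b) == (b * a));   /* prints 1 */

    /* fma(x, y, z) returns x*y + z with a single rounding.  With
     * x = y = 1 + 2^-27 and z = -1, the exact answer 2^-26 + 2^-54 is itself
     * a double, so fma() returns it, while the unfused form loses the 2^-54
     * term when x*y is first rounded.  (If the compiler contracts x*y + z
     * into an fma on its own -- see -ffp-contract -- the two agree, which is
     * exactly the kind of "help" complained about above.) */
    double t = 1.0 / 134217728.0;             /* 2^-27, exact as a double */
    double x = 1.0 + t, y = 1.0 + t, z = -1.0;
    printf("fma(x,y,z): %.17g\n", fma(x, y, z));
    printf("x*y + z:    %.17g\n", x * y + z);
    return 0;
}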
Sounds like you know what's up with IEEE numbers. So maybe you can tell me: do they satisfy axioms like (((x divided by y) times y) divided by y) equals (x divided by y), for instance? What other axioms do they satisfy? Is there a finite basis for the identities that n-bit IEEE numbers satisfy for all n?

In short, what do IEEE numbers look like through the lens of universal algebra? (A brute-force spot check of the one identity above appears after the quoted message below.)

Jim Propp

On Wednesday, April 15, 2015, William Ackerman <wba@alum.mit.edu> wrote:

"Pure" IEEE floating point addition is most emphatically commutative. Same
for multiplication. What I mean is the fundamental operation as a manipulation of 64-tuples of bits. It is also completely deterministic, notwithstanding the huge amount of mistaken folklore to the contrary.
The algorithm is simple to describe: compute the *infinitely precise* result, and round (binary bankers' rounding!) to 64 bits. Simplicity itself! There are also hairy rules about denorms and such. But it's still deterministic. You are designing hardware and you find that specification difficult to support? Get out of the kitchen.
All modern (that is, last many years) hardware does this.
Now, hardware often supports a more precise significand--64 bits instead of 53, for internal reasons, and to allow "hidden" operations to make library functions more precise. These operations are often called "multiply-accumulate" or "fused multiply add". The actual registers are typically 80 bits. Nothing wrong with that. Compilers often "help" to make your code "more accurate" by using these operations, so that intermediate results of a long computation are stored in registers, at 80 bits. Plenty wrong with that. It makes your code nondeterministic. The method outlined by Tom Karzes will fix that, by storing all intermediate results in (64 bit!) memory words.
I have given talks on IEEE floating point, and how it is deterministic and not satanic at all, many times. I sometime refer to the two camps as "fundamentalists" and "secular humanists". I am blue in the face.
[older stuff deleted]
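This does not answer the universal-algebra question, but a brute-force spot check of the one concrete identity above is easy to write. A minimal C sketch, assuming POSIX drand48() for the random inputs; a mismatch would refute the identity, while a clean run proves nothing:

/* Empirical spot check of ((x/y)*y)/y == x/y over random finite doubles. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    srand48(20150415);
    long trials = 1000000, mismatches = 0;
    for (long i = 0; i < trials; i++) {
        /* random positive doubles with a modest spread of exponents */
        double x = drand48() * 1.0e6;
        double y = drand48() * 1.0e-6 + 1.0e-12;
        double q = x / y;
        if ((q * y) / y != q)
            mismatches++;
    }
    printf("%ld mismatches in %ld trials\n", mismatches, trials);
    return 0;
}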
Knuth has a discussion of this in TAoCP, in the floating-point section. He recommends that implementers try to satisfy as many regular algebra rules as possible. This preceded the standardization of IEEE floating-point (1975?).

I haven't seen a coherent discussion of "What rules are possible?" while still keeping an economically sensible system. You can probably count on A >= B -> A+C >= B+C, ignoring NaNs & infinities.

Can you build a safe Newton's Method square-root function that terminates, without too much extra baggage? (One common answer is sketched after the quoted messages below.)

Rich

------------------

Quoting James Propp <jamespropp@gmail.com>:
Sounds like you know what's up with IEEE numbers. So maybe you can tell me: do they satisfy axioms like (((x divided by y) times y) divided by y) equals (x divided by y), for instance? What other axioms do they satisfy? Is there a finite basis for the identities that n-bit IEEE numbers satisfy for all n?
In short, what do IEEE numbers look like through the lens of universal algebra?
Jim Propp
On Wednesday, April 15, 2015, William Ackerman <wba@alum.mit.edu> wrote:
"Pure" IEEE floating point addition is most emphatically commutative. Same
for multiplication. What I mean is the fundamental operation as a manipulation of 64-tuples of bits. It is also completely deterministic, notwithstanding the huge amount of mistaken folklore to the contrary.
The algorithm is simple to describe: compute the *infinitely precise* result, and round (binary bankers' rounding!) to 64 bits. Simplicity itself! There are also hairy rules about denorms and such. But it's still deterministic. You are designing hardware and you find that specification difficult to support? Get out of the kitchen.
All modern (that is, last many years) hardware does this.
Now, hardware often supports a more precise significand--64 bits instead of 53, for internal reasons, and to allow "hidden" operations to make library functions more precise. These operations are often called "multiply-accumulate" or "fused multiply add". The actual registers are typically 80 bits. Nothing wrong with that. Compilers often "help" to make your code "more accurate" by using these operations, so that intermediate results of a long computation are stored in registers, at 80 bits. Plenty wrong with that. It makes your code nondeterministic. The method outlined by Tom Karzes will fix that, by storing all intermediate results in (64 bit!) memory words.
I have given talks on IEEE floating point, and how it is deterministic and not satanic at all, many times. I sometime refer to the two camps as "fundamentalists" and "secular humanists". I am blue in the face.
[older stuff deleted]
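One common answer to the square-root question, sketched in C: start at or above sqrt(a) and stop as soon as a Newton step fails to decrease strictly, which guarantees termination because a strictly decreasing sequence of doubles can only take finitely many values. This is only a sketch for finite positive inputs, and it is not claimed to be correctly rounded:

/* Newton's method for sqrt(a) with a termination-safe stopping rule.
 * Handles only finite a > 0; other inputs are punted on up front. */
#include <math.h>
#include <stdio.h>

double newton_sqrt(double a) {
    if (!isfinite(a) || a <= 0.0)      /* punt on NaN, infinities, zero,
                                          and negative inputs */
        return (a == 0.0) ? 0.0 : NAN;
    double x = a > 1.0 ? a : 1.0;      /* any starting guess >= sqrt(a) works:
                                          the iterates then only decrease */
    for (;;) {
        double next = 0.5 * (x + a / x);
        if (next >= x)                 /* no strict decrease: stop */
            return x;
        x = next;                      /* strictly smaller double each pass */
    }
}

int main(void) {
    printf("newton_sqrt(2) = %.17g\n", newton_sqrt(2.0));
    return 0;
}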
participants (5)
- Henry Baker
- James Propp
- rcs@xmission.com
- Tom Karzes
- William Ackerman