[math-fun] posits, IEEE floats, asinh numbers
Over the years, I've bored you all to tears talking about asinh numbers. Basically, asinh numbers represent a real number x as the integer:

round(alpha*asinh(beta*x))

For example, if alpha = 1/asinh(beta), then the asinh # for 1 is also 1. Since asinh(x) is an odd function, we get smooth transitions about 0. This means we can capture a brief segment of the integers within this representation, while also getting a logarithmic representation of numbers with very large absolute values.

In particular, if beta ~ 2^(-22) and we have a 32-bit asinh # representation, then all of the signed 16-bit integers are mapped 1-1, but the overall range is ~ +-10^228. If beta ~ 2^(-46) and we have a 64-bit asinh # representation, then all of the signed 32-bit integers are mapped 1-1, but the overall range is ~ +-10^56937.

asinh numbers are thus the *smooth* integration of Kahan IEEE "denormalized" floats with normalized floats. asinh numbers don't have a separate exponent and mantissa, and therefore don't require a "regime" number to specify the size of the exponent. Of course, we can adjust alpha & beta so that asinh #'s have the same denorm range as IEEE denorms.

I claim that asinh numbers provide a much smoother (and therefore more accurate) set of numbers than Gustafson's "posits". If I have any time, I'll try to re-do some of Gustafson's calculations with asinh numbers.
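A minimal sketch of this encoding in C, assuming the 32-bit parameters above (beta = 2^-22, alpha = 1/asinh(beta)); the function names are illustrative, not from the post:

#include <math.h>
#include <stdint.h>

/* beta = 2^-22; choosing alpha = 1/asinh(beta) makes encode(1) == 1. */
static const double BETA = 0x1p-22;

int32_t asinh_encode(double x) {
    double alpha = 1.0 / asinh(BETA);
    return (int32_t)llround(alpha * asinh(BETA * x));
}

double asinh_decode(int32_t i) {
    /* inverse map: x = sinh(i/alpha)/beta */
    return sinh((double)i * asinh(BETA)) / BETA;
}

With these parameters, every signed 16-bit integer gets its own code (per the claim above), and the extreme 32-bit codes decode to roughly +-10^228.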
On 26/03/2017 17:43, Henry Baker wrote:
Basically, asinh numbers represent a real number x as the integer:
round(alpha*asinh(beta*x)) ... This means we can capture a brief segment of the integers within this representation, while also getting a logarithmic representation of numbers with very large absolute values.
In particular, if beta ~ 2^(-22) and we have a 32-bit asinh # representation, then all of the signed 16-bit integers are mapped 1-1, but the overall range is ~ +-10^228.
This has the drawback that essentially no integers other than 0 and +-1 are *exactly* represented, unless I'm missing something. I'm not sure how much that matters, but it would be quite a departure from conventional floating-point representations. -- g
At 01:02 PM 3/26/2017, Gareth McCaughan wrote: This has the drawback that essentially no integers other than
0 and +-1 are *exactly* represented, unless I'm missing something. I'm not sure how much that matters, but it would be quite a departure from conventional floating-point representations.
I'm not so sure that anyone cares about this anymore. I may have been one of the few people trying to do integer calculations using a floating point unit:

January 1992
http://home.pipeline.com/~hbaker1/AB-mod-N.html
http://home.pipeline.com/~hbaker1/AB-mod-N.pdf

"We show how to compute A*B (mod N) efficiently, for single-precision A, B, and N, on a modern RISC architecture (Intel 80860) in ANSI C. On this architecture, our method computes A*B (mod N) faster than ANSI C computes A%N, for unsigned longs A and N."
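For flavor, here is the general shape of the floating-point modular-multiplication trick in C. This is a common variant, not necessarily the paper's 80860-specific method, and it assumes a, b < 2^26 so that a*b is exact in a double:

#include <stdint.h>

uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n) {
    /* Estimate the quotient a*b/n in double precision. */
    double q = (double)a * (double)b / (double)n;
    /* Recover the remainder in integer arithmetic; the truncated
       estimate can be off by one in either direction. */
    int64_t r = (int64_t)a * b - (int64_t)q * (int64_t)n;
    if (r < 0) r += n;
    if (r >= (int64_t)n) r -= n;
    return (uint32_t)r;
}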
On 27/03/2017 02:07, Henry Baker wrote:
At 01:02 PM 3/26/2017, Gareth McCaughan wrote: This has the drawback that essentially no integers other than
0 and +-1 are *exactly* represented, unless I'm missing something. I'm not sure how much that matters, but it would be quite a departure from conventional floating-point representations.
I'm not so sure that anyone cares about this anymore.
Why not? One reason why people might care *more* about it these days, outside the context of serious numerical work: more and more of the world runs on Javascript, and Javascript's number semantics are basically "everything is an IEEE double". -- g
If it's true that integers aren't exactly represented, it could be a big problem. I think it has become common to "know" that a double float has more bits of precision than a single int, and represents it exactly. With IEEE doubles, there is no harm in promoting int to double and later doing comparisons, even for equality. The pitfall with posits is that it might possibly be that

(posit)(x+y) != (posit)x+(posit)y;

whereas with IEEE doubles,

(double)(x+y) == (double)x+(double)y;

Thoughtful coders won't write this on purpose, but it's hard to avoid using what you know.
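A quick C check of the IEEE-double identity Dave is relying on, assuming 32-bit ints whose sum does not overflow (every int32, and every non-overflowing sum of two of them, fits in a double's 53-bit significand):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t x = 2000000000, y = 147483647;  /* x + y == 2^31 - 1, no overflow */
    printf("%d\n", (double)(x + y) == (double)x + (double)y);  /* prints 1 */
    return 0;
}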
(double)(x+y) == (double)x+(double)y;

is false for some values of x and y as long as x and y are not already doubles (and even if they are doubles, it fails when any value is or results in NaN).

On 27-Mar-17 14:18, Dave Dyer wrote:
If it's true that integers aren't exactly represented, it could be a big problem. I think it has become common to "know" that a double float has more bits of precision than a single int, and represents it exactly. With IEEE doubles, there is no harm in promoting int to double and later doing comparisons, even for equality. The pitfall with posits is that it might possibly be that

(posit)(x+y) != (posit)x+(posit)y;

whereas with IEEE doubles,

(double)(x+y) == (double)x+(double)y;

Thoughtful coders won't write this on purpose, but it's hard to avoid using what you know.
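Mike's NaN caveat is easy to see in C: NaN compares unequal to everything, itself included, so the identity fails as soon as a NaN shows up on either side:

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = NAN, y = 1.0;
    printf("%d\n", (x + y) == (x + y));  /* prints 0: NaN != NaN */
    return 0;
}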
At 11:30 AM 3/27/2017, Mike Speciner wrote:
(double)(x+y) == (double)x+(double)y;
is false for some values of x and y as long as x and y are not already doubles (and even if they are doubles, it fails when any value is or results in NaN).
Isn't it true if x and y are 32-bit ints? You'll never exceed the number of fraction bits of a double.

The point is, if you have a calculation that is being gradually promoted from ints to doubles, you don't need to worry about the mixed parts of the calculation. You can just let the compiler do what it will and convert the ints when and where it feels like. It's the reason languages like javascript, which don't even have an integer type, get away with it.
No, e.g., int32 x = -1<<31, y = -1. In that case x+y == -1 is definitely not equal to (double)x+(double)y.

On 27-Mar-17 14:42, Dave Dyer wrote:
At 11:30 AM 3/27/2017, Mike Speciner wrote:
(double)(x+y) == (double)x+(double)y;
is false for some values of x and y as long as x and y are not already doubles (and even if they are doubles, it fails when any value is or results in NaN).

Isn't it true if x and y are 32-bit ints? You'll never exceed the number of fraction bits of a double.
The point is, if you have a calculation that is being gradually promoted from ints to doubles, you don't need to worry about the mixed parts of the calculation. You can just let the compiler do what it will and convert the ints when and where it feels like.
It's the reason languages like javascript, which don't even have an integer type, get away with it.
Sorry, in that case x+y = (1<<31)-1. Perhaps I should have used the example x = y = -1<<31, where x+y = 0.

On 27-Mar-17 14:50, Mike Speciner wrote:
No, e.g., int32 x = -1<<31, y = -1. In that case x+y == -1 is definitely not equal to (double)x+(double)y.
On 27-Mar-17 14:42, Dave Dyer wrote:
At 11:30 AM 3/27/2017, Mike Speciner wrote:
(double)(x+y) == (double)x+(double)y;
is false for some values of x and y as long as x and y are not already doubles (and even if they are doubles, it fails when any value is or results in NaN).

Isn't it true if x and y are 32-bit ints? You'll never exceed the number of fraction bits of a double.
The point is, if you have a calculation that is being gradually promoted from ints to doubles, you don't need to worry about the mixed parts of the calculation. You can just let the compiler do what it will and convert the ints when and where it feels like.
It's the reason languages like javascript, which don't even have an integer type, get away with it.
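Mike's corrected counterexample as a C program (signed int overflow is formally undefined in C, so the two's-complement wrap is done through unsigned arithmetic here):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t x = INT32_MIN, y = INT32_MIN;
    int32_t wrapped = (int32_t)((uint32_t)x + (uint32_t)y);  /* wraps to 0 */
    printf("%.1f vs %.1f\n", (double)wrapped, (double)x + (double)y);
    /* prints 0.0 vs -4294967296.0 */
    return 0;
}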
At 11:50 AM 3/27/2017, Mike Speciner wrote:
No, e.g., int32 x = -1<<31, y = -1. In that case x+y == -1 is definitely not equal to (double)x+(double)y.
That's not well behaved in the integer domain either - you ought to get a range exception, but of course you never do. In cases where the pure integer arithmetic is well behaved, is there any case where the same calculation in IEEE doubles gives a different result?
You're correct if the integer addition produces "mathematically correct" results and we're talking about 32-bit integers and 64-bit IEEE floats. And the point about javascript is right on.

I agree that asinh numbers (particularly when scaled so that they handle denormalized floats) would be problematic for integers. I do like the idea of using asinh in place of log for mapping axes on some graphs/plots.

Is there a write-up of posits that actually specifies how the regime bits work? I couldn't really tell from the Gustafson slides. Is there a parameter besides the total number of bits that needs to be specified?

--ms

On 27-Mar-17 15:06, Dave Dyer wrote:
At 11:50 AM 3/27/2017, Mike Speciner wrote:
No, e.g., int32 x = -1<<31, y = -1. In that case x+y == -1 is definitely not equal to (double)x+(double)y.

That's not well behaved in the integer domain either - you ought to get a range exception, but of course you never do.
In cases where the pure integer arithmetic is well behaved, is there any case where the same calculation in IEEE doubles gives a different result?
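Mike's asinh-for-axes idea is easy to sketch in C; this is a guess at what he has in mind, with a hypothetical scale parameter choosing where the axis crosses over from linear to logarithmic behavior:

#include <math.h>

/* ~linear for |x| << scale, ~logarithmic for |x| >> scale, and
   defined for negative x, unlike log. Plot this instead of log10(x). */
double asinh_axis(double x, double scale) {
    return asinh(x / scale);
}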
Since we are talking about this, and since it is so much fun, here's one ramification of IEEE 754 doubles, in Groovy:

groovy:000> a = new Double(-1)
===> -1.0
groovy:000> b = new Double(0)
===> 0.0
groovy:000> c = new Double(1)
===> 1.0
groovy:000> a * b == b * c
===> false

Clearly this is a bug in Groovy, but it's a direct result of the use of IEEE-754. -tom

On Mon, Mar 27, 2017 at 11:18 AM, Dave Dyer <ddyer@real-me.net> wrote:
If it's true that integers aren't exactly represented, it could be a big problem. I think it has become common to "know" that a double float has more bits of precision than a single int, and represents it exactly. With IEEE doubles, there is no harm in promoting int to double and later doing comparisons, even for equality. The pitfall with posits is that it might possibly be that

(posit)(x+y) != (posit)x+(posit)y;

whereas with IEEE doubles,

(double)(x+y) == (double)x+(double)y;

Thoughtful coders won't write this on purpose, but it's hard to avoid using what you know.
-- -- http://cube20.org/ -- http://golly.sf.net/ --
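Tom's Groovy result most likely comes down to signed zero: (-1)*0 is -0.0 while 0*1 is +0.0. IEEE comparison treats the two as equal, but their bit patterns differ, and Java's Double.equals/compareTo (which Groovy uses for boxed Doubles) distinguish them. A C illustration of the two views:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    double neg = -1.0 * 0.0, pos = 0.0 * 1.0;
    uint64_t nb, pb;
    memcpy(&nb, &neg, sizeof nb);   /* view the raw IEEE bit patterns */
    memcpy(&pb, &pos, sizeof pb);
    printf("IEEE ==: %d\n", neg == pos);   /* 1: equal as values */
    printf("bits: %016llx vs %016llx\n",   /* 8000000000000000 vs 0000000000000000 */
           (unsigned long long)nb, (unsigned long long)pb);
    return 0;
}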
participants (5)
- Dave Dyer
- Gareth McCaughan
- Henry Baker
- Mike Speciner
- Tomas Rokicki