[math-fun] Why is e the "best" base? + A symmetrical way to treat digits 0,1,2 base e
I forget who mentioned that e was found to be the "best" number base. I would like to know in what sense it is best -- or is this just some kind of offhand, non-rigorous comment? I chose to look at e simply because it seems to be the math constant (other than 2) whose powers (not necessary integer) arise most commonly. There is the old chestnut that (modulo details) if we start with a real number K >> 0 and wish to break K up into the sum of positive pieces with maximum product, this is attained by making each piece = e. (This is exactly true if K = ne, but needs some modification otherwise, of course.) Is this relevant here? ------------------------------------------------------------------------------ ---------------------------------------- ALSO: There's something unappealing about the asymmetry inherent in using the greedy algorithm to represent a number to (say) base e, since 0 and 1 play a role different from the other digit, 2. One way to avoid this would be as follows. Suppose we want to represent K > 0 to the base e, and suppose that e^r is the largest integer power of e that does not exceed K. Hence K = c (e^r) where 0 <= c < e. Then we use as highest-order digit d = 0, 1, 2 according as 0 <= c < e/3; e/3 <= c < 2e/3; 2e/3 <= c < e, or in brief, d = [3c/e]. (Of course, the calculation of digits continues by applying the above to the quantity K - d(e^r), now using e^(r-1) in place of e^r, ad infinitum.) I wonder if anything nifty comes from treating the digits in this symmetrical fashion. --Dan
At 07:52 PM 3/6/2003, asimovd@aol.com wrote:
I forget who mentioned that e was found to be the "best" number base. I would like to know in what sense it is best -- or is this just some kind of offhand, non-rigorous comment?
The derivative of e^x is e^x (and so is its integral, up to a constant), and e is unique in that respect. The integral of 1/x is log(x), where log(x) is the natural (base e) log. Any other base of logarithm introduces a constant factor (log_b x = ln x / ln b), so the natural log is the one the integral hands you directly. The constant e has an especially simple series, e = 1 + 1/1! + 1/2! + 1/3! + ..., and e^x has an especially simple power series. Also, e is the limit as n -> infinity of (1 + 1/n)^n. And e occurs naturally in many, many places: the Gaussian curve, the catenary, the hyperbolic trig functions, ... You might be interested in the book "e: The Story of a Number" by Eli Maor.
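As a rough numerical illustration of those two characterizations (the cutoffs 18 and 10^6 are arbitrary), the factorial series converges far faster than the (1 + 1/n)^n limit:

    import math

    series = sum(1.0 / math.factorial(k) for k in range(18))   # e = sum 1/k!
    limit  = (1.0 + 1.0 / 10**6) ** 10**6                      # e = lim (1 + 1/n)^n

    print(math.e)    # 2.718281828459045
    print(series)    # agrees with math.e to about machine precision after ~18 terms
    print(limit)     # 2.7182804693..., still off in the sixth decimal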
Quoting asimovd@aol.com:
I forget who mentioned that e was found to be the "best" number base. I would like to know in what sense it is best -- or is this just some kind of offhand, non-rigorous comment?
My memory is very hazy about this, but doesn't the result go back to Shannon and have something to do with entropy?

- hvm
Quoting mcintosh@servidor.unam.mx:
Quoting asimovd@aol.com:
I forget who mentioned that e was found to be the "best" number base. I would like to know in what sense it is best -- or is this just some kind of offhand, non-rigorous comment?
My memory is very hazy about this, but doesn't the result go back to Shannon and have something to do with entropy?
I didn't find Petersen's book on coding theory, but my memory has improved somewhat. Entropy is p ln(p) summed over alternatives (with an overall minus sign to make it positive). It vanishes for p = 0 and p = 1, with a maximum in between, at p = 1/e. That isn't integral, but it lies close to either 1/2 or 1/3; since it is a fairly round maximum, either will work pretty well, and better than some other approximation.

The former leads to binary coding, where the best choice is to split things in half, even though it can lead to lots of bits in a numerical value. Folklore holds that von Neumann used this optimization to justify using binary hardware for computing directly, without passing through decade counters and interconversion from binary flipflops to the decimal finger-based tradition. Folklore also holds that CMOS can be a good thing with three levels: +, 0, -, because it is still near the optimum, even if on the other side.

- hvm
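A quick numerical check of how flat that maximum is, evaluating -p ln p at the three candidate values of p:

    from math import e, log

    for p in (1/2, 1/3, 1/e):
        print(round(p, 4), round(-p * log(p), 4))
    # 0.5     0.3466
    # 0.3333  0.3662
    # 0.3679  0.3679   <- the maximum, at p = 1/e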
=mcintosh@servidor.unam.mx Entropy is p ln(p) summed over alternatives. It vanishes for p = 0 and p = 1, with a maximum somewhere inbetween, say at p = 1/e.
An interesting thing about this formula is that the maximum is independent of the base used for the logarithm. However, are you sure about that p factor? I thought the information content of a message was, roughly, how "surprising" it was, hence simply -ln(p) (which is why information gets called "negative entropy"). Getting an impossible message would be a miraculous epiphany, containing infinite information (alas of the form "everything you know is wrong!"<;-) I'll try to dig out my copy of Shannon's original paper and see what he said. By the way, is there an analogous "quantum entropy", which would involve taking the log of the "complex probability"?
Quoting Marc LeBrun <mlb@fxpt.com>:
However, are you sure about that p factor? I thought the information content of a message was, roughly, how "surprising" it was, hence simply -ln(p) (which is why information gets called "negative entropy").
Boltzmann had the inspiration that "entropy is the logarithm of probability" while working with the thermodynamics of ideal gases, because he wanted something that was additive like energy, whereas the things he wanted to use were multiplicative.

Shannon was concerned with relative probabilities and the representation of numbers by arabic numerals -- positional notation. The logarithm of a number tells how many digits are needed to express it. He asked, "Do two books hold twice as much information as one book?" They have the square of the number of letter sequences, so use logarithms to get double.

To average quantities relative to the probability of their occurrence, multiply the quantity by its probability: to average the xi, sum pi * xi, where in more usual terms pi is the number of instances of xi divided by N, the total number of data. So, to average entropies, multiply the entropy (a la Boltzmann) by the chance of finding it, and so get p ln(p). The minus sign is due to Szilard, who countered Maxwell's Demon by showing that you could compensate a decrease in entropy by accounting for the demon's knowledge of the environment he (she?) was organizing. Hence, "information is negative entropy."

At this point it is necessary to see what is being averaged. One of the harder parts of learning (and teaching) the use of probability is knowing when it is necessary to include the chance that something >didn't< happen, and so include a term depending on (1-p) as well as the one depending on p. That is why maximizing p ln[whatever base](p) recommends the use of e as a number base, and accepts its neighbors 2 and 3 as reasonable integer substitutes. However, if each of two alternatives is to be recognized, the "sum over alternatives" becomes p ln(p) + (1-p) ln(1-p), which is symmetrical and (again up to the overall sign) peaks at p = 1/2, not 1/e, expressing maximum entropy where the two alternatives are equally probable. That is where unbiased estimates and Bernoulli trials come from.
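A minimal check of that last point, evaluating the two-alternative sum -p ln p - (1-p) ln(1-p) over a small grid (in nats):

    from math import log

    def H2(p):
        # Two-alternative entropy, -p ln p - (1-p) ln(1-p), in nats.
        return -p * log(p) - (1 - p) * log(1 - p)

    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(p, round(H2(p), 4))
    # peaks at p = 0.5 with H2 = ln 2 = 0.6931, falling off symmetrically on either side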
By the way, is there an analogous "quantum entropy", which would involve taking the log of the "complex probability"?
Yes. I think von Neumann introduced it in the aftermath of Szilard's paper. If averages are gotten by (integral) psi* Q psi dx, think of psi as a vector, Q as a matrix, and recall that the trace of a product admits a cyclic shift. The average is then Trace{ (psi psi*) Q }, and the outer product psi psi* can be called a density matrix, P (for probability, not momentum). Entropy is then H = -Trace{ P ln(P) }.

There is no need to worry about "complex probabilities." That is not to say that people haven't done so, Richard Feynman being a prominent example. Meanwhile entropy is defined via the density matrix, and is so used even in the most recent work on quantum computing and quantum cryptography.

- hvm
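A small numpy sketch of that definition, H = -Tr(P ln P), computed from the eigenvalues of the density matrix (the two 2x2 example matrices are just the obvious extreme cases):

    import numpy as np

    def von_neumann_entropy(P):
        # H = -Tr(P ln P), via the eigenvalues of the density matrix P.
        evals = np.linalg.eigvalsh(P)
        evals = evals[evals > 1e-12]          # treat 0 ln 0 as 0
        return float(-np.sum(evals * np.log(evals)))

    pure  = np.array([[1.0, 0.0], [0.0, 0.0]])   # pure state: H = 0
    mixed = np.array([[0.5, 0.0], [0.0, 0.5]])   # maximally mixed qubit: H = ln 2
    print(von_neumann_entropy(pure), von_neumann_entropy(mixed))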
Suppose we want to represent K > 0 to the base e, and suppose that e^r is the largest integer power of e that does not exceed K. Hence K = c (e^r) where 0 <= c < e.
Then we use as highest-order digit d = 0, 1, 2 according as 0 <= c < e/3; e/3 <= c < 2e/3; 2e/3 <= c < e, or in brief, d = [3c/e].
(Of course, the calculation of digits continues by applying the above to the quantity K - d(e^r), now using e^(r-1) in place of e^r, ad infinitum.)
I wonder if anything nifty comes from treating the digits in this symmetrical fashion.
Yes: underflow. When you try to represent 5, the first digit is floor(3*5/e^2) = 2, and the remainder is 5 - 2e = -0.436... You're stuck.

The set of unrepresentable numbers looks pretty dense, except at the start:

5 10 12 14 20 21 22 27 30 34 37 38 39 40 42 47 54 60 61 67 68 69 72 73 74 80 82 83 84 87 89 91 92 93 94 99 100 101 102 103 104 105 106 107 108 109 116 122 123 128 129 132 133 134 136 142 143 144 145 146 147 148 151 153 161 162 163 167 168 171 181 182 183 185 186 187 188 194 195 198 199 200...

Of the first 10000 positive integers, 6099 are unrepresentable.

-- Don Reble djr@nk.ca
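A floating-point sketch of that census (the cutoff depth and the tolerances are arbitrary choices, so this only approximates the exact counts):

    from math import e, floor, log

    def representable(n, depth=40):
        # False if the symmetric base-e expansion of n underflows within `depth` digits.
        r = floor(log(n))                          # largest integer r with e^r <= n
        rem = float(n)
        for i in range(depth):
            p = e ** (r - i)
            d = min(floor(3 * rem / (p * e)), 2)   # d = [3c/e], clamped against roundoff
            rem -= d * p
            if rem < -1e-9:                        # underflow: stuck
                return False
            if rem < 1e-12:                        # expansion (numerically) terminates
                return True
        return True

    print([n for n in range(1, 51) if not representable(n)])
    # should begin 5, 10, 12, 14, 20, 21, 22, 27, 30, 34, 37, ... if it matches Don's list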
participants (5)

- asimovd@aol.com
- Don Reble
- Jud McCranie
- Marc LeBrun
- mcintosh@servidor.unam.mx