Quoting Marc LeBrun <mlb@fxpt.com>:
However, are you sure about that p factor? I thought the information content of a message was, roughly, how "surprising" it was, hence simply -ln(p) (which is why information gets called "negative entropy").
Boltzmann had the inspiration that "entropy is the logarithm of probability" while working with the thermodynamics of ideal gases, because he wanted something that was additive like energy, whereas the things he wanted to use were multiplicative.

Shannon was concerned with relative probabilities and the representation of numbers by Arabic numerals - positional notation. The logarithm of a number tells how many digits are needed to express it. He asked, "Do two books hold twice as much information as one book?" They have the square of the number of letter sequences, so use logarithms to get double.

To average quantities relative to the probability of their occurrence, multiply the quantity by its probability - to average the x_i, sum p_i * x_i, where in more usual terms p_i is the number of instances of x_i divided by N, the total number of data. So, to average entropies, multiply the entropy (a la Boltzmann) by the chance of finding it, and so get p ln(p). The - is due to Szilard, who countered Maxwell's Demon by showing that you could compensate a decrease in entropy by accounting for the demon's knowledge of the environment he (she?) was organizing. Hence, "information is negative entropy."

At this point it is necessary to see what is being averaged. One of the harder parts of learning (and teaching) the use of probability is knowing when it is necessary to include the chance that something >didn't< happen, and so include a term depending on (1-p) as well as the one depending on p. That is why maximizing -p ln(p) (in whatever base), which peaks at p = 1/e, recommends the use of e as a number base, and accepts its neighbors 2 and 3 as reasonable integer substitutes. However, if each of two alternatives is to be recognized, the "sum over alternatives" becomes -[p ln(p) + (1-p) ln(1-p)], which is symmetric and has its maximum at 1/2, not 1/e, expressing maximum entropy where the two alternatives are equally probable. That is where unbiased estimates and Bernoulli trials come from.
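A quick numerical check of those two maxima (a sketch in Python; the grid resolution is just an arbitrary choice):

    import math

    ps = [i / 100000 for i in range(1, 100000)]   # scan p over (0, 1)

    # Single-term entropy -p ln(p): the maximum sits at p = 1/e.
    single = max(ps, key=lambda p: -p * math.log(p))

    # Two alternatives, -[p ln(p) + (1-p) ln(1-p)]: the maximum sits at p = 1/2.
    binary = max(ps, key=lambda p: -(p * math.log(p) + (1 - p) * math.log(1 - p)))

    print(single, 1 / math.e)   # ~0.36788 vs 0.36787...
    print(binary)               # 0.5

Both agree with the calculus: the derivative of -p ln(p) is -(ln(p) + 1), which vanishes at p = 1/e, while the symmetric two-term sum is stationary at p = 1/2.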
By the way, is there an analogous "quantum entropy", which would involve taking the log of the "complex probability"?
Yes. I think von Neumann introduced it in the aftermath of Szilard's paper. If averages are gotten by integrating psi* Q psi dx, think of psi as a vector, Q as a matrix, and note that the trace of a product admits a cyclic shift. The average is then Trace{ (psi psi*) Q }, and the outer product psi psi* can be called a density matrix, P (for probability, not momentum). Entropy is then H = -Trace{ P ln(P) }, the minus sign being Szilard's again.

There is no need to worry about "complex probabilities." That is not to say that people haven't done so, Richard Feynman being a prominent example. Meanwhile entropy is defined via the density matrix, and is so used even in the most recent work on quantum computing and quantum cryptography.
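For concreteness, a minimal sketch of that recipe (assuming numpy; the state vector and observable below are arbitrary examples, not anything from the thread):

    import numpy as np

    # An arbitrary normalized two-level state psi.
    psi = np.array([1.0, 1.0j]) / np.sqrt(2)

    # Density matrix P = psi psi* (outer product); its trace is 1.
    P = np.outer(psi, psi.conj())

    # Average of an observable Q as Trace(P Q); here Q is the Pauli z matrix.
    Q = np.array([[1, 0], [0, -1]], dtype=complex)
    print(np.trace(P @ Q).real)                  # 0.0 for this state

    # Entropy from the density matrix, with Szilard's minus sign explicit:
    # H = -Trace(P ln P), computed from the eigenvalues of P.
    def entropy(rho):
        return -sum(x * np.log(x) for x in np.linalg.eigvalsh(rho) if x > 1e-12)

    print(entropy(P))                            # ~0: a pure state has zero entropy
    print(entropy(0.5 * np.eye(2)), np.log(2))   # a 50/50 mixture gives ln 2

The pure state carries no entropy; accommodating mixtures is what the density matrix buys over the wavefunction alone.

- hvm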