Quoting Marc LeBrun <mlb@fxpt.com>:
However, are you sure about that p factor? I thought the information content of a message was, roughly, how "surprising" it was, hence simply -ln(p) (which is why information gets called "negative entropy").
Boltzmann had the inspiration that "entropy is the logarithm of probability" while working with the thermodynamics of ideal gases, because he wanted something that was additive like energy, whereas the things he wanted to use were multiplicative.

Shannon was concerned with relative probabilities and the representation of numbers by Arabic numerals - positional notation. The logarithm of a number tells how many digits are needed to express it. He asked, "Do two books hold twice as much information as one book?" They have the square of the number of letter sequences, so use logarithms to get double.

To average quantities relative to the probability of their occurrence, multiply the quantity by its probability - to average the x_i, sum p_i * x_i, where in more usual terms p_i is the number of instances of x_i divided by N, the total number of data. So, to average entropies, multiply the entropy (a la Boltzmann) by the chance of finding it, and so get p ln(p). The - is due to Szilard, who countered Maxwell's Demon by showing that you could compensate a decrease in entropy by accounting for the demon's knowledge of the environment he (she?) was organizing. Hence, "information is negative entropy."

At this point it is necessary to see what is being averaged. One of the harder parts of learning (and teaching) the use of probability is knowing when it is necessary to include the chance that something >didn't< happen, and so include a term depending on (1-p) as well as the one depending on p. That is why maximizing -p ln(p) (in whatever base), which peaks at p = 1/e, recommends the use of e as a number base, and accepts its neighbors 2 and 3 as reasonable integer substitutes. However, if each of two alternatives is to be recognized, the "sum over alternatives" becomes -[p ln(p) + (1-p) ln(1-p)], which is symmetric and has its maximum at 1/2, not 1/e, expressing maximum entropy where the two alternatives are equally probable. That is where unbiased estimates and Bernoulli trials come from.
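A quick numerical check of those two maxima (a sketch in Python; the grid resolution is just an arbitrary choice):

    import math

    ps = [i / 100000 for i in range(1, 100000)]   # scan p over (0, 1)

    # Single-term entropy -p ln(p): the maximum sits at p = 1/e.
    single = max(ps, key=lambda p: -p * math.log(p))

    # Two alternatives, -[p ln(p) + (1-p) ln(1-p)]: the maximum sits at p = 1/2.
    binary = max(ps, key=lambda p: -(p * math.log(p) + (1 - p) * math.log(1 - p)))

    print(single, 1 / math.e)   # ~0.36788 vs 0.36787...
    print(binary)               # 0.5

Both agree with the calculus: the derivative of -p ln(p) is -(ln(p) + 1), which vanishes at p = 1/e, while the symmetric two-term sum is stationary at p = 1/2.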
By the way, is there an analogous "quantum entropy", which would involve taking the log of the "complex probability"?
Yes. I think von Neumann introduced it in the aftermath of Szilard's paper. If averages are gotten by integrating psi* Q psi dx, think of psi as a vector, Q as a matrix, and note that the trace of a product admits a cyclic shift. The average is then Trace{ (psi psi*) Q }, and the outer product psi psi* can be called a density matrix, P (for probability, not momentum). Entropy is then H = -Trace{ P ln(P) }, the minus sign being Szilard's again.

There is no need to worry about "complex probabilities." That is not to say that people haven't done so, Richard Feynman being a prominent example. Meanwhile entropy is defined via the density matrix, and is so used even in the most recent work on quantum computing and quantum cryptography.
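For concreteness, a minimal sketch of that recipe (assuming numpy; the state vector and observable below are arbitrary examples, not anything from the thread):

    import numpy as np

    # An arbitrary normalized two-level state psi.
    psi = np.array([1.0, 1.0j]) / np.sqrt(2)

    # Density matrix P = psi psi* (outer product); its trace is 1.
    P = np.outer(psi, psi.conj())

    # Average of an observable Q as Trace(P Q); here Q is the Pauli z matrix.
    Q = np.array([[1, 0], [0, -1]], dtype=complex)
    print(np.trace(P @ Q).real)                  # 0.0 for this state

    # Entropy from the density matrix, with Szilard's minus sign explicit:
    # H = -Trace(P ln P), computed from the eigenvalues of P.
    def entropy(rho):
        return -sum(x * np.log(x) for x in np.linalg.eigvalsh(rho) if x > 1e-12)

    print(entropy(P))                            # ~0: a pure state has zero entropy
    print(entropy(0.5 * np.eye(2)), np.log(2))   # a 50/50 mixture gives ln 2

The pure state carries no entropy; accommodating mixtures is what the density matrix buys over the wavefunction alone.

- hvm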