From: Henry Baker <hbaker1@pipeline.com> Date: 3/1/20, 12:57 PM
This problem has been bothering me ever since I learned about Fourier transforms and the frequency domain as an undergraduate who loves music.
It sort-of bothers me, but I sort-of understand it too. And Fourier transforms bother me because they sample all the frequencies within the same time window.
Unlike in electrical engineering and physics, where frequency is a *linear* scale, and an entire bounded spectrum can be translated up/down in frequency using mixing converters, 12-tone music uses a *log* frequency scale, and I'm not aware of any simple Fourier formulae that work on a log frequency scale.
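To make the linear-vs-log contrast concrete, here is a minimal sketch that places frequencies on a 12-tone equal-tempered log scale. It assumes the common A4 = 440 Hz reference. The names `semitones_from_a4` and `interval_semitones` are hypothetical helpers, not from any library. The sketch shows that transposing (multiplying every frequency by a constant) is a plain shift on the log scale. In contrast, a mixer-style shift by a fixed number of hertz changes the interval.

```python
import math

A4_HZ = 440.0  # assumed reference pitch (12-TET convention), not from the thread

def semitones_from_a4(freq_hz: float, ref_hz: float = A4_HZ) -> float:
    """Position of freq_hz on a 12-tone equal-tempered log scale, in semitones above ref_hz."""
    return 12.0 * math.log2(freq_hz / ref_hz)

def interval_semitones(f_low: float, f_high: float) -> float:
    """Interval between two frequencies in semitones; it depends only on their ratio."""
    return 12.0 * math.log2(f_high / f_low)

# A just perfect fifth (ratio 3/2) is the same interval at any pitch (about 7.02 semitones).
for base in (110.0, 440.0, 1760.0):
    print(base, round(interval_semitones(base, base * 1.5), 3))

# Transposing up an octave multiplies frequency by 2 but adds exactly 12 semitones.
print(semitones_from_a4(880.0) - semitones_from_a4(440.0))

# A linear frequency shift (what a mixer does) does not preserve the interval:
# 440/660 Hz shifted up by 100 Hz is no longer a fifth.
print(round(interval_semitones(440.0 + 100.0, 660.0 + 100.0), 3))
```

The log conversion is also why "percentage accuracy" maps naturally onto a fixed tolerance in semitones (or cents) regardless of register.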
I've thought of stacks of semitone (or, say, 1/3-semitone) decimation filters (*). I suppose that's equivalent to some wavelet basis. Of course, all the information (up to the Nyquist frequency, anyway) can be kept by wavelets or decimation filters, but...
The ultraviolet catastrophe shows that the amount of information encoded in higher musical octaves grows enormously, so any invertible Fourier translation of standard 12-tone scales is going to have to throw away this extra information.
...but our ears are throwing away information, of course. I'm not sure how much we keep; we can hear some pretty subtle things. But the main point is that we need to recognize frequency *ratios*, both of simultaneous harmonics and of sounds separated in time. (Or maybe it is good enough to know that two frequencies are *likely* to be in a simple ratio.)

I guess I should interrupt myself and say that perceptual systems seem to be specialized to what's in the species' typical environment and what tends to be most important. It's strange that the 12-tone scale seems so reasonable. (It also reminds me of the cult (**) of 1/f, or pink, noise: why would a sound have a spectrum with equal power in every octave? But some do.)

A log scale gives you a handy way to detect frequency ratios to within some percentage accuracy. (When people say our ears use a log scale, they are being a bit metaphorical: our frequency accuracy is roughly a percentage of the frequency rather than an absolute number of hertz, and I do mean only roughly. I remember reading that across a number of octaves in the middle it's a flat... 0.4 Hz?)

(*) A decimation filter doesn't throw away samples. It takes a signal with, say, sample rate A and produces two outputs, a high-pass with sample rate B and a low-pass with rate C, such that A = B + C. (Of course you can then throw away one or both!) Nyquist sort-of says you can do this nicely, and I believe there is usually an exact reconstruction formula. I am not a digital signal processor, but I play one in a band.

(**) "Pink noise is one of the most common signals in biological systems." Quoted in https://en.wikipedia.org/wiki/Pink_noise

--Steve
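As an illustration of footnote (*), here is a minimal sketch of the simplest two-channel split with exact reconstruction: the Haar filter bank. This is a swapped-in textbook choice, not necessarily the filters anyone in this thread had in mind. Real designs (e.g. longer quadrature-mirror filters) separate the bands much more cleanly. Each stage turns a signal at rate A into a low-pass and a high-pass channel at A/2 each, so B + C = A. Re-splitting the low band gives one band per *octave* (a dyadic wavelet decomposition). Semitone or 1/3-semitone spacing would need a different, non-dyadic filter bank, which this sketch does not attempt. All function names are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)

def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One two-channel Haar stage: low-pass and high-pass outputs, each decimated by 2."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low: List[float], high: List[float]) -> List[float]:
    """Exact inverse of haar_split (perfect reconstruction, up to float rounding)."""
    out: List[float] = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)  # recovers x[2k]
        out.append((lo - hi) / SQRT2)  # recovers x[2k+1]
    return out

def octave_bands(x: List[float], levels: int) -> List[List[float]]:
    """Repeatedly split the low band; returns [high_1, ..., high_levels, low_levels]."""
    bands: List[List[float]] = []
    low = x
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands

def from_octave_bands(bands: List[List[float]]) -> List[float]:
    """Undo octave_bands by merging from the coarsest band back up."""
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = haar_merge(low, high)
    return low

# Usage: 64 samples -> bands of 32, 16, 8 and 8 samples (the total is still 64).
signal = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
bands = octave_bands(signal, levels=3)
print([len(b) for b in bands])
rebuilt = from_octave_bands(bands)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)) < 1e-12)
```

Nothing is thrown away here: the sample counts add back up to the input length, and the merge step inverts the split exactly.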