From: Henry Baker <hbaker1@pipeline.com>
I just saw a video lecture on data compression, and it claimed that Bentley et al.'s compression scheme in CACM 29,4 (April 1986), 320-330, is used in current bzip (bzip2?).
Yes. It is an LRU-style scheme -- usually called move-to-front (MTF) coding -- and it's used as *part of* the bzip2 scheme. The main part is called the Burrows-Wheeler Transform (BWT).
Basically, Bentley's scheme assumes that both the sender & receiver keep an ordered list (hopefully the same list!) and when a symbol is found on the list, the _index_ into the list is sent, and both parties move that symbol to the front of the list. If the symbol is not on the list, then "n+1" is sent, indicating that the new symbol itself follows the escape code, and both sides then grow their lists to n+1 entries.
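A minimal Python sketch of the scheme as described above -- list starts empty, indices are 1-based so that "n+1" is the escape, and the `('new', s)` marker standing in for the literal symbol is my own notation, not Bentley's:

```python
def mtf_encode(msg):
    """Move-to-front encode with a growing list and an n+1 escape code."""
    lst, out = [], []
    for s in msg:
        if s in lst:
            i = lst.index(s)
            out.append(i + 1)          # send 1-based index into the list
            lst.pop(i)
        else:
            out.append(len(lst) + 1)   # escape: "n+1" for current list size n
            out.append(('new', s))     # ...followed by the new symbol itself
        lst.insert(0, s)               # both sides move it to the front
    return out

def mtf_decode(codes):
    """Inverse: maintain the same list, so indices resolve identically."""
    lst, out = [], []
    it = iter(codes)
    for c in it:
        if c == len(lst) + 1:          # escape code: literal symbol follows
            _, s = next(it)
        else:
            s = lst.pop(c - 1)
        out.append(s)
        lst.insert(0, s)
    return ''.join(out)
```

Note that valid indices run 1..n, so the escape value n+1 can never be confused with a real index; that is what makes the growing-list variant self-describing.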
Or you start the list full of all the possibilities. In bzip2's case, 257.
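The preloaded-list variant in miniature, with the list started full of all 256 byte values (bzip2's real pipeline runs its MTF on the BWT output and differs in detail; this only illustrates the "start the list full" idea, and the function name is mine):

```python
def mtf_fixed(data):
    """Move-to-front over a list preloaded with all 256 byte values."""
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)    # position of this byte in the current list
        out.append(i)
        table.pop(i)
        table.insert(0, b)    # move it to the front
    return out
```

With the list preloaded there is no escape code to send: every byte already has an index, and repeated bytes encode as runs of zeros.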
If the file to be compressed has "locality of reference", then Bentley's scheme can do better than any static probability scheme -- e.g., Huffman coding.
...when combined with run-length and Huffman coding of the list indexes. (Otherwise you're just encoding bytes as bytes.) So, the BWT does this (not quite true): say you have a message of N bytes.

1. Append a unique EOF character.
2. Treat the message as a loop of N+1 bytes.
3. Take the length-(N+1) string starting at each position of the loop: N+1 strings of N+1 bytes each.
4. Sort the list lexically.
5. Read down the *last* column of the sorted list; that string is the BWT. The process is reversible.
6. Encode that with the LRU trick, then with run-length and Huffman codes.

Or, as Burrows and Wheeler say: transform the input into a form that has locality, then transform that into a skewed distribution, then use a compression method that takes advantage of the skewed distribution.
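The transform and its inverse as a naive Python sketch -- this builds and sorts all N+1 rotations explicitly, so it is O(N^2 log N); real implementations use suffix sorting, and bzip2 handles the EOF issue differently, but the shape of the algorithm is the one described above:

```python
def bwt(s, eof='\0'):
    """Burrows-Wheeler Transform: last column of the sorted rotations."""
    assert eof not in s
    s += eof                                   # append a unique EOF character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return ''.join(r[-1] for r in rotations)   # read down the last column

def ibwt(last, eof='\0'):
    """Invert by repeatedly prepending the last column and re-sorting:
    after k rounds, table holds the sorted length-k prefixes of the rotations."""
    table = [''] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(eof))  # the original rotation
    return row[:-1]                                  # strip the EOF character
```

Sorting groups rotations by their leading context, so the last column -- the character *preceding* each context -- comes out with exactly the locality of reference the MTF step then exploits.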
Belady's OPT optimal page replacement algorithm... basically does LRU on the _reversed_ reference string, so as to kick out the page to be used the furthest into the future.
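A toy LRU fault counter for playing with the reversal idea on small traces -- nothing here is Belady's actual formulation, just a sketch you can run forwards and backwards over the same reference string:

```python
from collections import OrderedDict

def lru_faults(trace, capacity):
    """Count page faults for an LRU cache of the given capacity."""
    cache = OrderedDict()   # keys kept in least- to most-recently-used order
    faults = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)        # hit: mark most recently used
        else:
            faults += 1                    # miss
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return faults
```

The connection to the text: "least recently used" in the reversed string is "used furthest in the future" in the forward string, which is exactly the page OPT evicts.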
Not familiar with.
But file compression would seem to be oblivious to the direction of processing, so whether the file is processed forwards or backwards shouldn't matter very much; if it does, do both & output the better result.
(Or do both on the BWT of the file, which isn't really time-like.)
If Bentley & Belady are right, then any bzip-compressed reference traces are _already_ compressed if they've been through an LRU (i.e., "stack") page-replacement algorithm!
This may be foiled by having the BWT between the two LRUs. --Steve