From: Henry Baker <hbaker1@pipeline.com>

However, since both the ASCII text and PDF images are available, someone could easily automate the task of creating such PDF/text files -- even someone outside of Google.
You may be able to OCR the PDF files using the online DjVu converter, assuming there aren't protection bits that some part of the converter gets scared of:

http://any2djvu.djvuzone.org/

Maybe you could then automate the process of correcting the embedded text in the DjVu file using the ASCII text from Google.

A possible added benefit is that the DjVu format is much more compact than PDF. On the other hand, the reader for it (at least on the Mac) isn't so nice. Also, the default compression settings don't always seem to do the right thing about distinguishing foreground from background in illustrations.

A simpler trick: maybe the page numbers are in the ASCII?

--Steve
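
For what it's worth, the correction step might look roughly like the sketch below. It assumes the OCR words from one page and the matching ASCII words from Google have already been pulled out into two lists; it does not read or write DjVu files at all. The function name `correct_ocr_words` is made up for illustration, and the alignment uses Python's standard `difflib` module. It keeps the number of words unchanged, since the embedded text is presumably tied to word positions on the page.

```python
import difflib


def correct_ocr_words(ocr_words, reference_words):
    """Return a copy of ocr_words with aligned words taken from reference_words.

    Hypothetical helper: only one-for-one substitutions are applied, so the
    result always has the same length as ocr_words.
    """
    matcher = difflib.SequenceMatcher(a=ocr_words, b=reference_words, autojunk=False)
    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of words on both sides: trust the reference text.
            corrected.extend(reference_words[j1:j2])
        else:
            # Equal runs, or uneven differences (split or merged words):
            # keep the OCR words so the word count stays the same.
            corrected.extend(ocr_words[i1:i2])
    return corrected


if __name__ == "__main__":
    ocr = "Tbe quick brovvn fox jumps".split()
    ref = "The quick brown fox jumps".split()
    print(" ".join(correct_ocr_words(ocr, ref)))
```

Places where the OCR split or merged words are left alone here; those would need a closer look, or a character-level comparison, before being written back.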