[math-fun] Re: Google books / "text behind image"

2 Nov 2007

      ...
From: Henry Baker <hbaker1@pipeline.com>
However, since both the ascii text & PDF images are available, 
someone could easily automate the task of creating such PDF/text 
files -- even someone outside of Google.

You may be able to OCR the PDF files using the online DjVu converter,
assuming there aren't protection bits that some part of the converter
gits skeered at:
    http://any2djvu.djvuzone.org/

Maybe you could then automate the process of correcting the text
embedded in the DjVu file against the ASCII text from Google.
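
A rough sketch of that correction step, assuming the OCR output and
Google's ASCII have already been pulled out as plain strings. The
function name and the sample strings are made up for illustration, and
a real DjVu text layer also carries word coordinates, which this
ignores. It aligns the two word sequences with Python's difflib and
patches only runs of equal length, so the word count stays the same:

```python
import difflib


def correct_ocr_words(ocr_text, reference_text):
    """Return the OCR words, with mismatched runs replaced from the reference.

    Only 'replace' runs of equal length are patched, so the number of words
    (and any one-to-one mapping to word positions on the page) is preserved.
    """
    ocr_words = ocr_text.split()
    ref_words = reference_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=ref_words, autojunk=False)
    corrected = list(ocr_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip inserts, deletes, and uneven replacements: they would shift
        # the word positions that the image layer depends on.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            corrected[i1:i2] = ref_words[j1:j2]
    return corrected


if __name__ == "__main__":
    ocr = "the qu1ck brown f0x jumps"      # hypothetical OCR output
    ref = "the quick brown fox jumps"      # hypothetical reference ASCII
    print(" ".join(correct_ocr_words(ocr, ref)))
```

Runs where the two texts disagree on the number of words are left
alone here; those would need a human or a smarter alignment.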

A possible added benefit is that the DjVu format is much more compact
than PDF.  On the other hand, the reader for it (at least for the Mac)
isn't so nice.  Also, the default compression settings don't always seem
to do the right thing about distinguishing foreground from background in
illustrations.

A simpler trick: maybe the page numbers are in the ascii?
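
A quick way to check, assuming page numbers (if present) sit on lines
of their own in the ASCII. That layout is a guess on my part, and the
function name is made up:

```python
import re

# A line holding nothing but a 1-4 digit number, with optional whitespace.
PAGE_NUMBER_LINE = re.compile(r"^\s*(\d{1,4})\s*$")


def find_page_numbers(ascii_text):
    """Return (line_index, page_number) pairs for lines that hold only a short number."""
    hits = []
    for index, line in enumerate(ascii_text.splitlines()):
        match = PAGE_NUMBER_LINE.match(line)
        if match:
            hits.append((index, int(match.group(1))))
    return hits
```

Numbers embedded in running text (years, equation numbers) are skipped
because the pattern requires the whole line to be the number.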

  --Steve

Steve Witham
