[math-fun] Re: Google books / "text behind image"
From: Henry Baker <hbaker1@pipeline.com>
I tried searching the downloaded PDF document for Tait's Quaternions, but nothing was there. So it appears that Google isn't using "text behind image", at least not for this book. However, since both the ASCII text & PDF images are available, someone could easily automate the task of creating such PDF/text files -- even someone outside of Google.
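As a rough sketch of the automation Henry describes, the Python below builds a one-page PDF that paints a page image and then writes OCR text over it in text rendering mode 3 (`3 Tr`). The PDF specification defines this mode as invisible: the text is neither filled nor stroked, but viewers can still search and select it. The function name `build_text_behind_image_pdf` and its arguments are hypothetical. The sketch also assumes an 8-bit grayscale image, Latin-1 text and the standard Helvetica font, and it puts all the text on one line instead of at per-word OCR positions.

```python
import zlib


def build_text_behind_image_pdf(gray_pixels, width, height, ocr_text):
    """Return the bytes of a one-page PDF: an image plus an invisible text layer.

    Hypothetical helper for illustration only. gray_pixels holds width*height
    8-bit grayscale samples; ocr_text is assumed to be Latin-1 encodable.
    """
    # Escape the characters that delimit PDF literal strings.
    safe = ocr_text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Paint the image over the whole page, then draw the text in render mode 3
    # (invisible but searchable). Real tools position each word separately.
    content = (
        f"q {width} 0 0 {height} 0 0 cm /Im1 Do Q\n"
        f"BT 3 Tr /F1 12 Tf 10 {height - 20} Td ({safe}) Tj ET"
    ).encode("latin-1")
    image_data = zlib.compress(gray_pixels)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /XObject << /Im1 4 0 R >> /Font << /F1 5 0 R >> >> "
         f"/Contents 6 0 R >>").encode(),
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
         f"/Length {len(image_data)} >>\nstream\n").encode()
        + image_data + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        f"<< /Length {len(content)} >>\nstream\n".encode()
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

A real converter would loop over the pages and place each word at the bounding box reported by the OCR engine, so that search hits get highlighted in the right spot on the image.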
I downloaded the PDF and submitted it to any2djvu.djvuzone.org. I got back a converted .djvu file, but it had no text behind the images, even though I had specified OCR in the entry form. So that path isn't working for now. I don't know whether it's a technical issue, a protection-bits issue, or a web server problem at djvuzone. The usual progress log didn't come out, so I don't have any error messages (I emailed them about it).
The image compression is pretty nice: 2.9 MB for the djvu vs. 9.7 MB for the pdf. The type, equations and line drawings are excellent. The faint scribbles left by readers, which are, say, "thresholdy" in the pdf, are slightly more thresholdy in the djvu. The first page of the publisher's catalog in the back has a couple of stains, and the text over the stains is as good in the djvu as in the pdf. The cover, opening pages and closing pages have a couple of dark areas with writing on them, and those are definitely blurrier in the djvu. DjVu has a concept of "background", which is stored at lower resolution, and the encoder sometimes makes the wrong decision about what belongs there if you don't hand-tweak it.
The viewer I have (MacDjView) scrolls through the document a lot faster than Adobe Reader and Preview do through the pdf. DjVu ought to be included in the PDF standard as a codec. --Steve
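Steve's remark about DjVu's "background" can be illustrated with a much simplified layer split. The sketch below uses a hypothetical function, `separate_layers`. It thresholds a grayscale page into a full-resolution foreground mask and averages the remaining pixels into a background subsampled by an assumed factor of 3. The real DjVu encoder uses far more elaborate segmentation and compression (JB2 for the bilevel mask, IW44 wavelets for the background). This is only meant to show why writing that ends up in the background layer comes out blurrier.

```python
def separate_layers(pixels, threshold=128, factor=3):
    """Split a grayscale page (rows of 0=black..255=white) into two layers.

    Simplified illustration, not the real DjVu encoder. The threshold and the
    subsampling factor are assumptions chosen for the example.
    """
    height, width = len(pixels), len(pixels[0])
    # Foreground mask: dark pixels (type, line art) kept at full resolution.
    mask = [[1 if p < threshold else 0 for p in row] for row in pixels]
    background = []
    for by in range(0, height, factor):
        bg_row = []
        for bx in range(0, width, factor):
            # Average only the pixels in this block that the mask did not claim;
            # anything classified as background loses detail here.
            block = [pixels[y][x]
                     for y in range(by, min(by + factor, height))
                     for x in range(bx, min(bx + factor, width))
                     if not mask[y][x]]
            # Assume white paper where the block is entirely foreground.
            bg_row.append(sum(block) // len(block) if block else 255)
        background.append(bg_row)
    return mask, background
```

Writing on a dark area that falls on the wrong side of the threshold never reaches the mask, so it gets averaged into a 3x3 block and blurs. That matches the kind of decision Steve says sometimes needs hand-tweaking.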
Steve Witham