PDF scanning the booklet

Notes from my scanning workflow from yesterday.

I’ve had my Epson Perfection 3490 Photo scanner for years (a gift from Hubby, Christmas 2003 or 2004, I think), but it never played happily with whatever Linux set-up I had at the time. There’s a maxim that Linux distributions (editions) work better with older equipment, so I decided to give it another try. (The maxim, however, is breaking down as manufacturers begin supporting Linux.) Even now, I needed a proprietary driver from Epson. Entries 46 and 59 of this thread at Ubuntuforums.com were all I needed to get my scanner to work.

Now for software. I’ve never liked the xsane frontend (it’s too dang hard to use, though thanks to the developers for leading the way), so I immediately sought an alternative. I installed GNOME Scan (flegita) and tried and tried to make it work. PNG images came out fine, but when attempting to scan for PDFs, I could only get my scanner to make a little fragment of the selected area. Fail. With Gscan2PDF, which you can add the usual way, I had a winner, and a surprise to boot: integration with tesseract-ocr, an optical character recognition system. And not just any one: the system that Google Books uses, and which Google now supports. If you’re running the Intrepid Ibex (8.10) version of Ubuntu Linux, get tesseract-ocr in universe; grab language support, too: tesseract-ocr-eng for English.
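If you want to script the install step, here is a minimal Python sketch that builds the apt-get command and checks whether tesseract is already on the path. The helper names are my own invention, and the assumption that every language pack follows the tesseract-ocr-<code> pattern is just that, an assumption generalized from tesseract-ocr-eng; check your distribution's package list before relying on it.

```python
import shutil


def ocr_packages(languages):
    """Return package names for tesseract plus language packs.

    Assumes packs are named tesseract-ocr-<code>, as tesseract-ocr-eng is.
    """
    return ["tesseract-ocr"] + [f"tesseract-ocr-{code}" for code in languages]


def install_command(languages):
    """Build (but do not run) an apt-get command line for the packages."""
    return "sudo apt-get install " + " ".join(ocr_packages(languages))


if __name__ == "__main__":
    print(install_command(["eng"]))
    # shutil.which returns None when the program is not on the PATH.
    print("tesseract already installed:", shutil.which("tesseract") is not None)
```

The script only prints the command, so nothing gets installed until you paste it into a terminal yourself.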

I started gscan2pdf (it’s in the main Applications menu under “Graphics”) and selected “Scan” using either the menu or icon. This pulls up a dialog box. Under the “Scan Options” tab, I found “Paper Size”, which I edited to create a new size for the opened Liberala Himnaro, with the dimensions in millimeters. This saved scanning and cropping time. Back on the “Page Options” tab, I selected for “Post processing” a rotation of 90 degrees (because that’s how it fit on the scanner) and “clean up”. Then I scanned all the pages.
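To sanity-check a custom paper size before scanning a whole booklet, it helps to see how millimeters turn into pixels at a given resolution. This is a small Python sketch of that arithmetic; the 254 × 127 mm size and the 300 dpi setting are made-up illustration values, not the real measurements of the booklet.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a scan area given in millimeters."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    # Hypothetical opened-booklet size; measure your own with a ruler.
    print(scan_pixels(254.0, 127.0, 300))  # (3000, 1500)
```

A 90-degree rotation in post-processing simply swaps the two numbers.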

The Save dialog allowed different color (color, gray, line art) and resolution options, and I tried a few until I got both a file of a manageable size for sharing and a robust one for archiving. Note: this is a pretty slow process.
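The trial and error can be shortened a little with some arithmetic on uncompressed page sizes. The sketch below assumes the usual bit depths (24 bits per pixel for color, 8 for gray, 1 for line art) and a hypothetical 10 × 5 inch scan area. Real PDFs will be smaller because the images are compressed, so treat these figures as an upper bound for comparing settings, not as a prediction.

```python
BITS_PER_PIXEL = {"color": 24, "gray": 8, "lineart": 1}


def raw_page_bytes(width_in, height_in, dpi, mode):
    """Estimate the uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    bits = pixels * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical 10 x 5 inch scan area at two resolutions.
    for dpi in (150, 300):
        for mode in ("color", "gray", "lineart"):
            size_mb = raw_page_bytes(10, 5, dpi, mode) / 1_000_000
            print(f"{dpi} dpi {mode}: {size_mb:.2f} MB per page")
```

Doubling the resolution quadruples the size, while dropping from color to gray cuts it to a third, which is why a gray scan at a modest resolution is a reasonable first guess for the sharing copy.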

There was an option for OCR under “Page Options”, which I didn’t use since Esperanto isn’t a supported language. But I’ll rummage through my archives to find a scan (and perhaps an OCR) worthy of my readership.

By Scott Wells
