[Swftools-common] pdf2swf textSnapshots in OCR'ed PDF files
From: tachy0n tachy0n
Subject: [Swftools-common] pdf2swf textSnapshots in OCR'ed PDF files
Date: Mon, 3 Aug 2009 16:47:32 -0700
I'm writing a viewer for SWF files produced by pdf2swf that lets the
user search for text within the SWF. Using the example provided here -
http://wiki.swftools.org/index.php/How_do_I_highlight_text_in_the_SWF%3F
- I was able to search for and highlight matching text in SWFs created
from digital PDF documents (i.e. PDFs generated from other formats such
as MS Word).
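
To make the search step concrete, here is a rough, language-neutral sketch of it in Python (the viewer itself is ActionScript). It assumes the frame's text has already been read out of the TextSnapshot as a plain string. The function name find_matches is hypothetical. In the viewer, each returned (start, end) range would presumably be passed on to TextSnapshot.setSelected.

```python
import re


def find_matches(text, query, case_sensitive=False):
    """Return (start, end) character ranges of non-overlapping matches.

    `end` is exclusive, which mirrors how a begin/end index pair is
    typically handed to a selection call.
    """
    if not query:
        return []
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape makes the query match literally, not as a pattern.
    pattern = re.compile(re.escape(query), flags)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    page_text = "Scanned text and digital text"
    for start, end in find_matches(page_text, "text"):
        print(start, end, page_text[start:end])
```

The indices refer to positions in the snapshot's text, so they only line up with what is drawn on screen if the snapshot's character order matches the page.
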
I wanted to do the same for PDF documents created with a scanner. So I
took the scanned document, dumped the pages as JPEGs using xpdf, and ran
Cuneiform OCR with the hocr2pdf script to create a searchable PDF. The
searchable PDF works as expected when opened in Adobe Reader, with
search highlighting and text select/copy.
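
For anyone trying to reproduce this, the OCR steps above can be sketched roughly as follows. This is an approximation under assumptions, not the exact commands used: it assumes the JPEGs were dumped with xpdf's pdfimages tool, that cuneiform is asked for hOCR output, and that hocr2pdf reads the hOCR on standard input. The helper names and file names are hypothetical, and the sketch produces one PDF per page (merging them is left out).

```python
import subprocess
from pathlib import Path


def extract_cmd(pdf, prefix):
    # Assumption: pdfimages (from xpdf) with -j writes embedded JPEGs as-is.
    return ["pdfimages", "-j", str(pdf), str(prefix)]


def ocr_cmd(image, hocr):
    # Assumption: cuneiform writes hOCR when given "-f hocr".
    return ["cuneiform", "-f", "hocr", "-o", str(hocr), str(image)]


def overlay_cmd(image, out_pdf):
    # hocr2pdf takes the page image via -i and the hOCR on stdin.
    return ["hocr2pdf", "-i", str(image), "-o", str(out_pdf)]


def ocr_page(image, workdir):
    """OCR one page image and return the path of the searchable PDF."""
    image, workdir = Path(image), Path(workdir)
    hocr = workdir / (image.stem + ".hocr")
    out_pdf = workdir / (image.stem + ".pdf")
    subprocess.run(ocr_cmd(image, hocr), check=True)
    with open(hocr, "rb") as hocr_file:
        subprocess.run(overlay_cmd(image, out_pdf), stdin=hocr_file, check=True)
    return out_pdf


if __name__ == "__main__":
    # Only prints the commands; call ocr_page() to actually run them.
    print(extract_cmd("scan.pdf", "page"))
    print(ocr_cmd("page-000.jpg", "page-000.hocr"))
    print(overlay_cmd("page-000.jpg", "page-000.pdf"))
```
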
However, when I load it in my viewer, the text highlights are not shown,
even though the TextSnapshot for each frame in the MovieClip contains
the OCR'ed text, including the correct font metrics/bounding boxes. When
I set the alpha of the MovieClip to less than 1, the highlight shows up
correctly (albeit with spacing between characters, probably due to
inaccurate font metrics).
Is there anything different in the way pdf2swf creates a swf when it
is run on a searchable scanned PDF (jpegs + embedded text)? How can I
make the search highlighting work without having to reduce the alpha
value of the displayed document?
Thanks,
tachy0n