[Swftools-common] pdf2swf textSnapshots in OCR'ed PDF files

swftools-common

From:	tachy0n tachy0n
Subject:	[Swftools-common] pdf2swf textSnapshots in OCR'ed PDF files
Date:	Mon, 3 Aug 2009 16:47:32 -0700

I'm writing a viewer for SWF files produced by pdf2swf that allows
searching for text within the SWF. I used the example provided here -
http://wiki.swftools.org/index.php/How_do_I_highlight_text_in_the_SWF%3F
- and was able to search for and highlight matching text in SWFs
created from digital PDF documents (i.e. PDFs that were generated from
other formats such as MS Word).
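
For illustration, the search step can be sketched outside Flash. The
sketch below is not the viewer's code: it is a minimal Python model,
assuming the page text has already been read out of a frame's
TextSnapshot (for example with getText). The function name
find_matches is hypothetical. It returns (begin, end) index pairs with
an exclusive end, which is the shape of range that
TextSnapshot.setSelected(beginIndex, endIndex, select) expects.

```python
def find_matches(page_text, query, case_sensitive=False):
    """Return a list of (begin, end) index pairs, end exclusive,
    for every non-overlapping occurrence of query in page_text.

    Hypothetical helper: it models the index bookkeeping only and
    does not call any Flash API.
    """
    if not query:
        return []

    # Assumption: lowercasing keeps string length unchanged, which
    # holds for ASCII text but not for every Unicode character.
    haystack = page_text if case_sensitive else page_text.lower()
    needle = query if case_sensitive else query.lower()

    matches = []
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        matches.append((start, end))
        # Resume after the current match so matches do not overlap.
        start = haystack.find(needle, end)
    return matches


if __name__ == "__main__":
    for begin, end in find_matches("The cat and the hat", "the"):
        print(begin, end)
```

In the viewer, each pair would then be handed to the frame's
TextSnapshot to mark the range as selected.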

I wanted to do the same for PDF documents created with a scanner. So I
took the scanned document, dumped the JPEG pages using xpdf, and ran
Cuneiform OCR with the hocr2pdf script to create a searchable PDF. The
searchable PDF works as expected when opened in Adobe PDF reader:
search highlighting and text select/copy both work.
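
The exact commands are not given above, so the following is only a
sketch of what such a pipeline might look like for a single page. It
assumes pdfimages (from xpdf) with -j to dump JPEGs, cuneiform with
-f hocr and -o to write hOCR, and hocr2pdf with -i and -o reading the
hOCR on standard input. The file names and the helper name
ocr_commands are hypothetical, and the commands are only built here,
not executed.

```python
from pathlib import Path


def ocr_commands(pdf_path, image_root, page_image):
    """Build the argv lists for one page of a scan-to-searchable-PDF run.

    Returns a list of (argv, stdin_path) tuples; stdin_path is None
    when the command reads nothing from standard input.
    All paths are illustrative assumptions.
    """
    page = Path(page_image)
    hocr = page.with_suffix(".hocr")   # OCR result for this page
    out_pdf = page.with_suffix(".pdf") # searchable single-page PDF

    return [
        # 1. Dump the embedded page images as JPEG files.
        (["pdfimages", "-j", str(pdf_path), str(image_root)], None),
        # 2. Run OCR on one page image and write hOCR output.
        (["cuneiform", "-f", "hocr", "-o", str(hocr), str(page)], None),
        # 3. Combine the image and the hOCR text layer into a PDF;
        #    hocr2pdf reads the hOCR from standard input.
        (["hocr2pdf", "-i", str(page), "-o", str(out_pdf)], str(hocr)),
    ]


if __name__ == "__main__":
    for argv, stdin_path in ocr_commands("scan.pdf", "page", "page-000.jpg"):
        suffix = f" < {stdin_path}" if stdin_path else ""
        print(" ".join(argv) + suffix)
```

Each tuple could be passed to subprocess.run, opening stdin_path as
the standard input where one is given.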

However, when I load it in my viewer, the text highlights are not
shown, even though the TextSnapshot for each frame in the MovieClip
contains the OCR'ed text, including the correct font metrics/bounding
boxes. When I set the alpha of the movie clip to less than 1, the
highlight shows up correctly (albeit with spacing between characters,
probably due to inaccurate font metrics).

Is there anything different in the way pdf2swf creates a SWF when it
is run on a searchable scanned PDF (JPEGs + embedded text)? How can I
make the search highlighting work without having to reduce the alpha
value of the displayed document?

Thanks,
tachy0n

Current Thread

[Swftools-common] pdf2swf textSnapshots in OCR'ed PDF files, tachy0n tachy0n <=
- [Swftools-common] Re: pdf2swf textSnapshots in OCR'ed PDF files, tachy0n tachy0n, 2009/08/05
  - Re: [Swftools-common] Re: pdf2swf textSnapshots in OCR'ed PDF files, Matthias Kramm, 2009/08/07
    - Re: [Swftools-common] Re: pdf2swf textSnapshots in OCR'ed PDF files, tachy0n tachy0n, 2009/08/07
