[Bug-ocrad] A few ocrad problems
From: Don Moir
Subject: [Bug-ocrad] A few ocrad problems
Date: Sat, 1 Jun 2013 16:22:51 -0400
Hello,
I am a developer and just started using ocrad. I am using ocrad-0.22-rc2.
Results so far look good and better than other sources I have tried.
Here are some problems I have found:
1) An orphan capital letter I fails to be detected. The current code checks
for an AlphaNum before or after the bar, but spaces are not taken into account.
So for example, if you have a<space>|<space>b, the I is not detected as a
capital letter I and is left as a vertical bar. So when setting lcode and rcode
in Textline::recognize2 while checking a vertical bar, you need to skip the
spaces before and after it to see what lcode and rcode need to be set to.
2) I have an example with the word UP in it. This is detected as uP (lower
case u).
3) Failure to detect a space character in latin_space.pbm. The words como jamás
are detected as comojamás; otherwise the recognition is perfect here.
4) Failure to detect the merged pairs ti, vi, im, and ll in
merged_ti_vi_im_ll.pbm.
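
To make the suggestion in item 1 concrete, here is a rough sketch of the idea.
It is written in Python for brevity and is not ocrad's actual C++ code; the
function name and the plain list of characters are assumptions made only for
illustration. It skips the spaces on either side of the vertical bar before
deciding what lcode and rcode are.

```python
def resolve_vertical_bar(chars, i):
    """Return 'I' if the bar at chars[i] should be read as a capital I, else '|'.

    Hypothetical helper: 'chars' stands in for the recognized characters of a
    text line, and chars[i] is assumed to be the vertical bar under test.
    """
    def nearest_non_space(start, step):
        # Walk left (step=-1) or right (step=+1), skipping over spaces.
        j = start + step
        while 0 <= j < len(chars) and chars[j] == ' ':
            j += step
        return chars[j] if 0 <= j < len(chars) else None

    lcode = nearest_non_space(i, -1)
    rcode = nearest_non_space(i, +1)

    # Same test as described above (an AlphaNum before or after),
    # but applied to the nearest non-space neighbours.
    left_ok = lcode is not None and lcode.isalnum()
    right_ok = rcode is not None and rcode.isalnum()
    return 'I' if (left_ok or right_ok) else '|'
```

With this, resolve_vertical_bar(list("a | b"), 2) gives 'I', while a bar
surrounded only by punctuation or nothing at all stays a vertical bar.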
The attached zip contains 6 files:
cap_I_and_UP.pbm (for items 1 and 2)
cap_I_and_UP.txt
latin_space.pbm (for item 3)
latin_space.txt
merged_ti_vi_im_ll.pbm (for item 4)
merged_ti_vi_im_ll.txt
ocrad is working better for me than anything else so far, so it looks very
promising.
I am wondering whether possible merged characters should be added as special
characters, like TT, ti, etc., so that in the future it is easy to add such
combinations.
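
As a rough picture of what I mean by special characters for merged pairs, here
is a minimal sketch. It is in Python only for illustration and does not
reflect how ocrad stores its character codes; the "<ti>"-style codes, the
table, and the function name are all hypothetical. The point is that a new
combination would be a one-line addition to a table.

```python
# Hypothetical table: one special code per merged glyph, mapped to its text.
MERGED_GLYPHS = {
    "<TT>": "TT",
    "<ti>": "ti",
    "<vi>": "vi",
    "<im>": "im",
    "<ll>": "ll",
}

def expand_merged(codes):
    """Expand special merged-glyph codes into their individual characters.

    'codes' is assumed to be the recognizer's output, one entry per blob;
    ordinary characters pass through unchanged.
    """
    return "".join(MERGED_GLYPHS.get(code, code) for code in codes)
```

For example, expand_merged(["<ti>", "m", "e"]) returns "time".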