[Bug-ocrad] Re: OCRAD project: library is needed


From: Dmitry Katsubo
Subject: [Bug-ocrad] Re: OCRAD project: library is needed
Date: Tue, 6 Jul 2010 14:18:06 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100502 Shredder/3.0.5pre

Hi Antonio!

I have briefly discussed this point with Igor: you need not bother with
compiling OSRA. It is up to me to provide a test case that demonstrates
the problem.

So, here is the test case.

The results I get when using OCRAD_bitmap in line 128:

Test 1: width x height = 10x11
+ recognised via Character::recognize1(): N
+ recognised via OCRAD_result_first_character():
Test 2: width x height = 10x11
+ recognised via Character::recognize1(): N
+ recognised via OCRAD_result_first_character(): w
Test 3: width x height = 11x11
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character(): r
Test 4: width x height = 18x7
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character(): r
Test 5: width x height = 9x10
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character(): r
Test 6: width x height = 9x10
+ recognised via Character::recognize1(): /
+ recognised via OCRAD_result_first_character(): t

I would expect the API to recognize "N" in tests 1 and 2 no worse than
Character::recognize1() does. The API also reports "r" and "t" where
recognize1() reports "_" or "/", which may trigger false positives.

When I change to OCRAD_greymap, I get the following result:

Test 1: width x height = 10x11
+ recognised via Character::recognize1(): N
+ recognised via OCRAD_result_first_character(): o
Test 2: width x height = 10x11
+ recognised via Character::recognize1(): N
+ recognised via OCRAD_result_first_character(): o
Test 3: width x height = 11x11
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character(): o
Test 4: width x height = 18x7
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character(): ~
Test 5: width x height = 9x10
+ recognised via Character::recognize1(): _
+ recognised via OCRAD_result_first_character():
Test 6: width x height = 9x10
+ recognised via Character::recognize1(): /
+ recognised via OCRAD_result_first_character(): n

I would expect the results to be the same as above, at least for consistency :)

There must be some obvious mistake in the test code...
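
One candidate worth ruling out, as far as I understand the comments in
ocradlib.h: the two modes use opposite polarity (OCRAD_bitmap: 0 = white,
1 = black; OCRAD_greymap: 0 = black, 255 = white), so the same buffer
cannot simply be relabelled when switching modes. A minimal sketch of the
conversion, assuming one byte per pixel in both modes; bitmap_to_greymap
is a hypothetical helper, not part of ocradlib or of the attached test:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ocradlib function).
// Input:  one byte per pixel, 0 = white, 1 = black (bitmap convention).
// Output: one byte per pixel, 0 = black, 255 = white (greymap convention).
std::vector<unsigned char> bitmap_to_greymap(const std::vector<unsigned char>& bitmap)
{
    std::vector<unsigned char> grey(bitmap.size());
    for (std::size_t i = 0; i < bitmap.size(); ++i)
        grey[i] = bitmap[i] ? 0 : 255;  // black stays black, white stays white
    return grey;
}
```

If the greymap run was fed the unconverted bitmap buffer, every pixel
would have been close to black, which could explain why the two runs
disagree. I have not verified that this is what happens in the test.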

Thank you in advance!

On 18.06.2010 19:34, Antonio Diaz Diaz wrote:
> Hello Igor,
> 
> Igor Filippov [Contr] wrote:
>>> As explained here[1], Blob is an internal type of ocrad. It is
>>> neither documented nor guaranteed to remain stable. Moreover, Blob
>>> has some requirements, like pixel connectivity, that could make osra
>>> unstable if your data does not meet them.
>>> 
>>> [1] http://lists.gnu.org/archive/html/bug-ocrad/2009-12/msg00003.html
>>>
>>> Lines 195-220 of osra_ocr.cpp already use the public API from
>>> ocradlib.h. I think they are commented out because Igor Filippov found
>>> they produce worse results than directly using Blob. I have to
>>> investigate this, and it would be useful if someone could provide me
>>> with an image showing the bug (correctly recognized as Blob, but not
>>> recognized by new API).
>>
>> I have sent you such an image on January 8th of this year with the
>> description of what I'm getting by using Blob vs. what I'm getting (or
>> rather not getting) by using standard API.
>>
>> This was your reply back then:
>> 
>> From:        Antonio Diaz Diaz <address@hidden>
>> To:  Igor Filippov <address@hidden>
>> Subject:     Re: Version 0.19-pre1 of GNU Ocrad released
>> Date:        01/09/2010 02:26:38 PM
>> 
>> Igor Filippov wrote:
>>>> Attached are output from the old version and the new version along with
>>>> the image. Note that the new version did not get "N" nitrogen label -
>>>> it seems to have either "w" or "r" instead.
>>
>> I have noticed two things about this image; the letters are a little 
>> small for ocrad (8x9), and it is a greymap with more than two pixel
>> values.
>> 
>> I'll try to solve the size problem as soon as I can (I am now busy 
>> working on lzlib).
>> 
>> ===================================================================
> 
> What you sent me (apodaca.png) was a complete image of a molecule with 6
> rings, four "N", one "O" and one "HO". What I need is the raw data fed
> to ocrad as Blob or through the OCRAD_set_image function. Maybe some
> disconnected pixels are being ignored by Blob functions but are
> interfering with character formation when using the API.
> 
> Unfortunately I was unable to compile osra last time I tried because of
> some dependency not installed. I'll try again ASAP.
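
Regarding the disconnected-pixel hypothesis above: a quick way to check
it would be to count the connected groups of black pixels in each buffer
before handing it to OCRAD_set_image. For a single-stroke letter such as
"N", more than one group would point to stray pixels. A rough sketch,
assuming a row-major buffer with one byte per pixel, non-zero meaning
black, and 8-connectivity (the connectivity rule Blob actually requires
is not stated in this thread); count_components is a hypothetical helper,
not an ocrad function:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: count 8-connected groups of black (non-zero) pixels.
// Assumes bitmap.size() == width * height, stored row by row.
int count_components(const std::vector<unsigned char>& bitmap, int width, int height)
{
    std::vector<bool> seen(bitmap.size(), false);
    int components = 0;
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t start = static_cast<std::size_t>(row) * width + col;
            if (!bitmap[start] || seen[start]) continue;
            ++components;                       // found an unvisited black pixel
            std::vector<std::pair<int, int> > stack;
            stack.push_back(std::make_pair(row, col));
            seen[start] = true;
            while (!stack.empty()) {            // flood fill over the 8 neighbours
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                for (int dr = -1; dr <= 1; ++dr) {
                    for (int dc = -1; dc <= 1; ++dc) {
                        const int r = p.first + dr;
                        const int c = p.second + dc;
                        if (r < 0 || r >= height || c < 0 || c >= width) continue;
                        const std::size_t idx = static_cast<std::size_t>(r) * width + c;
                        if (bitmap[idx] && !seen[idx]) {
                            seen[idx] = true;
                            stack.push_back(std::make_pair(r, c));
                        }
                    }
                }
            }
        }
    }
    return components;
}
```

Running this on the six test buffers would show whether any of them
contains more than one group of pixels.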

-- 
With best regards,
Dmitry

Attachment: osra_ocr.cpp
Description: Text document

Attachment: Makefile
Description: Text document

