libextractor

Re: [libextractor] Solaris, iconv and libextractor


From: Christian Grothoff
Subject: Re: [libextractor] Solaris, iconv and libextractor
Date: Tue, 11 Apr 2006 18:45:37 -0700
User-agent: KMail/1.9.1

I don't know if UNICODE vs. UTF-16 makes a difference (I couldn't find anything 
online), so I've put in an #ifdef SOLARIS for now.  In general, the entire 
PDF plugin is scheduled for slaughter (to be replaced with code not based on 
xpdf), so while I agree that the rest of the code is suboptimal, I don't see 
a reason to fix it at this point.

If you have any insight into exactly which encoding PDF uses here (in 
particular with respect to the iconv conversion call), please let me know.
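For reference, the PDF specification defines Unicode text strings as UTF-16BE preceded by the byte order marker 0xFE 0xFF; text strings without that marker use PDFDocEncoding instead. That would likely explain why printInfoString() starts at s+2. Below is a minimal sketch, assuming that rule, of choosing the iconv source charset; `pdf_text_charset` is a hypothetical name, not part of libextractor.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libextractor): pick the iconv source
 * charset for a raw PDF text string. Per the PDF specification, a text
 * string that begins with the byte order marker 0xFE 0xFF is UTF-16BE.
 * Anything else is PDFDocEncoding, which iconv implementations generally
 * do not provide, so NULL is returned and the caller must handle it. */
static const char *pdf_text_charset(const unsigned char *s, size_t len,
                                    size_t *payload_offset)
{
    if (len >= 2 && s[0] == 0xFE && s[1] == 0xFF) {
        *payload_offset = 2; /* the text itself starts after the marker */
        return "UTF-16BE";
    }
    *payload_offset = 0;
    return NULL;
}
```

Naming UTF-16BE explicitly, rather than UNICODE or UTF-16, means the conversion of the marker-less payload does not depend on how a given iconv guesses the byte order.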

Christian

On Tuesday 11 April 2006 07:31, Michał Kowalczuk wrote:
> I had another problem under Solaris. Sun libiconv doesn't support
> conversion from UNICODE to UTF-8, so convertToUtf8(u, 2, "UNICODE"), invoked
> from printInfoString() in src/plugins/pdf/pdfextractor.cc, fails. Its input
> is a 2-byte string with no terminating NUL, so the strdup() that
> convertToUtf8() calls on iconv failure returns junk. I checked that, under
> Linux (GNU libiconv), conversion from UTF-16 gives the same result as
> conversion from UNICODE (I'm not sure whether the two are equivalent).
> Moreover, and more importantly for me, Sun libiconv supports conversion
> from UTF-16 to UTF-8.
>
> I don't really understand why printInfoString() uses a 2-byte buffer
> for the conversion. Wouldn't it be simpler to pass the whole (s+2), which is
> (len-2) bytes long, to convertToUtf8()? Fewer mallocs, fewer function calls.
>
> The same two points also apply to printInfoDate() in the same file.
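Michał's suggestion, one conversion over the whole s+2 buffer of len-2 bytes instead of 2-byte pieces, could look roughly like the sketch below. It is not libextractor's convertToUtf8(): the name `utf16be_to_utf8` is hypothetical, it assumes the caller has already skipped the 0xFE 0xFF marker, and it names "UTF-16BE" explicitly, which glibc and GNU libiconv accept (whether Sun's iconv accepts that exact name is not confirmed in this thread). On failure it returns NULL instead of falling back to strdup() on an unterminated buffer.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libextractor's convertToUtf8()): convert an
 * entire UTF-16BE buffer of len bytes, with the 0xFE 0xFF marker already
 * skipped, into a freshly malloc'd, NUL-terminated UTF-8 string using a
 * single iconv() call. Returns NULL on any failure; it never falls back
 * to strdup(), which would read past the end of an unterminated buffer. */
static char *utf16be_to_utf8(const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1)
        return NULL;

    /* A 2-byte UTF-16 unit yields at most 3 UTF-8 bytes and a 4-byte
     * surrogate pair yields 4, so 2 * len bytes always suffice. */
    size_t out_size = 2 * len + 1; /* +1 for the terminator */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in; /* glibc declares iconv() with char ** */
    size_t in_left = len;
    char *out_ptr = out;
    size_t out_left = out_size - 1;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) { /* invalid or truncated input */
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

On glibc the input-pointer parameter of iconv() is `char **`; some platforms declare it `const char **`, so a portable build may need a configure check around that cast.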



