Re: [libextractor] Solaris, iconv and libextractor
From: Christian Grothoff
Date: Tue, 11 Apr 2006 18:45:37 -0700
User-agent: KMail/1.9.1
I don't know whether UNICODE vs. UTF-16 makes a difference (I couldn't find
anything online), so I've put in an #ifdef SOLARIS for now. In general, the
entire PDF plugin is scheduled for slaughter (to be replaced with code not
based on xpdf), so while I agree that the rest of the code is suboptimal, I
don't see a reason to fix it at this point.
If you have any insight into the exact encoding PDF uses here (in particular
with respect to the iconv conversion call), please let me know.
Christian
On Tuesday 11 April 2006 07:31, Michał Kowalczuk wrote:
> I had another problem under Solaris. Sun libiconv doesn't support
> conversion from UNICODE to UTF-8, so convertToUtf8(u, 2, "UNICODE"), invoked
> from printInfoString() in src/plugins/pdf/pdfextractor.cc, fails. Its input
> is a 2-byte string that is not NUL-terminated, so the strdup() that
> convertToUtf8() calls on iconv failure returns junk. As far as I checked,
> conversion from UTF-16 gives the same result under Linux (GNU libiconv) as
> conversion from UNICODE (I'm not sure whether the two are equivalent). More
> importantly for me, Sun libiconv supports conversion from UTF-16 to UTF-8.
>
> I don't really understand why printInfoString() uses a 2-byte buffer for
> the conversion. Isn't it easier to pass the whole (s+2), which is (len-2)
> bytes long, to convertToUtf8()? That means fewer mallocs and fewer function
> calls.
>
> The same two points also apply to printInfoDate() in the same file.
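The quoted suggestion can be sketched as one iconv() call over the whole buffer. The PDF specification describes Unicode text strings as UTF-16BE data prefixed with the byte-order mark FE FF. Assuming that is why printInfoString() skips two bytes, the sketch below strips the mark and converts the remaining len - 2 bytes at once. On failure it returns NULL instead of calling strdup() on an unterminated buffer. Both function names are hypothetical, and it is an assumption that Sun's iconv accepts the name "UTF-16BE". Strings without the mark (PDFDocEncoding) are simply rejected here.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert `len` bytes of UTF-16BE (not NUL-terminated)
 * into a malloc'd, NUL-terminated UTF-8 string with a single iconv() call.
 * Returns NULL on any failure rather than copying the raw input. */
char *utf16be_to_utf8(const char *in, size_t len)
{
    assert(in != NULL || len == 0);
    /* A 2-byte unit yields at most 3 UTF-8 bytes and a 4-byte surrogate
     * pair yields 4, so 2 * len bytes plus the NUL is always enough. */
    size_t out_size = 2 * len + 1;
    char *out = malloc(out_size);
    if (out == NULL)
        return NULL;

    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1) {
        free(out);
        return NULL;
    }

    char *inp = (char *)in;          /* iconv() does not modify the input */
    size_t in_left = len;
    char *outp = out;
    size_t out_left = out_size - 1;  /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1 || in_left != 0) {  /* invalid or truncated input */
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/* Hypothetical wrapper: accept only strings starting with the FE FF mark,
 * then convert the (len - 2) bytes after it in one go. */
char *pdf_unicode_string_to_utf8(const char *s, size_t len)
{
    if (len < 2 || (unsigned char)s[0] != 0xFE || (unsigned char)s[1] != 0xFF)
        return NULL;
    return utf16be_to_utf8(s + 2, len - 2);
}
```

This uses one allocation and one conversion per string rather than one per character. Older Solaris headers may declare iconv()'s input argument as const char **, so the cast may need adjusting there.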