Re: [bug-gnu-libiconv] Support question: libiconv on system with glibc?

From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Support question: libiconv on system with glibc?
Date: Thu, 5 Feb 2009 01:09:55 +0100
User-agent: KMail/1.9.9


Russell McOrmond wrote:
>    I have an environment where I would like to separate off as much of our 
> application into a chroot() environment as possible.  We figured that 
> using the separate libiconv would help, so that we didn't need to bring 
> into the chroot() environment all of glibc (IE: /usr/lib/gconv , etc).
>    I have been having a problem getting libiconv to work in this 
> environment.  This is a RedHat Enterprise 4 machine (glibc 2.3.4)

Note that glibc uses the /usr/lib/gconv/ directory not only for iconv()
but also for locales that have the specified encoding. If your program,
at some point (for example in order to sort strings for a Japanese user)
sets the locale to ja_JP.EUC-JP, glibc will need access to

But if the only locales that your program uses are the "C" locale and some
UTF-8 locales, then your approach to use libiconv instead of glibc's
iconv is workable.

> trying to compile libiconv 1.12.

Remember that building and installing libiconv on a glibc system is an
unusual situation. (It works, and is supported, but is not the normal

>    We have data that is encoded in UTF-16 which we are outputting in UTF-8 
> (very simple transcode), inserted into an HTML template.
> The relevant part should be output in UTF-8 as:
> <td>Chernozémique</td>
> (Note the accented e)
> Here is the test using 'od' to show the UTF-8 encoding when using the 
> glibc version of the iconv functions.
> -bash-3.00$ sh ~/test-mapserv.sh | od -c -j247 -N23
> 0000367   <   t   d   >   C   h   e   r   n   o   z 303 251   m   i   q
> 0000407   u   e   <   /   t   d   >
> 0000416
> And here is what happens when I use the libiconv version.
> -bash-3.00$  export 
> LD_PRELOAD=/server/downloads/src/libiconv-1.12/lib/preloadable_libiconv.so
> -bash-3.00$ sh ~/test-mapserv.sh | od -c -j247 -N48
> 0000367   <   t   d   > 344 214 200 346 240 200 346 224 200 347 210 200

$ printf '\344\214\200\346\240\200\346\224\200\347\210\200' \
    | iconv -f UTF-8 -t UCS-4LE \
    | hexdump -e '"%06.6_ax  " 4/4 "%08X "' -e '"\n"'
000000  00004300 00006800 00006500 00007200

So the characters being output are U+4300, U+6800, etc. instead of
U+0043, U+0068, etc. That is, the two bytes of each 16-bit unit are being
read in the wrong order.
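
The byte pattern in the od dump can be reproduced outside of iconv. The
following is only a minimal sketch in Python, using Python's own codecs
rather than libiconv or glibc: it takes the first four letters of the
expected text, encodes them as little-endian UTF-16, and then deliberately
decodes them as big-endian. The variable names are made up for this
illustration.

```python
# Encode "Cher" as UTF-16 little-endian: 43 00 68 00 65 00 72 00
text = "Cher"
utf16le = text.encode("utf-16-le")

# Deliberately read the same bytes with the opposite byte order.
# Each 16-bit unit is now interpreted byte-swapped.
misread = utf16le.decode("utf-16-be")

# Re-encode the misread characters as UTF-8, as the application would.
garbled_utf8 = misread.encode("utf-8")

print([f"U+{ord(c):04X}" for c in misread])
# ['U+4300', 'U+6800', 'U+6500', 'U+7200']

# Print the bytes in octal, the way `od -c` shows non-ASCII bytes.
print(" ".join(f"{b:03o}" for b in garbled_utf8))
# 344 214 200 346 240 200 346 224 200 347 210 200
```

The octal bytes match the libiconv dump quoted above, which is consistent
with a byte order mismatch on the input side.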

>    In case anyone is curious how iconv is being called, the relevant code 
> is here: 
> http://trac.osgeo.org/mapserver/browser/trunk/mapserver/mapstring.c#L1504
>    The variable 'encoding' on input is set to "UTF-16" , so this is a 
> simple conversion from UTF-16 to UTF-8.

"UTF-16" is ambiguous: the name by itself does not specify a byte order. You
had better use UTF-16LE or UTF-16BE, depending on the endianness of your
machine.
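
To illustrate the ambiguity, here is a small sketch in Python. It shows the
behaviour of Python's codecs only; which byte order a given converter assumes
for plain "UTF-16" input without a byte order mark depends on the
implementation, so take this as an illustration of the principle, not as a
description of libiconv or glibc.

```python
import codecs

text = "Ch"

# Plain "utf-16" output from Python starts with a byte order mark (BOM),
# so that a reader can find out which byte order was used.
with_bom = text.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The explicit names never add a BOM and leave nothing to guess.
le = text.encode("utf-16-le")   # 43 00 68 00
be = text.encode("utf-16-be")   # 00 43 00 68

# With a BOM in front, the generic decoder handles either byte order.
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")

# Without a BOM, only the explicit name gives a well-defined result.
explicit = le.decode("utf-16-le")
```

If the input buffer carries no byte order mark, as is typical for an
in-memory array, naming the byte order explicitly removes the guesswork.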

But actually, in your code the input is not encoded in UTF-16; it is a sequence
of wchar_t values. wchar_t is not necessarily Unicode at all; for example, on
Solaris or FreeBSD it isn't. To convert from/to wchar_t using libiconv or glibc,
use the encoding name "wchar_t".
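
The reason a fixed name such as "UTF-16" does not fit a wchar_t array is
that both the element size and the byte order are platform properties. The
sketch below uses Python's ctypes module to look at a wchar_t buffer. Python
has no "wchar_t" codec, so the helper wchar_codec() is a hypothetical
stand-in that picks a codec from the element size and the byte order. It
assumes that wchar_t holds Unicode code points on the platform where it
runs, which, as said above, is not guaranteed everywhere.

```python
import ctypes
import sys


def wchar_codec():
    """Guess a Python codec for raw wchar_t data (assumes Unicode wchar_t)."""
    size = ctypes.sizeof(ctypes.c_wchar)
    suffix = "le" if sys.byteorder == "little" else "be"
    if size == 4:
        return "utf-32-" + suffix
    if size == 2:
        return "utf-16-" + suffix
    raise ValueError(f"unexpected wchar_t size: {size}")


def wchar_bytes_to_utf8(raw):
    """Convert the raw bytes of a wchar_t array to UTF-8."""
    # Drop the terminating NUL element(s) that a C string buffer carries.
    return raw.decode(wchar_codec()).rstrip("\x00").encode("utf-8")


# Build a wchar_t buffer and take its raw bytes, as a C program would see it.
buf = ctypes.create_unicode_buffer("Chernoz\u00e9mique")
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))

utf8 = wchar_bytes_to_utf8(raw)
```

In C, none of this guessing is needed: passing "wchar_t" as the source
encoding lets the converter apply the platform's own definition.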

Btw (off-topic), in
you have a very bad hash function: Strings which differ in 2 characters will
often lead to the same hash code. For example, the strings
will all yield the same hash code. This can ruin the performance of an
application; see <http://www.haible.de/bruno/hashfunc.html>. Remember that
a hash table no longer gives O(1) access if the elements are not
approximately equidistributed across the hash buckets.
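
As an illustration of the general point, here is a sketch in Python with two
hypothetical hash functions. Neither is the function from the code referred
to above; additive_hash() is merely a well-known example of a function for
which strings that differ in two characters often collide, and
polynomial_hash() is one common alternative. The bucket count of 64 is an
arbitrary choice for the example.

```python
def additive_hash(s, buckets=64):
    """Weak hash: sum of character codes. Ignores character positions."""
    return sum(ord(c) for c in s) % buckets


def polynomial_hash(s, buckets=64):
    """Position-sensitive hash: h = h * 31 + code, for each character."""
    h = 0
    for c in s:
        h = h * 31 + ord(c)
    return h % buckets


def longest_chain(hash_func, keys, buckets=64):
    """Length of the fullest bucket when all keys are inserted."""
    counts = {}
    for key in keys:
        b = hash_func(key, buckets)
        counts[b] = counts.get(b, 0) + 1
    return max(counts.values())


# All six orderings of the same three letters: they differ pairwise in at
# least two positions, yet the additive hash cannot tell them apart.
keys = ["abc", "acb", "bac", "bca", "cab", "cba"]

print(longest_chain(additive_hash, keys))    # 6: every key in one bucket
print(longest_chain(polynomial_hash, keys))
```

With the additive function, a lookup among these keys degenerates into a
linear search through a single bucket.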

