bug-gnu-libiconv
[bug-gnu-libiconv] The utf-8-mac encoder on macOS gives incorrect output


From: Marcin Sulikowski
Subject: [bug-gnu-libiconv] The utf-8-mac encoder on macOS gives incorrect output
Date: Thu, 26 Oct 2017 13:50:24 +0200

Dear libiconv Team,

I've been trying to use libiconv on macOS to convert UTF-8 strings to their NFD form using the "utf-8-mac" encoding, which libiconv provides on macOS. This does not always work: in some cases the iconv function returns incorrect output, apparently when a character to be decomposed cannot be fully written to the given output buffer. The bug is easy to reproduce with the iconv command-line tool on macOS:

  ( printf "a%.0s" `seq 4094` ; echo -n ó ) | iconv -f utf-8 -t utf-8-mac | iconv -f utf-8-mac -t utf-8

The echo's argument is "LATIN SMALL LETTER O WITH ACUTE" (U+00F3), which decomposes to the Latin 'o' plus the combining acute accent. I'd expect this to print aaaaaaa...aaaaaó (i.e., exactly what the printf and echo commands print when their output is not piped through iconv), but I get aaaaa...aaaaaaao instead. The accent at the end is lost in the conversion because `iconv -f utf-8 -t utf-8-mac` does not output the combining character at the end:

  $ ( printf "a%.0s" `seq 4094` ; echo -n ó ) | iconv -f utf-8 -t utf-8-mac | hexdump -C
  00000000  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
  *
  00000ff0  61 61 61 61 61 61 61 61  61 61 61 61 61 61 6f     |aaaaaaaaaaaaaao|
  00000fff
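
For comparison, the byte sequence a complete decomposition would produce can be sketched in Python. This is only an illustration: it uses the standard library's NFD normalization as a stand-in for the "utf-8-mac" converter (an assumption; the two agree for U+00F3 as described above), and the variable names are made up for this example.

```python
import unicodedata

# Same input as the shell reproduction: 4094 ASCII 'a' characters followed by U+00F3.
text = "a" * 4094 + "\u00f3"

# NFD splits U+00F3 into 'o' (U+006F) plus COMBINING ACUTE ACCENT (U+0301).
# Assumption: standard NFD stands in for the utf-8-mac decomposition here.
nfd = unicodedata.normalize("NFD", text)
expected = nfd.encode("utf-8")

print(len(expected))           # total bytes a full decomposition would produce
print(expected[-3:].hex())     # 'o' followed by the two-byte combining accent
print(len(expected) - 0xFFF)   # bytes missing relative to the 0xfff-byte hexdump
```

The expected tail is 6f cc 81, whereas the hexdump above ends at the 6f byte, so the two bytes encoding the combining accent are the ones that never reach the output.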

This is the version of iconv that I'm using:

  $ iconv --version
  iconv (GNU libiconv 1.11)
  Copyright (C) 2000-2006 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  Written by Bruno Haible.

regards,
Marcin Sulikowski
