[bug-gnu-libiconv] Byte-order learned from BOM is forgotten on reset
From: Tomas Kalibera
Subject: [bug-gnu-libiconv] Byte-order learned from BOM is forgotten on reset
Date: Thu, 12 Dec 2024 14:12:37 +0100
User-agent: Mozilla Thunderbird
Dear developers,
libiconv forgets the byte order it has learned from the BOM when a reset
is issued. The attached program demonstrates the problem, which is still
present in version 1.17. Perhaps this is a bug?
The iconv implementation in glibc doesn't do this (tested on Ubuntu 24.04).
Ulrich Drepper writes [https://bugzilla.redhat.com/show_bug.cgi?id=165368]:
"Flushing using iconv() only resets the shift state. This is needed for
stateful encodings with states where the caller wants a converted string to
end in the initial state. The BOM recognition has nothing to do with shift
states. Once the byte order is determined this is a property which stays
with the iconv_t descriptor for its lifetime."
POSIX says: "For state-dependent encodings, the conversion descriptor
cd is placed into its initial shift state by a call for which inbuf is a
null pointer," but neither UTF-16 nor UTF-32 is a state-dependent encoding
(section 6.2 clarifies what is meant by state-dependent encodings).
On Ubuntu 24.04 with glibc iconv, the output of the test program is:
Conversion works iteratively (little-endian).
Conversion works iteratively (big-endian).
With GNU libiconv 1.17 it is:
Iterative conversion fails with reset (little-endian).
Conversion works iteratively (big-endian).
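The following is a minimal sketch of how the little-endian case could be
probed. It is not the attached reset.c; the function name bom_survives
and the two-character input are assumptions made for illustration. It
converts "ab" from UTF-16 (with a little-endian BOM) to UTF-8 in two
iconv() calls, optionally resetting the descriptor in between.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the attached reset.c).
   Returns 1 if the output is "ab", 0 if it is anything else,
   and -1 if the conversion descriptor cannot be opened. */
int bom_survives(int reset_between)
{
    /* BOM FF FE (little-endian), then 'a' = 61 00 and 'b' = 62 00. */
    char input[] = { (char)0xFF, (char)0xFE, 'a', 0, 'b', 0 };
    char output[16];
    char *in = input;
    char *out = output;
    size_t inleft = 4;                 /* first call: BOM plus 'a' */
    size_t outleft = sizeof output;

    iconv_t cd = iconv_open("UTF-8", "UTF-16");
    if (cd == (iconv_t)-1)
        return -1;

    /* The byte order is learned from the BOM during this call. */
    (void)iconv(cd, &in, &inleft, &out, &outleft);

    /* Reset: inbuf is a null pointer, as in the POSIX text quoted above. */
    if (reset_between)
        (void)iconv(cd, NULL, NULL, &out, &outleft);

    /* Second call: 'b' only, with no BOM in front of it. */
    in = input + 4;
    inleft = 2;
    (void)iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    /* iconv() only ever decreases outleft. */
    assert(outleft <= sizeof output);
    return sizeof output - outleft == 2 && memcmp(output, "ab", 2) == 0;
}
```

If the behaviour matches the report, bom_survives(1) would return 1 with
glibc and 0 with GNU libiconv 1.17, while bom_survives(0) would return 1
with both.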
The use case may seem contrived, but a program that converts an input
string iteratively might want to flag invalid input bytes in the output
using some ASCII text, e.g. '<XX>' for a byte XX. Such a program might
also want to support state-dependent encodings, so it would issue a
reset on encountering an invalid byte to ensure that the ASCII text in
the output is interpreted correctly. But when this reset causes the byte
order to be forgotten, the rest of the input is not converted properly.
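A rough sketch of such a conversion loop follows. The function name
convert_flagging_invalid is hypothetical, and the sketch assumes an
ASCII-compatible output encoding (such as UTF-8) and that skipping a
single byte after each failure is acceptable. The reset inside the loop
is the call after which, per this report, libiconv 1.17 would no longer
remember the byte order of a UTF-16 input.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: converts in[0..inleft) using cd and writes a
   NUL-terminated result to out. Each byte that cannot be converted is
   replaced by the ASCII text <XX>. Returns the number of bytes written,
   not counting the terminating NUL. */
size_t convert_flagging_invalid(iconv_t cd, char *in, size_t inleft,
                                char *out, size_t outsize)
{
    assert(outsize > 0);
    char *start = out;
    size_t outleft = outsize - 1;      /* keep room for the NUL */

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1)
            break;                     /* all input was converted */
        if (errno == E2BIG)
            break;                     /* output buffer is full */

        /* EILSEQ or EINVAL: return the output to its initial shift
           state so that the ASCII marker is interpreted correctly. */
        (void)iconv(cd, NULL, NULL, &out, &outleft);
        if (outleft < 4)
            break;                     /* no room for the marker */

        /* Writes 4 characters plus a NUL, which fits in the reserve. */
        snprintf(out, 5, "<%02X>", (unsigned char)*in);
        out += 4;
        outleft -= 4;
        in++;                          /* skip the offending byte */
        inleft--;
    }
    *out = '\0';
    return (size_t)(out - start);
}
```

With a descriptor opened by iconv_open("UTF-8", "UTF-8"), the three
input bytes 'a', 0xFF, 'b' would be expected to produce a<FF>b.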
Thanks
Tomas
Attachment: reset.c
Description: Text Data