Hi Felix,
On Mon, Nov 04, 2013 at 10:32:36AM +0100, Felix Höfling wrote:
> Writing UTF8 is easy as you pointed out, what about reading? I've never
> used it in practice. Can a reader store the raw string in char* and pass
> it to the udunits2 library? If this works with either encoding we may
> drop the "encoding" field of course.
UTF-8 is an encoding of the Unicode character set that uses one or
multiple bytes to represent a character. In C a UTF-8 encoded string
can be stored in a char array.
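
To illustrate the point about char arrays, here is a minimal sketch in
plain C. The names example_name and utf8_code_points are made up for
this example and the function assumes valid UTF-8 input. The string is
an ordinary NUL-terminated char array; only the number of bytes and the
number of characters differ.

```c
#include <stddef.h>
#include <string.h>

/* "Höfling" written with explicit UTF-8 bytes for 'ö' (0xC3 0xB6).
 * The literal is split so that the following 'f' is not parsed as part
 * of the hexadecimal escape sequence. */
const char example_name[] = "H\xc3\xb6" "fling";

/* Count Unicode code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Here strlen(example_name) gives the byte count (8), while
utf8_code_points(example_name) gives the character count (7). Code that
only copies or passes the string along never needs to know the
difference.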
The HDF5 library does not handle encodings at all; the encoding
property for string datatypes is only an indication for the user.
One can store Unicode strings containing multiple-byte characters
using H5T_CSET_ASCII, and the HDF5 library does not complain.
The downside of this lack of encoding support is that the encoding of
the memory datatype specified when reading/writing a dataset/attribute
must match the encoding of the file datatype. This is an unfortunate
design choice; e.g., reading an attribute with file datatype encoding
H5T_CSET_ASCII using memory datatype encoding H5T_CSET_UTF8 should
work, but it doesn't. One can register datatype conversion functions
as a band-aid, but that must be repeated for every application.