h5md-user

Re: [h5md-user] units module


From: Felix Höfling
Subject: Re: [h5md-user] units module
Date: Wed, 06 Nov 2013 16:47:27 +0100
User-agent: Opera Mail/12.15 (Linux)

On 05.11.2013 at 23:55, Peter Colberg
<address@hidden> wrote:

Hi Felix,

On Mon, Nov 04, 2013 at 10:32:36AM +0100, Felix Höfling wrote:
Writing UTF8 is easy as you pointed out, what about reading? I've never
used it in practice. Can a reader store the raw string in char* and pass
it to the udunits2 library? If this works with either encoding we may drop
the "encoding" field of course.

UTF-8 is an encoding of the Unicode character set that uses one or
multiple bytes to represent a character. In C a UTF-8 encoded string
can be stored in a char array.
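
As a minimal sketch of the point above (not taken from the h5md or udunits2
sources), the following C fragment stores the UTF-8 string "µm" in a plain
char array. The helper utf8_codepoints() and the constant unit_um are
hypothetical names introduced only for this illustration, and the helper
assumes its input is valid UTF-8. Byte-oriented functions such as strlen()
keep working because UTF-8 never produces a zero byte inside a character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8; no validation is performed. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* "µm" spelled as explicit bytes so the example does not depend on the
 * encoding of the source file: U+00B5 is 0xC2 0xB5 in UTF-8. */
static const char unit_um[] = "\xC2\xB5" "m";
```

Here strlen(unit_um) reports 3 bytes while utf8_codepoints(unit_um) reports
2 characters, so a reader that only forwards the raw bytes needs no special
handling, whereas one that counts or indexes characters does.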

The HDF5 library does not handle encodings at all; the encoding
property for string datatypes is only an indication for the user.
One can store Unicode strings containing multiple-byte characters
using H5T_CSET_ASCII, and the HDF5 library does not complain.

The downside of this lack of encoding support is that the encoding of
the memory datatype specified when reading/writing a dataset/attribute
must match the encoding of the file datatype. This is an unfortunate
design choice; e.g., reading an attribute with file datatype encoding
H5T_CSET_ASCII using memory datatype encoding H5T_CSET_UTF8 should
work, but it doesn't. One can register datatype conversion functions
as a band-aid, but that must be repeated for every application.


So what is the conclusion? That a reader has to be prepared for the
different encodings and define the memory datatype accordingly when
reading strings? What about reading a file datatype UTF8 to a memory
datatype ASCII (i.e., not caring about the encoding)?

If this is the case, then I suggest keeping the field "encoding", which
allows a reader to easily check whether it supports the specific flavour
of the "units" module found in the file or not.

Felix


