Re: [h5md-user] units module


From: Peter Colberg
Subject: Re: [h5md-user] units module
Date: Tue, 5 Nov 2013 17:55:21 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Felix,

On Mon, Nov 04, 2013 at 10:32:36AM +0100, Felix Höfling wrote:
> Writing UTF8 is easy as you pointed out, what about reading? I've never
> used it in practice. Can a reader store the raw string in char* and pass
> it to the udunits2 library? If this works with either encoding we may drop
> the "encoding" field of course.

UTF-8 is an encoding of the Unicode character set that uses one or
more bytes to represent a character. In C, a UTF-8 encoded string
can be stored in a char array.
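
As a minimal sketch of what that means for a reader (not taken from
any existing H5MD code; the helper name utf8_codepoints is made up
for illustration, and the input is assumed to be valid UTF-8): the
bytes sit in an ordinary null-terminated char array, strlen() counts
bytes rather than characters, and characters can be counted by
skipping continuation bytes of the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the Unicode code points in a
 * null-terminated UTF-8 string. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point.
 * Assumes the input is valid UTF-8; no validation is performed. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

/* Hypothetical helper: number of bytes needed to store the string,
 * including the terminating null byte. This is the size a reader
 * would allocate for the char buffer, not the character count. */
static size_t utf8_storage_size(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

For the unit string "\xC2\xB5m" (the micro sign U+00B5 followed by
"m"), strlen() gives 3 bytes while utf8_codepoints() gives 2
characters. A pure ASCII string such as "m/s" gives the same value
for both.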

The HDF5 library does not handle encodings at all; the encoding
property for string datatypes is only an indication for the user.
One can store Unicode strings containing multiple-byte characters
using H5T_CSET_ASCII, and the HDF5 library does not complain.

The downside of this lack of encoding support is that the encoding of
the memory datatype specified when reading or writing a dataset or
attribute must match the encoding of the file datatype. This is an
unfortunate design choice: for example, reading an attribute whose
file datatype encoding is H5T_CSET_ASCII using a memory datatype with
encoding H5T_CSET_UTF8 should work, but it doesn't. One can register
datatype conversion functions as a band-aid, but that must be repeated
in every application.

> Before we add something to the specification we should test it somehow.
> What about providing a code snippet in the implementation part of how to
> read UTF8 unit strings and how to interact with, e.g., udunits2?

Absolutely.

I would suggest adding an examples directory to the repository, and a
subdirectory for each library interface, e.g., HDF5 C, HDF5 Fortran,
and h5py. Sadly the HDF5 for LuaJIT module is not ready yet, otherwise
I would have written a few examples as well by now.

Regards,
Peter


