Re: [h5md-user] Unit attribute versus non-dimensionless quantities


From: Peter Colberg
Subject: Re: [h5md-user] Unit attribute versus non-dimensionless quantities
Date: Wed, 31 Jul 2013 12:15:22 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jul 31, 2013 at 10:24:54AM +0200, Felix Höfling wrote:
> I have a bit mixed feelings about ignoring completely the
> possibility to specify the physical unit of the data. This is a
> particular strength of HDF5 and, albeit almost trivial, a clear
> improvement over commonly used file formats. (E.g., VMD expects the
> input to be in Ångstrøm if I'm not mistaken.)
> 
> The combination "dataset plus unit attribute" seems to be precisely
> in the spirit of HDF5 to annotate data with attributes. From this
> point of view, I would suggest that dimensionful data are stored as
> HDF5 datasets. On the other hand, favouring attributes over datasets
> for small data makes sense as discussed extensively.

I think this would not work out well.

We could make an exception for the box group and attach a unit
attribute to the group itself, giving the unit of both edges and
offset, but this only postpones the unit issue until the next
dimensionful quantity is added to the specification.

Datasets cannot be attached to datasets, which restricts the
quantities that can be converted from an attribute to a dataset in
order to attach a unit. In particular, an attribute attached to a
dataset could never carry a unit this way, which is unsatisfactory.

> The trouble starts when the attribute itself is dimensionful. The
> example in the HDF5 manual
> http://www.hdfgroup.org/HDF5/doc/UG/13_Attributes.html
> skips this issue and simply assumes that temperature is in
> centigrades and pressure in atmospheres!? The problem has been
> recognised, see e.g.,
> http://lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2009-March/000439.html,
> but until now there appears to be no nice solution.
> 
> The solution with compound types is quite cumbersome, I would like
> to see h5py code implementing this (the h5py manual says almost
> nothing about compounds):
> http://hdf-forum.184993.n3.nabble.com/attribute-units-td1526251.html
> 
> Compound types are not easy to use (and maybe not fully supported by
> all top-level APIs). They may prevent users from using such
> attributes at all, and eventually, people will continue to assume
> rather than to specify the unit of a dataset. In my opinion,
> compound attributes are clearly not a solution.

I would not discard compound types that quickly.

Ideally, the data type of an element (whether in a dataset or an
attribute) should reflect its unit. So instead of attaching the
unit as a string, one defines a unique data type for that unit,
and uses this type for all quantities of this unit.

If I am not mistaken, h5py supports compound data types out of the
box through numpy, e.g., see this (resolved) issue [1].

[1] https://code.google.com/p/h5py/issues/detail?id=144
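
As a rough illustration of the compound-type idea, here is a minimal
sketch of a value-plus-unit attribute written through h5py and numpy.
The field names "value" and "unit", the 16-byte unit string, the
attribute name "temperature" and the file name are assumptions made
for this example only; none of them is part of H5MD. The file is kept
in memory (core driver without backing store), so nothing is written
to disk.

```python
import numpy as np
import h5py

# Hypothetical compound type: a float value paired with a
# fixed-length ASCII unit string. Field names are illustrative.
quantity_dtype = np.dtype([("value", np.float64), ("unit", "S16")])

# In-memory file; the name is a placeholder and no file is created.
with h5py.File("units_sketch.h5", "w", driver="core",
               backing_store=False) as f:
    box = f.create_group("box")

    # A scalar (0-dimensional) element of the compound type.
    temperature = np.array((300.0, b"K"), dtype=quantity_dtype)

    # Store it as a single attribute carrying value and unit together.
    box.attrs.create("temperature", temperature, dtype=quantity_dtype)

    # Reading returns a numpy record; fields are accessed by name.
    stored = box.attrs["temperature"]
    value = float(stored["value"])
    unit = stored["unit"].decode("ascii")

print(value, unit)
```

The same dtype could be reused for every quantity that needs a unit,
though whether other HDF5 front-ends read such attributes equally
easily is not shown by this sketch.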

> A global solution would be to add unit attributes to the h5md group
> for each kind of (the 7 basic) dimension: length, time, mass,
> temperature, electric current, amount of substance, luminous
> intensity:
> http://en.wikipedia.org/wiki/Si_units#Base_units
> But this seems to be very restrictive and it requires some physics
> knowledge to reconstruct the derived units. Further, it would break
> modularity as datasets would not be completely independent of each
> other.

One may also want to use units with different SI prefixes, e.g.,
"meter" and "nanometer" in the same file, so a catalog of units
without any links between the unit and the data is not generally
useful.

> In conclusion, I tend to Peter's solution #1: we leave the optional
> unit attribute as it is, but state in the general section that
> dimensionful, time-independent data have to be stored as datasets
> _if_ the unit attribute is needed. In order to avoid too many
> possibilities for the box, we may attach the "unit" attribute to the
> "box" group itself rather than to edges/offset.

I would very much like to go with solution #0: Move the unit attribute
to the discussion section, and properly implement H5MD units later.

Peter
