
Re: [h5md-user] units module


From: Felix Höfling
Subject: Re: [h5md-user] units module
Date: Thu, 31 Oct 2013 17:17:18 +0100
User-agent: Opera Mail/12.15 (Linux)

Hi all,

To make progress, I have written down a draft specification for the units module. I took up Pierre's suggestion and added a list of units inspired by Mosaic and udunits2.

I hope that a concrete module will serve as a basis for further discussion and help us reach a conclusion on the units soon. Once this is settled, version 1 of H5MD does not seem far off ...

Best regards,

Felix


On 18.10.2013, 09:47, Pierre de Buyl <address@hidden> wrote:

On Thu, Oct 17, 2013 at 03:16:40PM +0200, Konrad Hinsen wrote:
Pierre de Buyl writes:
 > So, to get back to Peter's message:
 >
 > I propose that we follow udunits grammar by restricting it similarly to Mosaic.
 > For reference, Mosaic's definition is
 > """
 > The value of the units field is a text string in ASCII encoding. It contains a
 > sequence of unit factors separated by a space. A unit factor is a unit symbol
 > optionally followed by a non-zero integer which indicates the power to which
 > this factor is taken.
 > """
 >
 > I would remove the constants defined ("c" and "Nav"), however.

The current unit list is a first draft, to be revised before version
1.0 of Mosaic. You are completely right about "Nav", which is the same
as "mol" and thus redundant. However, "c" frequently occurs in derived
units, e.g. "cm-1 c" for frequency, which is heavily used in
spectroscopy.
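
To make the "cm-1 c" example concrete, here is a minimal Python sketch of the arithmetic: a wavenumber in cm-1 multiplied by the speed of light gives a frequency in s-1. The function name is made up for illustration; the only physical input is the exact SI value of c.

```python
# Speed of light in vacuum, exact by SI definition.
SPEED_OF_LIGHT_M_PER_S = 299792458.0


def wavenumber_to_frequency(wavenumber_per_cm):
    """Convert a wavenumber in cm-1 to a frequency in s-1 (hypothetical helper)."""
    wavenumber_per_m = wavenumber_per_cm * 100.0  # 1 cm-1 = 100 m-1
    return wavenumber_per_m * SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    print(wavenumber_to_frequency(1000.0))
```

A value stored with the unit "cm-1 c" is therefore already a frequency; the factor c only needs to be evaluated when converting to s-1.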

In any case, this kind of constant would go in a module and not in the base
spec.

 > We may want to add "a unit string must be parseable by udunits"?

The problem with that statement is that we don't control udunits.  In
general, it's not a good idea to define a data format by the
capacities of a piece of software. It's fine to have such a comment as
a statement of intention, of course.

Ok.

Felix Höfling writes:

 > I find udunits' grouping into SI-base units, SI-derived units etc. very
 > reasonable. Let's keep it for H5MD rather than introducing a different
 > subset.

That was my original idea for Mosaic, but I changed my mind for the
following reasons:

1) The point of having a restricted set of units is to permit error
   checking. Allowing a unit that is more likely to be a typo than
   a choice is ultimately of no benefit. A general-purpose library
   such as udunits can't limit the allowed units, but a domain-specific
   format such as Mosaic can.

2) The distinction between SI-base and SI-derived is logical for a
   metrologist, but irrelevant for practical use. I don't expect
   SI-base to be sufficient for much of molecular data, if only
   because of the lack of energy units.

3) Fewer units means a reduced risk of errors if automatic conversion
   is attempted (see below).

 > Actually, whether a reader can "understand" a small or large set of units
 > is mainly a matter of the database defining the units. Am I overlooking
 > something here? Why not copy the full list from udunits?

See 1) above.

Also, to get an idea of what's possible with udunits I had to experiment a bit.
Providing an explicit list seems simpler.

 > BTW, a more advanced functionality that discriminates between "simple" and
 > "advanced" readers is automatic conversion between units ...

Indeed, but conversion is a very tricky business. SI has two traps for
unit converters:

 - Dimensionless units: rad, sr, and mol

   Is pi dimensionless or measured in rad? Both make sense, and automatic
   conversion needs to know which convention was used.

   I am actually considering removing "rad" from the allowed units in
   Mosaic and making "deg" a dimensionless constant equal to pi/180.
   That's much closer to the reality of unit use in computational chemistry
   than the SI system.

 - Dimensionally equal but incompatible units: 1/s, Hz, Bq

   It's OK to convert Hz and Bq to 1/s, but not among each other.
   Converting 1/s to Hz or Bq is in general not allowed. The problem
   disappears if Hz and Bq are not allowed.
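
The one-way nature of these conversions can be sketched as a directed lookup table. This is only an illustration of the rule described above, not part of Mosaic, udunits, or H5MD; the names ALLOWED_CONVERSIONS and can_convert are invented for the example.

```python
# Directed conversions between dimensionally equal units.
# Each pair is (source, destination); the reverse direction is deliberately absent.
ALLOWED_CONVERSIONS = {
    ("Hz", "s-1"),
    ("Bq", "s-1"),
}


def can_convert(source, destination):
    """Return True if converting source to destination is considered safe."""
    if source == destination:
        return True
    return (source, destination) in ALLOWED_CONVERSIONS


if __name__ == "__main__":
    for pair in [("Hz", "s-1"), ("s-1", "Hz"), ("Hz", "Bq")]:
        print(pair, can_convert(*pair))
```

With a table like this, Hz and Bq can each be reduced to s-1, but nothing can be converted into Hz or Bq, so the two never get mixed up. If neither symbol is allowed in the first place, the table is empty and the problem does not arise.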

Ok, so we need to settle on what can go into a unit.

(Most of this is copied from Mosaic, which means I should not forget to add a
license statement somewhere. BTW, Konrad, do you know whether we can include your
CC-BY-licensed text in our GPL "code"?)

"""
"unit" is a scalar attribute of variable-length string type. "unit" consists of
a sequence of unit factors separated by a space. A unit factor is either a
number (an integer or a decimal fraction) or a unit symbol optionally followed
by a non-zero integer which indicates the power to which this factor is taken.
A unit symbol may include an SI prefix.

Examples:

  - "nm3" stands for cubic nanometers

  - "nm ps-1" stands for nanometers per picosecond

  - "60 s" stands for a minute

Each unit symbol may occur only once in the "unit" attribute. There may also
be at most one numeric factor, which must be the first one.

"unit" may be encoded as ASCII or UTF-8.

The list of available symbols, in the case where no "units" module is present,
is XXX.
"""
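
As a rough check that the draft grammar above is unambiguous, here is a minimal Python sketch of a validator for it. Everything named here is an assumption for illustration: the function parse_unit, the EXAMPLE_SYMBOLS set (the real list is still open), and the simplification that prefixed symbols such as "nm" are listed explicitly instead of being split into prefix and base symbol.

```python
import re

# Hypothetical symbol set, for illustration only; the draft leaves the list open.
EXAMPLE_SYMBOLS = frozenset({"m", "nm", "s", "ps", "kg", "K", "mol"})

# A numeric factor: an integer or a decimal fraction.
_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")
# A unit factor: letters, optionally followed by a signed integer power.
_FACTOR = re.compile(r"([^\W\d_]+)([+-]?[0-9]+)?")


def parse_unit(text, symbols=EXAMPLE_SYMBOLS):
    """Return (scale, {symbol: power}) or raise ValueError if text is invalid."""
    scale = 1.0
    powers = {}
    for index, token in enumerate(text.split(" ")):
        if _NUMBER.fullmatch(token):
            # At most one numeric factor, and it must be the first token.
            if index != 0:
                raise ValueError("numeric factor must be the first factor")
            scale = float(token)
            continue
        match = _FACTOR.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed unit factor: {token!r}")
        symbol, exponent = match.group(1), match.group(2)
        if symbol not in symbols:
            raise ValueError(f"unknown unit symbol: {symbol!r}")
        if symbol in powers:
            raise ValueError(f"unit symbol occurs more than once: {symbol!r}")
        power = 1 if exponent is None else int(exponent)
        if power == 0:
            raise ValueError(f"power must be non-zero: {token!r}")
        powers[symbol] = power
    return scale, powers


if __name__ == "__main__":
    for example in ["nm3", "nm ps-1", "60 s"]:
        print(example, "->", parse_unit(example))
```

The three examples from the draft parse as expected, while strings such as "s 60", "nm nm" or "nm0" are rejected. A restricted symbol list is what makes the "unknown unit symbol" check useful for catching typos.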

P



--
Dr Felix Höfling
Research Scientist

Max-Planck-Institut für Intelligente Systeme
(formerly: Max-Planck-Institut für Metallforschung)
Heisenbergstr. 3, 70569 Stuttgart, Germany

Institut für Theoretische Physik IV
Universität Stuttgart
Pfaffenwaldring 57, 70569 Stuttgart, Germany

Phone:  +49 711 689 1938
Fax:    +49 711 689 1922


