Re: [h5md-user] box data as part of trajectory/position

From: Pierre de Buyl
Subject: Re: [h5md-user] box data as part of trajectory/position
Date: Wed, 12 Sep 2012 09:52:14 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Felix,

On Mon, Sep 10, 2012 at 09:14:55AM +0200, Felix Höfling wrote:
> Hi H5MD users,
> [I realised that I sent this post to the wrong thread, where it
> clearly was off topic. Since there was no response so far, I will
> try again with a new thread.]

I am just late for everything because of many conferences, including EuroSciPy

> I thought about the box again since I feel not really comfortable with the
> current specification. I find it a bit awkward that the observables group
> must be present if a file contains trajectory data only. Further, the box
> information is only needed in conjunction with position data. If only
> velocities are stored (for some reason), the box is not needed. And the
> maybe strongest point last: for time-dependent boxes, there shall be a
> simple way to retrieve the corresponding box size for a given entry in the
> position time series. (Currently, the box may be stored at different
> intervals than the positions).
> My suggestion is to link the box much tighter to the position data. The
> box group in observables may still be present and can be realised by
> appropriate hard links. The following suggestion ensures that the box data
> are available within each position group consistently using the same time
> grid as the position data:
> trajectory
>    \-- group1
>    |  \-- position
>    |    |    \-- value
>    |    |    \-- step
>    |    |    \-- time
>    |    \-- box
>    |         +-- type
>    |         \-- edges [D][D]
>    |         \-- offset [D]
> (Note that the extents of edges depend on the box type, either [D]
> or [D][D].)
> One open point: how can we efficiently store the information for a fixed
> box size (which is a pretty widespread case)? If the edges and offset
> datasets contain always the same entries, they may pack well, but they
> have to be unpacked for accessing any data point. An alternative would be
> to indicate the non-changing box size transparently, e.g., by an
> additional attribute and different dataset extents (with fixed size).

I prefer to turn your suggestion around, if you don't mind: keep the data in
observables, with the option to link from the trajectory groups if needed.

The thing that I think you would like to avoid is to carry "observables" even
though all you want is a trajectory (with box information indeed). On the other
hand, if one wants to find the box information, it is in
"/trajectory/groupname/..." where "groupname" depends on the file... Even if the
data is linked, this seems more cumbersome to me. The specification of several
boxes seems to me to be more of an exceptional case.

Please consider the following example as a reason to keep that data in
observables. In the case of a varying volume simulation, one may want to keep
only the thermodynamical observables: energy, temperature, ..., box size. That
is: all "order 1 in storage" information as opposed to "order N" information
(particle information).

Finally, your scheme is compatible with the current draft as "additional data"
is not illegal for H5MD, while the reverse would not be true (missing data in

As far as the time correspondence is concerned, in my mind this could be done
as follows: the box information is stored only when it changes, so that what one
would look for is the largest time in "/observables/box/edges/time" that is
lower than or equal to the requested time. Either that, or require that each
timestep in the trajectory matches one in the box information.
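
A minimal sketch of the first option, assuming the contents of
"/observables/box/edges/time" have been read into a sorted Python list. The
function name box_index_at and the sample numbers are hypothetical and only
serve as illustration; this is a binary search, not anything the draft
prescribes.

```python
from bisect import bisect_right


def box_index_at(box_times, t):
    """Return the index of the latest box record whose time is <= t.

    box_times must be sorted in increasing order.
    """
    i = bisect_right(box_times, t) - 1
    if i < 0:
        raise ValueError("no box record at or before the requested time")
    return i


# Hypothetical data: the box was stored when it changed, at t = 0.0, 2.5, 7.0.
box_times = [0.0, 2.5, 7.0]
box_edges = [[10.0, 10.0, 10.0], [9.5, 9.5, 9.5], [9.0, 9.0, 9.0]]

# Box edges that apply to a position sample taken at t = 3.0.
edges = box_edges[box_index_at(box_times, 3.0)]
```

The lookup costs O(log n) per request, and it works whether or not the box and
the positions share a time grid.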

Now, for the fixed in time issue. From the current draft:
For all box kinds, if the data for edges,offset is stored as a single dataset,
it is considered fixed in time. Else, it should comply to the step, time and
value organization.
I think that this is good. It is simple to parse and does not involve extra
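
To illustrate how little parsing the rule quoted from the draft needs, here is
a rough sketch. It uses plain Python stand-ins rather than a real HDF5 library:
a dict plays the role of a group and a list plays the role of a dataset. Both
stand-ins and the function name is_fixed_in_time are assumptions made for this
example. With h5py, the same check would presumably distinguish h5py.Dataset
from h5py.Group instead.

```python
from collections.abc import Mapping


def is_fixed_in_time(node):
    """Apply the draft's rule to one box entry (edges or offset).

    A bare dataset (here: anything that is not a mapping) is fixed in time.
    A group (here: a mapping) must provide step, time and value.
    """
    if not isinstance(node, Mapping):
        return True
    missing = {"step", "time", "value"} - set(node)
    if missing:
        raise ValueError(f"time-dependent entry lacks: {sorted(missing)}")
    return False


# Hypothetical stand-ins for the two layouts of "edges".
fixed_edges = [10.0, 10.0, 10.0]
varying_edges = {
    "step": [0, 100],
    "time": [0.0, 1.0],
    "value": [[10.0, 10.0, 10.0], [9.5, 9.5, 9.5]],
}

print(is_fixed_in_time(fixed_edges), is_fixed_in_time(varying_edges))
```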


