Re: [h5md-user] box data as part of trajectory/position


From: Felix Höfling
Subject: Re: [h5md-user] box data as part of trajectory/position
Date: Wed, 12 Sep 2012 10:53:41 +0200
User-agent: Opera Mail/12.01 (Linux)

Hi Pierre,

On 12.09.2012 at 09:52, Pierre de Buyl
<address@hidden> wrote:

Hi Felix,

On Mon, Sep 10, 2012 at 09:14:55AM +0200, Felix Höfling wrote:

I thought about the box again since I am not really comfortable with the
current specification. I find it a bit awkward that the observables group
must be present if a file contains trajectory data only. Further, the box
information is only needed in conjunction with position data. If only
velocities are stored (for some reason), the box is not needed. And last,
perhaps the strongest point: for time-dependent boxes, there should be a
simple way to retrieve the corresponding box size for a given entry in the
position time series. (Currently, the box may be stored at different
intervals than the positions.)

My suggestion is to link the box much more tightly to the position data. The
box group in observables may still be present and can be realised by
appropriate hard links. The following suggestion ensures that the box data
are available within each position group, consistently using the same time
grid as the position data:

trajectory
   \-- group1
...

One open point: how can we efficiently store the information for a fixed
box size (which is a pretty widespread case)? If the edges and offset
datasets always contain the same entries, they may pack well, but they
have to be unpacked for accessing any data point. An alternative would be
to indicate the non-changing box size transparently, e.g., by an
additional attribute and different dataset extents (with fixed size).

trajectory
   \-- group1
   |  \-- position
   |    |    \-- value
   |    |    \-- step
   |    |    \-- time
   |    \-- box
   |         +-- type
   |         \-- edges [D][D]
   |         \-- offset [D]

(Note that the extents of edges depend on the box type, either [D]
or [D][D].)

I prefer to turn your suggestion around, if you don't mind: keep the data in observables, with the option to link from the trajectory groups if needed.

What I think you would like to avoid is carrying "observables" even though
all you want is a trajectory (with box information, indeed). On the other
hand, if one wants to find the box information, it is in
"/trajectory/groupname/..." where "groupname" depends on the file... Even if
the data is linked, this seems more cumbersome to me. The specification of
several boxes seems to me to be more of an exceptional case.


My suggestion is less cumbersome than you describe. The box is mostly
relevant for the interpretation of position data, and then all information
is contained in "/trajectory/group/position" without resorting to a
different root group. The position data is exceptional in this respect due
to the typically used periodic boundaries.

If the box information itself is needed, I agree. It should not be deduced
from some trajectory group. Therefore I suggested keeping it in
observables as it is. But not every piece of information needs to be stored.
If the box is not in observables, an H5MD reader would refuse to retrieve it
from the file (although it could by looking up some strange trajectory group).

Please consider the following example as a reason to keep that data in
observables. In the case of a varying-volume simulation, one may want to keep
only the thermodynamic observables: energy, temperature, ..., box size. That
is: all "order 1 in storage" information as opposed to "order N" information
(particle information).

Finally, your scheme is compatible with the current draft as "additional data" is not illegal for H5MD, while the reverse would not be true (missing data in
observables).


I would like to make /observables/box and thus /observables non-mandatory.
At the same time, my suggestion makes the box information mandatory if
position data are present (but stored in trajectory/group/position).

So far, the only mandatory root group should be /h5md. I thought about
providing the space dimension explicitly as an attribute in /parameters (or
/h5md). It is cumbersome to deduce it from the dataset extents of, e.g.,
box/offset.

For your application, all you need to do is provide links from
observables/box to the position data. On the writer's side, this is not
much overhead, while the reader has to access only a single subgroup
(.../position) and the file format itself becomes more flexible.

As far as the time correspondence is concerned, in my mind this could be done
as follows: the box information is stored only when it changes, so that what
one would be looking for is the maximum time in "/observables/box/edges/time"
that is lower than or equal to the requested time. That, or require that each
time step in the trajectory matches one in the box information.

I doubt that a "lower than or equal to a time/step" lookup can be implemented
efficiently. For example, how would you do so using h5py? numpy.where is
an option, but inefficient (it requires the whole time series of the box to
be read in, and the comparison is done for each access to a position item).

My suggestion works by indexing, which is simple and highly efficient.
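
To make the comparison concrete, here is a minimal sketch of the two
retrieval schemes in plain Python. The lists stand in for the step and edges
datasets, all numbers are made up, and the helper name box_index_at is only
for illustration. The lookup uses a binary search from the standard library
instead of numpy.where, assuming the box steps are sorted in ascending order;
it still needs the box step series in memory.

```python
from bisect import bisect_right

def box_index_at(box_steps, step):
    """Index of the last box entry whose step is <= the requested step.

    Assumes box_steps is sorted in ascending order.
    """
    i = bisect_right(box_steps, step) - 1
    if i < 0:
        raise ValueError("no box entry at or before the requested step")
    return i

# Made-up data: the box is stored only when it changes,
# positions are stored every 10 steps.
box_steps = [0, 40, 70]
box_edges = [[10.0, 10.0, 10.0], [10.5, 10.5, 10.5], [11.0, 11.0, 11.0]]
position_steps = [0, 10, 20, 30, 40, 50, 60, 70, 80]

# Lookup scheme: one search per access to a position entry.
k = 5  # position entry at step 50
edges_by_lookup = box_edges[box_index_at(box_steps, position_steps[k])]

# Shared time grid: the writer stores one box entry per position entry,
# so the reader uses the very same index k and needs no search.
edges_on_position_grid = [
    box_edges[box_index_at(box_steps, s)] for s in position_steps
]
edges_by_index = edges_on_position_grid[k]

print(edges_by_lookup == edges_by_index)
```

With the shared grid, the cost of the search moves to the writer, who pays it
once, and the reader only indexes.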

Now, for the fixed in time issue. From the current draft:
"""
For all box kinds, if the data for edges,offset is stored as a single dataset, it is considered fixed in time. Else, it should comply to the step, time and
value organization.
"""
I think that this is good. It is simple to parse and does not involve extra
attributes.

I overlooked this passage. Am I correct in reading it as follows, either for
the static case

observables
   \-- box
        +-- type
        \-- edges [D]
        \-- offset [D]

or for the fluctuating box:

observables
   \-- box
        +-- type
        \-- edges
             \-- step [var]
             \-- time [var]
             \-- value [var][D]
        \-- offset
             \-- step [var]
             \-- time [var]
             \-- value [var][D]
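
A reader could tell the two layouts above apart roughly as sketched below.
This is only an illustration under assumptions: nested dicts stand in for
HDF5 groups and lists for datasets, so that it runs without an HDF5 library;
the function name box_edges_at and all numbers are made up, and the type
attribute is left out.

```python
def box_edges_at(box, index):
    """Return the box edges that apply to entry `index` of a time series.

    `box` is a mapping standing in for the "box" group. If "edges" is a
    single dataset (here: a list), the box is fixed in time. If "edges" is
    a group (here: a dict with "step", "time" and "value"), the entry at
    `index` of its "value" series is returned.
    """
    edges = box["edges"]
    if hasattr(edges, "keys"):       # group-like: fluctuating box
        return edges["value"][index]
    return edges                     # dataset-like: static box

# Static case: edges [D] and offset [D] stored as single datasets.
static_box = {
    "edges": [10.0, 10.0, 10.0],
    "offset": [0.0, 0.0, 0.0],
}

# Fluctuating case: step [var], time [var], value [var][D].
fluctuating_box = {
    "edges": {
        "step": [0, 10],
        "time": [0.0, 0.1],
        "value": [[10.0, 10.0, 10.0], [10.2, 10.2, 10.2]],
    },
    "offset": {
        "step": [0, 10],
        "time": [0.0, 0.1],
        "value": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    },
}

print(box_edges_at(static_box, 1))
print(box_edges_at(fluctuating_box, 1))
```

The check relies only on whether the node has sub-entries, so no extra
attribute is needed to mark a box as fixed in time.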

Shall we make the static case explicit in the draft as well?

Cheers,
Felix


