Re: [h5md-user] box data as part of trajectory/position

From:	Felix Höfling
Subject:	Re: [h5md-user] box data as part of trajectory/position
Date:	Wed, 12 Sep 2012 10:53:41 +0200
User-agent:	Opera Mail/12.01 (Linux)

Hi Pierre,

On 12.09.2012 at 09:52, Pierre de Buyl
<address@hidden> wrote:

Hi Felix,

On Mon, Sep 10, 2012 at 09:14:55AM +0200, Felix Höfling wrote:
I thought about the box again since I do not feel really comfortable with the
current specification. I find it a bit awkward that the observables group
must be present if a file contains trajectory data only. Further, the box
information is only needed in conjunction with position data. If only
velocities are stored (for some reason), the box is not needed. And perhaps
the strongest point last: for time-dependent boxes, there should be a
simple way to retrieve the corresponding box size for a given entry in the
position time series. (Currently, the box may be stored at different
intervals than the positions.)

My suggestion is to link the box much more tightly to the position data. The
box group in observables may still be present and can be realised by
appropriate hard links. The following suggestion ensures that the box data
are available within each position group, consistently using the same time
grid as the position data:

trajectory
   \-- group1
...
One open point: how can we efficiently store the information for a fixed
box size (which is a pretty widespread case)? If the edges and offset
datasets always contain the same entries, they may pack well, but they
have to be unpacked for accessing any data point. An alternative would be
to indicate the non-changing box size transparently, e.g., by an
additional attribute and different dataset extents (with fixed size).

trajectory
   \-- group1
   |  \-- position
   |    |    \-- value
   |    |    \-- step
   |    |    \-- time
   |    \-- box
   |         +-- type
   |         \-- edges [D][D]
   |         \-- offset [D]

(Note that the extents of edges depend on the box type, either [D]
or [D][D].)
I prefer to turn your suggestion around, if you don't mind: keep the data in
observables, with the option to link from the trajectory groups if needed.
The thing that I think you would like to avoid is to carry "observables" even
though all you want is a trajectory (with box information indeed). On the other
hand, if one wants to find the box information, it is in
"/trajectory/groupname/..." where "groupname" depends on the file... Even if the
data is linked, this seems more cumbersome to me. The specification of several
boxes seems to me to be more of an exceptional event.


My suggestion is less cumbersome than you describe. The box is mostly
relevant for the interpretation of position data, and then all information
is contained in "/trajectory/group/position" without resorting to a
different root group. The position data is exceptional in this respect due
to the typically used periodic boundaries.

If the box information itself is needed, I agree. It should not be deduced
from some trajectory group. Therefore I suggested keeping it in
observables as it is. But not every piece of information needs to be stored.
If the box is not in observables, an H5MD reader refuses to retrieve it from
the file (although it could by looking up some strange trajectory group).

Please consider the following example as a reason to keep that data in
observables. In the case of a varying volume simulation, one may want to keep
only the thermodynamical observables: energy, temperature, ..., box size. That
is: all "order 1 in storage" information as opposed to "order N" information
(particle information).
Finally, your scheme is compatible with the current draft as "additional data"
is not illegal for H5MD, while the reverse would not be true (missing data in
observables).


I would like to make /observables/box and thus /observables non-mandatory.
At the same time, my suggestion makes the box information mandatory if
position data are present (but stored in trajectory/group/position).

So far, the only mandatory root group should be /h5md. I thought about
providing the space dimension explicitly as an attribute in /parameters (or
/h5md). It is cumbersome to deduce it from the dataset extents of, e.g.,
box/offset.

For your application, all you need to do is to provide links from
observables/box to the position data. On the writer's side, this is not
much overhead, while the reader has to access only a single subgroup
(.../position) and the file format itself becomes more flexible.
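A minimal sketch of such a link with h5py follows, for illustration only. The group name "group1", the placement of the box inside the position group, the "cuboid" value of the type attribute, and the helper name are assumptions made for this example and are not taken from the draft. A hard link has no direction: both paths refer to the same HDF5 object, and no data is copied.

```python
import h5py
import numpy as np


def write_box_with_link(f, edges, offset):
    """Store a fixed box in observables and hard-link it into the position group.

    All paths and the 'type' value are placeholders chosen for this sketch.
    """
    box = f.require_group("observables/box")
    box.attrs["type"] = "cuboid"  # assumed value, not prescribed here
    box.create_dataset("edges", data=np.asarray(edges, dtype=float))
    box.create_dataset("offset", data=np.asarray(offset, dtype=float))

    # Assigning an existing group to a new name creates a hard link in h5py.
    position = f.require_group("trajectory/group1/position")
    position["box"] = box
    return box


# In-memory file, so running the sketch leaves nothing on disk.
with h5py.File("box_link_sketch.h5", "w", driver="core", backing_store=False) as f:
    write_box_with_link(f, edges=[10.0, 10.0, 10.0], offset=[0.0, 0.0, 0.0])

    # Writing through one path is visible through the other: same object.
    f["trajectory/group1/position/box/edges"][0] = 12.0
    edges_via_observables = f["observables/box/edges"][...].tolist()
    offset_via_position = f["trajectory/group1/position/box/offset"][...].tolist()
```

A reader interested only in positions then opens .../position and finds the box there, while a reader interested only in thermodynamic data uses observables/box.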

As far as the time correspondence is concerned, in my mind this could be done
as follows: the box information is stored only when it changes, so that what one
would be looking for is the maximum time in "/observables/box/edges/time" that is
lower than or equal to the requested time. That, or require that each timestep in
the trajectory matches one in the box information.

I doubt that "lower than or equal to a time/step" can be implemented
efficiently. For example, how would you do so using h5py? numpy.where is
an option, but inefficient (it requires the whole time series of the box to be
read in, and the comparison is done for each access to a position item).

My suggestion works by indexing, which is simple and highly efficient.
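To make the contrast concrete, here is a small sketch in plain Python. The data and function names are hypothetical. The search variant uses a binary search from the standard library rather than numpy.where; it avoids the linear scan but still assumes that the sorted step series of the box has been read into memory.

```python
from bisect import bisect_right


def box_index_by_search(box_steps, step):
    """Index of the last box entry whose step is <= the requested step.

    Assumes box_steps is sorted in increasing order and already in memory.
    """
    i = bisect_right(box_steps, step) - 1
    if i < 0:
        raise ValueError("no box entry at or before step %d" % step)
    return i


def box_index_shared_grid(position_index):
    """With a shared time grid, the position index is also the box index."""
    return position_index


# Hypothetical data: positions every 10 steps, box stored only when it changes.
position_steps = [0, 10, 20, 30, 40]
sparse_box_steps = [0, 20]
sparse_box_edges = [[10.0, 10.0, 10.0], [11.0, 11.0, 11.0]]

# Box in effect for the position frame at index 3 (step 30), found by search.
found = sparse_box_edges[box_index_by_search(sparse_box_steps, position_steps[3])]
```

With the box stored on the same time grid as the positions, the second function is all that is needed: entry i of the box belongs to entry i of the positions.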

Now, for the fixed in time issue. From the current draft:
"""
For all box kinds, if the data for edges, offset is stored as a single dataset,
it is considered fixed in time. Else, it should comply to the step, time and
value organization.
"""
I think that this is good. It is simple to parse and does not involve extra
attributes.

I overlooked this passage. Am I correct when reading it as either for the
static case

observables
   \-- box
        +-- type
        \-- edges [D]
        \-- offset [D]

or as for the fluctuating box:

observables
   \-- box
        +-- type
        \-- edges
             \-- step [var]
             \-- time [var]
             \-- value [var][D]
        \-- offset
             \-- step [var]
             \-- time [var]
             \-- value [var][D]

Shall we make the static case explicit in the draft as well?
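For illustration, a reader could tell the two layouts apart by checking whether "edges" is a dataset or a group. This is a sketch under my reading of the passage above; the helper name, the top-level "static" and "moving" groups, and the sample values are made up for the example.

```python
import h5py
import numpy as np


def read_box_edges(box, index=None):
    """Return the box edges for either layout.

    box: h5py group holding 'edges' either as a dataset (static box)
    or as a group with 'step', 'time' and 'value' (time-dependent box).
    """
    edges = box["edges"]
    if isinstance(edges, h5py.Dataset):
        return edges[...]  # fixed in time: a single [D] entry
    if index is None:
        raise ValueError("time-dependent box: an index is required")
    return edges["value"][index]  # one row of the [var][D] dataset


# In-memory file holding one example of each layout.
with h5py.File("box_layout_sketch.h5", "w", driver="core", backing_store=False) as f:
    static = f.create_group("static/observables/box")
    static.create_dataset("edges", data=np.array([10.0, 10.0, 10.0]))

    moving = f.create_group("moving/observables/box")
    e = moving.create_group("edges")
    e.create_dataset("step", data=np.array([0, 10]))
    e.create_dataset("time", data=np.array([0.0, 0.1]))
    e.create_dataset("value", data=np.array([[10.0, 10.0, 10.0],
                                             [11.0, 11.0, 11.0]]))

    static_edges = read_box_edges(static).tolist()
    moving_edges = read_box_edges(moving, index=1).tolist()
```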

Cheers,
Felix
