h5md-user

Re: [h5md-user] box data as part of trajectory/position


From: Pierre de Buyl
Subject: Re: [h5md-user] box data as part of trajectory/position
Date: Mon, 17 Sep 2012 15:18:24 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Felix,

After reading your reply, I tend to prefer your implementation for the most
part; see details below :-)

On Wed, Sep 12, 2012 at 10:53:41AM +0200, Felix Höfling wrote:
> Hi Pierre,
> 
> On 12.09.2012 at 09:52, Pierre de Buyl
> <address@hidden> wrote:
> 
> >Hi Felix,
> >
> >On Mon, Sep 10, 2012 at 09:14:55AM +0200, Felix Höfling wrote:
> >
> >>I thought about the box again since I do not feel really comfortable
> >>with the current specification. I find it a bit awkward that the
> >>observables group must be present if a file contains trajectory data
> >>only. Further, the box information is only needed in conjunction with
> >>position data. If only velocities are stored (for some reason), the box
> >>is not needed. And perhaps the strongest point comes last: for
> >>time-dependent boxes, there should be a simple way to retrieve the
> >>corresponding box size for a given entry in the position time series.
> >>(Currently, the box may be stored at different intervals than the
> >>positions.)
> >>
> >>My suggestion is to link the box much more tightly to the position data.
> >>The box group in observables may still be present and can be realised by
> >>appropriate hard links. The following suggestion ensures that the box
> >>data are available within each position group, consistently using the
> >>same time grid as the position data:
> >>
> >>trajectory
> >>   \-- group1
> >...
> >>
> >>One open point: how can we efficiently store the information for a
> >>fixed box size (which is a pretty widespread case)? If the edges and
> >>offset datasets always contain the same entries, they may pack well, but
> >>they have to be unpacked for accessing any data point. An alternative
> >>would be to indicate the non-changing box size transparently, e.g., by an
> >>additional attribute and different dataset extents (with fixed size).
> >>
> >>trajectory
> >>   \-- group1
> >>   |  \-- position
> >>   |    |    \-- value
> >>   |    |    \-- step
> >>   |    |    \-- time
> >>   |    \-- box
> >>   |         +-- type
> >>   |         \-- edges [D][D]
> >>   |         \-- offset [D]
> >>
> >>(Note that the extents of edges depend on the box type, either [D]
> >>or [D][D].)
> >
> >I prefer to turn your suggestion around, if you don't mind: keep the data
> >in observables, with the option to link from the trajectory groups if
> >needed.
> >
> >The thing that I think you would like to avoid is having to carry
> >"observables" even though all you want is a trajectory (with box
> >information, indeed). On the other hand, if one wants to find the box
> >information, it is in "/trajectory/groupname/..." where "groupname"
> >depends on the file... Even if the data is linked, this seems more
> >cumbersome to me. The specification of several boxes seems to me to be
> >more of an exceptional event.
> >
> 
> My suggestion is less cumbersome than you describe. The box is
> mostly relevant for the interpretation of position data, and then
> all information is contained in "/trajectory/group/position" without
> resorting to a different root group. The position data is
> exceptional in this respect due to the typically used periodic
> boundaries.
> 
> If the box information itself is needed, I agree: it should not be
> deduced from some trajectory group. Therefore I suggested keeping it
> in observables as it is. But not every piece of information needs to
> be stored. If the box is not in observables, an H5MD reader refuses
> to retrieve it from the file (although it could, by looking up some
> strange trajectory group).

A conclusion would be that "box" is mandatory within a trajectory group, with
"step" and "time" datasets identical to those of the position data (they may
or may not be linked; this is an implementation detail). Then, as you write,
whether or not it is also found in "observables" depends on the simulation.
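
A minimal sketch of the consistency check this conclusion implies. It assumes a hypothetical in-memory model of the file (HDF5 groups as Python dicts, datasets as lists) rather than a real HDF5 reader, and places the box at /trajectory/<group>/position/box as in Felix's layout. A bare list is treated as fixed in time; a step/time/value group must share step and time with the positions. The name check_position_box and the sample values are illustrative only.

```python
# Hypothetical in-memory model of the layout discussed in this thread:
# an HDF5 group is a dict, a dataset is a (nested) list.

def check_position_box(position):
    """Check that a position group carries box data on the position time grid.

    `position` models /trajectory/<group>/position with "step", "time",
    "value" and a "box" subgroup holding "edges" and "offset". A bare list
    is taken as a box quantity fixed in time; a dict is taken as a
    step/time/value series that must share step and time with the positions.
    """
    if "box" not in position:
        raise ValueError("position group has no 'box' subgroup")
    box = position["box"]
    for name in ("edges", "offset"):
        if name not in box:
            raise ValueError(f"box has no '{name}' entry")
        data = box[name]
        if not isinstance(data, dict):
            continue  # fixed in time: nothing to align
        if data["step"] != position["step"] or data["time"] != position["time"]:
            raise ValueError(f"box/{name} is not on the position time grid")
        if len(data["value"]) != len(position["value"]):
            raise ValueError(f"box/{name}/value has the wrong number of entries")
    return True


steps, times = [0, 100, 200], [0.0, 0.5, 1.0]
position = {
    "step": steps,  # the box series below reuses (links) the same lists
    "time": times,
    "value": [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]],  # [var][N][D], N=1, D=2
    "box": {
        "edges": {"step": steps, "time": times,
                  "value": [[10.0, 10.0], [10.5, 10.5], [11.0, 11.0]]},
        "offset": [0.0, 0.0],  # bare dataset: fixed in time
    },
}
print(check_position_box(position))  # prints True
```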

> >Please consider the following example as a reason to keep that data in
> >observables. In the case of a varying-volume simulation, one may want to
> >keep only the thermodynamic observables: energy, temperature, ..., box
> >size. That is, all "order 1 in storage" information as opposed to
> >"order N" information (particle information).
> >
> >Finally, your scheme is compatible with the current draft, as "additional
> >data" is not illegal for H5MD, while the reverse would not be true
> >(missing data in observables).
> >
> 
> I would like to make /observables/box and thus /observables
> non-mandatory. At the same time, my suggestion makes the box
> information mandatory if position data are present (but stored in
> trajectory/group/position).

See my suggestion above; does it fit?

> So far, the only mandatory root group should be /h5md. I thought
> about providing the space dimension explicitly as an attribute in
> /parameters (or /h5md). It is cumbersome to deduce it from the dataset
> extents of, e.g., box/offset.

I am not very enthusiastic about storing the dimensions explicitly; it seems
redundant. May we leave that discussion until we have feedback from other users?
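
For comparison, a sketch of what deducing D from the dataset extents involves, assuming a hypothetical model in which a dataset is a nested list and a step/time/value group is a dict. It reads offset rather than edges, since the extents of edges depend on the box type ([D] or [D][D]) while each offset entry is [D] in both layouts shown in this thread. With h5py, the equivalent would be the last entry of the dataset's shape. The name space_dimension is illustrative.

```python
# Hypothetical model: a dataset is a (nested) list, a step/time/value group a dict.

def space_dimension(offset):
    """Infer the space dimension D from the extents of box/offset.

    Static box:          offset has shape [D]
    Time-dependent box:  offset/value has shape [var][D]
    """
    if isinstance(offset, dict):
        value = offset["value"]
        if len(value) == 0:
            raise ValueError("empty offset series: dimension cannot be inferred")
        return len(value[0])
    return len(offset)


print(space_dimension([0.0, 0.0, 0.0]))  # prints 3
print(space_dimension({"step": [0, 10], "time": [0.0, 1.0],
                       "value": [[0.0, 0.0], [0.1, 0.1]]}))  # prints 2
```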

> For your application, all you need to do is provide links from
> observables/box to the position data. On the writer's side, this is
> not much overhead, while the reader has to access only a single
> subgroup (.../position) and the file format itself becomes more
> flexible.
> 
> >As far as the time correspondence is concerned, in my mind this could be
> >done as follows: the box information is stored only when it changes, so
> >that what one would look for is the maximum time in
> >"/observables/box/edges/time" that is lower than or equal to the
> >requested time. That, or require that each time step in the trajectory
> >match one in the box information.
> >
> I have concerns that a "lower than or equal to a time/step" lookup can
> be implemented efficiently. For example, how would you do so using
> h5py? numpy.where is an option, but it is inefficient: it requires the
> whole time series of the box to be read in, and the comparison is done
> for each access to a position item.
> 
> My suggestion works by indexing, which is simple and highly efficient.

I work under the assumption that the list is ordered and then do a bisection.
Anyway, imposing equal step and time makes this discussion irrelevant.
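
The two lookup strategies discussed above can be sketched with Python's standard bisect module. This assumes the box times are stored in ascending order (Pierre's assumption) and uses plain lists in place of HDF5 datasets. box_at_time handles change-only storage with O(log n) comparisons per lookup, so in principle only a handful of time entries need to be read; box_for_frame is the plain indexing that identical step and time grids allow. Both function names and the sample data are hypothetical.

```python
import bisect


def box_at_time(box_time, box_value, t):
    """Box entry valid at time t when the box is stored only when it changes.

    Assumes box_time is sorted in ascending order. bisect_right returns the
    insertion point after any entries equal to t, so index - 1 is the largest
    stored time <= t. Cost: O(log n) comparisons per lookup.
    """
    i = bisect.bisect_right(box_time, t) - 1
    if i < 0:
        raise ValueError(f"no box stored at or before t={t}")
    return box_value[i]


def box_for_frame(box_value, frame):
    """With step/time identical to the position data, frame i maps to entry i."""
    return box_value[frame]


# Change-only storage: the box changed at t = 5.0 and t = 12.5.
changes_time = [0.0, 5.0, 12.5]
changes_value = [[10.0, 10.0, 10.0], [11.0, 11.0, 11.0], [12.0, 12.0, 12.0]]
print(box_at_time(changes_time, changes_value, 7.0))  # prints [11.0, 11.0, 11.0]

# Same grid as the positions (frames at t = 0, 5, 10, 15): plain indexing.
per_frame = [[10.0, 10.0, 10.0], [11.0, 11.0, 11.0],
             [11.0, 11.0, 11.0], [12.0, 12.0, 12.0]]
print(box_for_frame(per_frame, 2))  # prints [11.0, 11.0, 11.0]
```

Storing the box on the position grid trades some redundancy for a constant-time, search-free lookup, which is the point of imposing equal step and time.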

> >Now, for the fixed-in-time issue. From the current draft:
> >"""
> >For all box kinds, if the data for edges and offset is stored as a single
> >dataset, it is considered fixed in time. Else, it should comply with the
> >step, time and value organization.
> >"""
> >I think that this is good. It is simple to parse and does not involve
> >extra attributes.
> >
> I overlooked this passage. Am I correct in reading it as either, for
> the static case,
> 
> observables
>  \-- box
>       +-- type
>       \-- edges [D]
>       \-- offset [D]
> 
> or, for the fluctuating box:
> 
> observables
>  \-- box
>       +-- type
>       \-- edges
>            \-- step [var]
>            \-- time [var]
>            \-- value [var][D]
>       \-- offset
>            \-- step [var]
>            \-- time [var]
>            \-- value [var][D]
> 
> Shall we make the static case explicit in the draft as well?

Yes, this is a good idea.
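
A sketch of how a reader might tell the two cases apart, using Python dicts as stand-ins for HDF5 groups and lists for datasets; with h5py the equivalent test would distinguish h5py.Group from h5py.Dataset objects. The helper box_entry and the sample values are illustrative, not part of the draft.

```python
# Hypothetical model: a group is a dict, a dataset a (nested) list.

def box_entry(obj, frame):
    """Return edges or offset for a frame, for either box layout.

    Static case:      obj is a bare [D] dataset, fixed in time, so every
                      frame gets the same entry.
    Fluctuating case: obj is a group with step [var], time [var], value [var][D].
    """
    if isinstance(obj, dict):
        return obj["value"][frame]
    return obj


static_box = {"edges": [10.0, 10.0, 10.0], "offset": [0.0, 0.0, 0.0]}
fluctuating_box = {
    "edges": {"step": [0, 10], "time": [0.0, 0.1],
              "value": [[10.0, 10.0, 10.0], [10.2, 10.2, 10.2]]},
    "offset": {"step": [0, 10], "time": [0.0, 0.1],
               "value": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]},
}
print(box_entry(static_box["edges"], 1))       # prints [10.0, 10.0, 10.0]
print(box_entry(fluctuating_box["edges"], 1))  # prints [10.2, 10.2, 10.2]
```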

If you agree with my suggestion, I'll update the draft accordingly.

Best,

Pierre



