Re: [h5md-user] Minor revisions before H5MD v1.0


From: Peter Colberg
Subject: Re: [h5md-user] Minor revisions before H5MD v1.0
Date: Fri, 3 May 2013 10:27:31 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Felix,

On Fri, May 03, 2013 at 04:03:26PM +0200, Felix Höfling wrote:
> On 02.05.2013 at 22:57, Peter Colberg
> <address@hidden> wrote:
> 
> >For the box group, in the fixed-size case, I would recommend storing
> >"edges" and "offset" as attributes. Besides being the “right” way to
> >store small data, this sets a good example for users with regard to
> >their custom metadata.
> >
> This would be consistent indeed.
> 
> I have somewhat mixed feelings about "hiding" the actual data of a
> group in attributes; the issue is most prominent for the /h5md
> group.

I think whether attributes are in plain view or hidden depends on the
implementation. In fact, I prefer storing small data as attributes,
since they are more visible when running "h5dump -A" to display
only structure and metadata.

> On the other hand, attributes are the most efficient way for
> small pieces of information as you noted earlier.
> 
> This ambiguity is already inherent in the HDF5 Manual, see the first
> sentences of Chapter 8.1:
> http://www.hdfgroup.org/HDF5/doc/UG/UG_frame13Attributes.html
> Further down in this chapter, the maximum reasonable size of an
> attribute is given as 64k. So in conclusion, storing the fixed box
> data as attributes would be fine with me.

Should we mention the 64k limit in a short section on HDF5 implementation?

That could serve as a guideline for future decisions on small (meta)data.
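
As a rough illustration of such a guideline, the following Python
sketch estimates the raw size of an array and compares it with the
64k figure. The names ATTRIBUTE_SIZE_LIMIT, payload_bytes, and
fits_in_attribute are hypothetical, and reading "64k" as 64 KiB with a
strict comparison is an assumption, not something prescribed by the
HDF5 manual or by H5MD.

```python
from math import prod

# Assumption: "64k" is read as 64 KiB of raw data.
ATTRIBUTE_SIZE_LIMIT = 64 * 1024


def payload_bytes(shape, itemsize=8):
    """Raw size in bytes of an array with the given shape.

    itemsize defaults to 8, i.e. one double-precision float per element.
    An empty shape () denotes a scalar and counts as one element.
    """
    return prod(shape) * itemsize


def fits_in_attribute(shape, itemsize=8, limit=ATTRIBUTE_SIZE_LIMIT):
    """Return True if the data is small enough to be stored as an attribute."""
    return payload_bytes(shape, itemsize) < limit


# Fixed box "edges" in three dimensions: 3 doubles = 24 bytes.
edges_ok = fits_in_attribute((3,))

# Positions of 10000 particles in three dimensions: 240000 bytes.
positions_ok = fits_in_attribute((10000, 3))
```

With these assumptions, the fixed box data falls far below the limit,
while a single frame of particle positions for a moderately sized
system does not, so the latter belongs in a dataset.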

> >The "parameters" group is intended as a program-dependent group.
> >I suggest removing the "parameters/dimension" attribute, as it
> >contradicts the purpose of this group. The dimension can
> >be derived from, e.g., the "edges" attributes or "edges/value"
> >dataset, similar to the number of particles being derived from
> >"position/value" dataset(s).
> >
> I think we should not turn the clock back. There were good reasons
> to include the dimension parameter explicitly, mainly that it cannot
> be inferred from scalar datasets in /observables. Recall that the box
> group is not mandatory.

(It was a recent proposal, so it can be reverted without causing trouble.)

I am definitely opposed to defining anything within the “parameters”
group. As noted, that group is intended to be program-specific.

As for the spatial dimension attribute: The idea of self-describing
datasets is one of the main benefits of using a structured format for
storing molecular data. I would rather not violate that principle.

If you need to split up your files, you could store the box group as
well, or add a custom attribute "dimension" to your observables. The
tool h5copy is quite useful for such splitting operations.
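
To illustrate how these quantities can be derived rather than stored,
here is a minimal Python sketch that works on shape tuples, as an HDF5
binding would report them. The function names are hypothetical, and
the layouts (spatial dimension on the last axis of "edges";
"position/value" shaped as steps x particles x dimension) are
assumptions made for illustration only.

```python
def dimension_from_edges(edges_shape):
    """Spatial dimension taken from the shape of "edges" or "edges/value".

    Assumption: the last axis of the box edges has length D.
    """
    if len(edges_shape) == 0:
        raise ValueError("edges must have at least one axis")
    return edges_shape[-1]


def particle_count(position_value_shape):
    """Number of particles taken from the shape of "position/value".

    Assumption: the dataset is laid out as (steps, particles, dimension).
    """
    if len(position_value_shape) != 3:
        raise ValueError("expected a shape of the form (steps, N, D)")
    return position_value_shape[1]


# Fixed-size box stored as an attribute with three edge lengths.
dim_fixed = dimension_from_edges((3,))

# Time-dependent box stored as a dataset with 100 steps in two dimensions.
dim_series = dimension_from_edges((100, 2))

# 100 stored steps of 1000 particles in three dimensions.
n_particles = particle_count((100, 1000, 3))
```

Scalar observables carry no such axis, which is why a file split off
without the box group would need the custom "dimension" attribute
mentioned above.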

Peter


