[h5md-user] Writing vs reading


From: Konrad Hinsen
Subject: [h5md-user] Writing vs reading
Date: Mon, 13 Jan 2014 15:48:53 +0100

Olaf Lenz writes:

 > Ultimately, the problem that seems to recur in many of the discussions is the
 > question of who will have to make the most effort: the writer of h5md, or the
 > reader?

That's indeed an important aspect, together with the related one of
who is responsible for verifying that the rules are respected: the
writer, the reader, an intermediate instance (a validation program),
or nobody (i.e. anarchy).

Many informally defined data formats (and that means practically
everything used in science) are based on anarchy: the standard is a
statement of intention, to which every program conforms as much as its
authors consider useful, leading to many "dialects". While better than
no standard, a standard with an anarchy attitude usually causes a lot
of frustration among users.

A formally defined standard (e.g. XML formats with a DTD or schema)
provides a way to validate the correctness of a data file, even though
this validation rarely covers all possible non-conformities. The
existence of a "neutral arbiter" (the validation tool) encourages
writers to respect the standard and readers not to accept invalid
files. In the long run this works a lot better.

There is a huge gray zone in between these two extremes, and that's
where I think H5MD belongs. Certain "hard core" features should be
respected strictly, whereas less central features should be soft
constraints open to extension and reinterpretation.

 > 1. Reader-friendly
 > In a "reader-friendly" approach, we would specify exactly how positions in
 > periodic boundary conditions have to be stored in h5md, e.g. "image" always
 > has to exist, and "position" always has to be within the primary box. This
 > makes reading the positions from an h5md file simpler.
 > However, it comes at the cost that the writer of the file will have to
 > prepare the data exactly as h5md specifies it.
 >
 > 2. Writer-friendly
 > In a "writer-friendly" approach, we would allow any possible way of storing
 > the positions as long as it is unique (with image, without image, inside the
 > primary box, outside the primary box, whatever).
 > This comes at the cost that reading the file is more complex.

The most important feature for me is that any given combination of
data arrays has a clear and unambiguous meaning. That criterion still
leaves a lot of freedom, where, as you say, the question is whose life
we want to simplify most.
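To make "clear and unambiguous meaning" concrete for the writer-friendly
case, here is a minimal Python sketch. It assumes an orthorhombic box
with edge lengths `edges`, and it models a group as a plain dict with
optional "position" and "image" entries (a hypothetical stand-in for an
HDF5 group, not the H5MD API). The convention that a position without
an image is already unwrapped is also an assumption made for
illustration. Each combination of arrays maps to exactly one
interpretation, even though the reader has to find out which case it
is in.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for an HDF5 group: maps dataset names to per-particle values.
Group = Dict[str, Sequence[float]]


def unwrapped_position(group: Group, edges: Sequence[float]) -> List[float]:
    """Return one particle's unwrapped position, whichever layout the writer chose.

    Assumes an orthorhombic box with edge lengths `edges`.
    """
    position = group.get("position")
    if position is None:
        raise ValueError("no 'position' data in group")
    image = group.get("image")
    if image is None:
        # Assumed convention: without "image", "position" is already unwrapped.
        return list(position)
    # With "image", "position" may lie inside or outside the primary box;
    # in both cases position + image * edge is the unwrapped coordinate.
    return [x + n * length for x, n, length in zip(position, image, edges)]


edges = [10.0, 10.0, 10.0]
# Three different layouts of the same particle; each prints [12.5, -3.0, 4.0].
print(unwrapped_position({"position": [12.5, -3.0, 4.0]}, edges))
print(unwrapped_position({"position": [2.5, 7.0, 4.0], "image": [1, -1, 0]}, edges))
print(unwrapped_position({"position": [22.5, 7.0, 4.0], "image": [-1, -1, 0]}, edges))
```

All three layouts describe the same particle, which is what
unambiguity buys; the price is the branching the reader has to do.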

My personal preference would be for the simplest rules for data
interpretation. That's close to your "reader-friendly" but not
exactly the same. What matters to me is neither simplicity of reader
implementation nor minimization of operations in the reader, but the
simplicity of the rules that the reader has to apply. The goal is to
keep readers and writers easy for humans to understand. I do realize,
of course, that this criterion does not necessarily lead to a unique
best solution.
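For the reader-friendly variant, the interpretation rule reduces to a
single line: unwrapped = position + image * edge. The following sketch
assumes an orthorhombic box with its origin at 0 and uses plain Python
lists rather than any real HDF5 library; the function names are
hypothetical. It shows that simple reader rule next to the extra work
it moves to the writer, which must fold each coordinate into the
primary box and record the image count.

```python
import math
from typing import List, Sequence, Tuple


def fold_into_box(position: Sequence[float], edges: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Writer side: split an unwrapped position into a folded position and an image vector.

    Assumes an orthorhombic box spanning [0, L) along each axis; the folded
    coordinates land in that range up to floating-point rounding.
    """
    folded: List[float] = []
    image: List[int] = []
    for x, length in zip(position, edges):
        n = math.floor(x / length)  # how many whole box lengths x lies beyond the origin
        folded.append(x - n * length)
        image.append(n)
    return folded, image


def unfold(position: Sequence[float], image: Sequence[int], edges: Sequence[float]) -> List[float]:
    """Reader side: the single interpretation rule, position + image * edge."""
    return [x + n * length for x, n, length in zip(position, image, edges)]


edges = [10.0, 10.0, 10.0]
folded, image = fold_into_box([12.5, -3.0, 4.0], edges)
print(folded, image)                 # [2.5, 7.0, 4.0] [1, -1, 0]
print(unfold(folded, image, edges))  # [12.5, -3.0, 4.0]
```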

 >   * When we go the reader-friendly way, we will not be able to prevent people
 >     from writing files that do not conform to the specs anyway, so a good
 >     reader will either have to throw an error in that case, or it will have
 >     to handle it.

For the reasons stated above, readers should be encouraged to throw
errors when presented with invalid files.

 >   * When we choose the writer-friendly way, we cannot guarantee that all
 >     readers can actually handle all possible cases. Library functions that
 >     support a reader may help, but it will not be possible to cover all
 >     possible cases in such a function.

That's where a validation tool comes in handy: a reader that fails to read
a file that passes validation is considered buggy.
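A rough Python sketch of the validator-plus-strict-reader idea follows.
It assumes an orthorhombic box and models a group as a plain dict (a
hypothetical stand-in for an HDF5 group), and the rules it checks, that
"image" must exist and "position" must lie in the primary box, are the
reader-friendly rules quoted above, not an official H5MD validator.
Because the strict reader rejects exactly what the validator rejects, a
file that passes validation but fails to read would point to a bug in
the reader.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for an HDF5 group: maps dataset names to per-particle values.
Group = Dict[str, Sequence[float]]


def validate(group: Group, edges: Sequence[float]) -> List[str]:
    """List violations of the hypothetical reader-friendly rules for one particle."""
    problems: List[str] = []
    for name in ("position", "image"):
        if name not in group:
            problems.append(f"missing '{name}'")
    for axis, (x, length) in enumerate(zip(group.get("position", []), edges)):
        if not 0.0 <= x < length:
            problems.append(f"position[{axis}] = {x} is outside the primary box [0, {length})")
    return problems


def strict_read(group: Group, edges: Sequence[float]) -> List[float]:
    """Refuse invalid input instead of guessing, then apply the single unfolding rule."""
    problems = validate(group, edges)
    if problems:
        raise ValueError("invalid file: " + "; ".join(problems))
    return [x + n * length for x, n, length in zip(group["position"], group["image"], edges)]


edges = [10.0, 10.0, 10.0]
valid = {"position": [2.5, 7.0, 4.0], "image": [1, -1, 0]}
print(validate(valid, edges))     # []
print(strict_read(valid, edges))  # [12.5, -3.0, 4.0]
print(validate({"position": [12.5, 7.0, 4.0]}, edges))
```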

 >   * I would expect that more people will program tools that read
 >     h5md files than people that program tools to write h5md
 >     files. Furthermore, people who create such files are probably
 >     more used to sticking to specs than people who read them.
 >     In that respect, it might make sense to make h5md reader-friendly
 >     rather than writer-friendly.

That's indeed a good pragmatic principle.

Konrad
-- 
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: research AT khinsen DOT fastmail DOT net
http://dirac.cnrs-orleans.fr/~hinsen/
ORCID: http://orcid.org/0000-0003-0330-9428
Twitter: @khinsen
---------------------------------------------------------------------