bug-datamash

Re: [PATCH] vnlog support


From: Dima Kogan
Subject: Re: [PATCH] vnlog support
Date: Sun, 15 May 2022 14:08:17 -0700
User-agent: mu4e 1.6.10; emacs 29.0.50

Thanks for the notes, Erik. I'm extending the patch and writing tests.
Some quick replies inline.



Erik Auerswald <auerswal@unix-ag.uni-kl.de> writes:


> 1. While GNU datamash, when given the option -C, --skip-comments,
>    recognizes lines where the first non-whitespace character is
>    either '#' or ';' as comments, the vnlog format does not treat
>    ';' as starting a comment.

Right. I fixed that, and wrote some tests. A few more related bugs to
fix.
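
For illustration, the difference between the two comment rules can be
sketched as follows. This is a minimal Python sketch, not the C code in
the patch; the function and constant names are hypothetical, and the
rule for datamash is taken only from the description of -C quoted above.

```python
# Hypothetical names; this is not datamash's implementation.
DATAMASH_COMMENT_CHARS = "#;"  # -C as described above: '#' or ';'
VNLOG_COMMENT_CHARS = "#"      # vnlog: only '#' starts a comment


def is_comment(line, comment_chars):
    """Return True if the first non-whitespace character is a comment marker."""
    stripped = line.lstrip()
    return bool(stripped) and stripped[0] in comment_chars


line = "  ; 1 2 3"
print(is_comment(line, DATAMASH_COMMENT_CHARS))  # True
print(is_comment(line, VNLOG_COMMENT_CHARS))     # False
```

Under the vnlog rule the line above is ordinary data, so a vnlog mode
that reused the -C logic unchanged would silently drop it.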


> 2. The patches do not add any special treatment of '-' to GNU
>    datamash, but '-' does have a special meaning in vnlog.

I think it's fine for sum() to complain when it encounters a -. Maybe
count() shouldn't count - entries, or something. I can't think of any
other operations that definitely need to be handled specially, but I
haven't thought about it very hard yet.
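
One possible reading of that behavior, as a Python sketch rather than
datamash code. The names vnlog_sum and vnlog_count are hypothetical, and
whether count should really skip '-' is exactly the open question above.

```python
NULL = "-"  # vnlog marker for an empty field


def vnlog_sum(values):
    """Sum numeric fields; complain if a '-' (empty) field is encountered."""
    total = 0.0
    for v in values:
        if v == NULL:
            raise ValueError("invalid numeric value: '-'")
        total += float(v)
    return total


def vnlog_count(values):
    """Count fields, skipping '-' entries (one of the options floated above)."""
    return sum(1 for v in values if v != NULL)


column = ["1", "-", "3"]
print(vnlog_count(column))  # 2
try:
    vnlog_sum(column)
except ValueError as err:
    print(err)              # invalid numeric value: '-'
```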


> 3. The patches seem to create a vnlog mode where both input and
>    output are in vnlog format.  Could it be useful to be able to
>    specify vnlog format separately for input and output?

You tell me. *I* am only ever going to use this in full vnlog mode.
But if we want separate --vnlog-in and --vnlog-out options in addition
to -V, we can do that.


> 4. If one would consider creating vnlog output from character
>    separated input data via GNU datamash, empty fields would
>    need to be replaced with '-'.  While GNU datamash has some
>    support for missing values via the --no-strict and --filler=X
>    options, this does not seem to replace empty fields with the
>    specified filler, and missing fields seem to be replaced only
>    sometimes, e.g., with the "transpose" operation, but not the
>    "reverse" operation.  Would it be useful to add optionally
>    generating '-' fields?

Similarly to the last point, this would be needed if we're supporting
reading non-vnlog, and writing vnlog. Do we want that? I think IF we
want that, then we should add some sort of pass-through operation that
does nothing but reformat the data.
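
Such a pass-through reformatter might look roughly like the Python
sketch below. It is an illustration only, not a proposed datamash
operation: the function name to_vnlog is hypothetical, and it assumes
that the first input line is a header, that fields contain no
whitespace, and that short rows should be padded to the header width.

```python
def to_vnlog(lines, delimiter=",", filler="-"):
    """Reformat character-separated lines as vnlog without computing anything.

    Assumptions (not from any datamash patch): the first line is a header,
    fields contain no whitespace, and short rows are padded with the filler.
    """
    rows = [line.rstrip("\n").split(delimiter) for line in lines]
    header, data = rows[0], rows[1:]
    out = ["# " + " ".join(header)]  # vnlog legend line
    for row in data:
        # Pad missing trailing fields, then replace empty fields with the filler.
        row = row + [""] * (len(header) - len(row))
        out.append(" ".join(field if field != "" else filler for field in row))
    return out


for line in to_vnlog(["a,b,c", "1,,3", "4,5"]):
    print(line)
# # a b c
# 1 - 3
# 4 5 -
```

Both the empty field in the second row and the missing field in the
third row come out as '-', which is the case point 4 says --filler does
not currently cover consistently.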


> 5. Would it make sense to add the functionality required for
>    vnlog format support via separate options?  There could be a
>    --vnlog option that sets all those correctly and then adds
>    the vnlog specific prologue handling.
>
>    Perhaps the functionality could be added using variables that
>    could be controlled via options, without adding all those
>    controlling options immediately.
>
>    - There is already a -W, --whitespace option.
>    - There is already an --output-delimiter option.
>    - There is already a -C, --skip-comments option.
>
>    - There could be a new option to specify the comment
>      character.
>    - There could be a new option to treat some value, e.g., the
>      filler value, as representing an empty field.
>    - There could be a new option to replace empty and missing
>      fields in the output with the filler value.
>    - There could be a new option to add a prefix to the output
>      header line.
>    - There could be a new option to read the input header line
>      from a vnlog prologue.

This is a question for longer term datamash development. Do we want to
support hybrid formats? Unless we're sure that the answer is "yes", I
think we should implement vnlog as is right now, and add the extra
flexibility later, when/if we decide we want it.


