Re: [PATCH] vnlog support
From: Dima Kogan
Subject: Re: [PATCH] vnlog support
Date: Sun, 15 May 2022 14:08:17 -0700
User-agent: mu4e 1.6.10; emacs 29.0.50
Thanks for the notes, Erik. I'm extending the patch and writing tests. Some
quick replies inline.
Erik Auerswald <auerswal@unix-ag.uni-kl.de> writes:
> 1. While GNU datamash, when given the option -C, --skip-comments,
> recognizes lines where the first non-whitespace character is
> either '#' or ';' as comments, the vnlog format does not treat
> ';' as starting a comment.
Right. I fixed that, and wrote some tests. A few more related bugs to
fix.
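A minimal sketch of the comment rule difference, assuming (as in the quoted note) that datamash -C skips lines whose first non-whitespace character is '#' or ';', while vnlog treats only '#' as a comment marker. The function name `is_comment` and the `vnlog` flag are hypothetical, not datamash internals.

```python
def is_comment(line: str, vnlog: bool) -> bool:
    """Return True if `line` should be skipped as a comment.

    datamash -C/--skip-comments: first non-whitespace char is '#' or ';'.
    vnlog mode (assumed here): only '#' starts a comment.
    """
    stripped = line.lstrip()
    if not stripped:
        return False  # blank lines are not treated as comments here
    comment_chars = "#" if vnlog else "#;"
    return stripped[0] in comment_chars


lines = ["# a b", "  ; not a vnlog comment", "1 2", "   # indented"]
print([is_comment(l, vnlog=False) for l in lines])  # [True, True, False, True]
print([is_comment(l, vnlog=True) for l in lines])   # [True, False, False, True]
```

Making the comment set depend on the mode, rather than hard-coding "#;", is what lets a ';'-prefixed line survive as data in vnlog mode.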
> 2. The patches do not add any special treatment of '-' to GNU
> datamash, but '-' does have a special meaning in vnlog.
I think it's fine for sum() to complain when it encounters a '-'. Maybe
count() shouldn't count '-' entries, or something. I can't think of any
other operations that definitely need to be handled specially, but I
haven't thought about it very hard yet.
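One possible policy for '-' can be sketched as follows, assuming '-' is vnlog's marker for an empty field: sum() rejects it, and count() skips it by default. The helper names and the `skip_empty` parameter are hypothetical illustrations, not the patch's behavior.

```python
VNLOG_EMPTY = "-"  # vnlog's marker for an empty/missing field


def vnlog_sum(values):
    """Sum a column, refusing the vnlog empty marker."""
    total = 0.0
    for v in values:
        if v == VNLOG_EMPTY:
            raise ValueError("sum: encountered empty vnlog field '-'")
        total += float(v)
    return total


def vnlog_count(values, skip_empty=True):
    """Count entries; with skip_empty, '-' fields are not counted."""
    return sum(1 for v in values if not (skip_empty and v == VNLOG_EMPTY))


col = ["1", "-", "2.5"]
print(vnlog_count(col))                    # 2
print(vnlog_count(col, skip_empty=False))  # 3
print(vnlog_sum(["1", "2.5"]))             # 3.5
```

Raising on '-' in sum() keeps a silent "treat as zero" from skewing results; other operations could pick either policy per operation.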
> 3. The patches seem to create a vnlog mode where both input and
> output are in vnlog format. Could it be useful to be able to
> specify vnlog format separately for input and output?
You tell me. *I* am only ever going to use this in full vnlog mode.
But if we want separate --vnlog-in and --vnlog-out options in addition
to -V, we can do that.
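If separate switches were added, they could resolve to two independent booleans, with -V simply turning on both. This is a sketch using Python's argparse; the flags -V, --vnlog-in and --vnlog-out are the ones floated in this thread and are hypothetical, not existing datamash options.

```python
import argparse


def parse_vnlog_flags(argv):
    """Return (vnlog_input, vnlog_output) from hypothetical flags.

    -V enables vnlog on both sides; --vnlog-in / --vnlog-out enable one side.
    """
    p = argparse.ArgumentParser(prog="datamash-sketch")
    p.add_argument("-V", dest="vnlog", action="store_true",
                   help="vnlog input and output")
    p.add_argument("--vnlog-in", action="store_true", help="vnlog input only")
    p.add_argument("--vnlog-out", action="store_true", help="vnlog output only")
    args, _rest = p.parse_known_args(argv)
    return (args.vnlog or args.vnlog_in, args.vnlog or args.vnlog_out)


print(parse_vnlog_flags(["-V"]))          # (True, True)
print(parse_vnlog_flags(["--vnlog-in"]))  # (True, False)
print(parse_vnlog_flags([]))              # (False, False)
```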
> 4. If one would consider creating vnlog output from character
> separated input data via GNU datamash, empty fields would
> need to be replaced with '-'. While GNU datamash has some
> support for missing values via the --no-strict and --filler=X
> options, this does not seem to replace empty fields with the
> specified filler, and missing fields seem to be replaced only
> sometimes, e.g., with the "transpose" operation, but not the
> "reverse" operation. Would it be useful to add optionally
> generating '-' fields?
As with the last point, this would only be needed if we support reading
non-vnlog input and writing vnlog output. Do we want that? I think IF we
want that, then we should add some sort of pass-through operation that
does nothing but reformat the data.
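A pass-through reformat of that kind might look like the sketch below, assuming delimited input with a header row, no quoting (like datamash's default TAB-separated input), and vnlog output with a "# " legend line and '-' for empty or missing fields. The function `to_vnlog` is hypothetical; it does not handle fields that contain whitespace, which vnlog cannot represent directly.

```python
import csv
import io


def to_vnlog(text: str, delimiter: str = "\t") -> str:
    """Re-emit delimited data (first row = header) as vnlog."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return ""
    header, data = rows[0], rows[1:]
    ncols = len(header)
    out = ["# " + " ".join(header)]  # vnlog legend line
    for row in data:
        # pad short rows (missing fields), then mark empty fields with '-'
        row = row + [""] * (ncols - len(row))
        out.append(" ".join(f if f else "-" for f in row))
    return "\n".join(out) + "\n"
```

For example, `to_vnlog("a\tb\tc\n1\t\t3\n4\n")` returns `"# a b c\n1 - 3\n4 - -\n"`: the empty middle field and the two missing trailing fields all become '-'.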
> 5. Would it make sense to add the functionality required for
> vnlog format support via separate options? There could be a
> --vnlog option that sets all those correctly and then adds
> the vnlog specific prologue handling.
>
> Perhaps the functionality could be added using variables that
> could be controlled via options, without adding all those
> controlling options immediately.
>
> - There is already a -W, --whitespace option.
> - There is already an --output-delimiter option.
> - There is already a -C, --skip-comments option.
>
> - There could be a new option to specify the comment
> character.
> - There could be a new option to treat some value, e.g., the
> filler value, as representing an empty field.
> - There could be a new option to replace empty and missing
> fields in the output with the filler value.
> - There could be a new option to add a prefix to the output
> header line.
> - There could be a new option to read the input header line
> from a vnlog prologue.
This is a question for longer term datamash development. Do we want to
support hybrid formats? Unless we're sure that the answer is "yes", I
think we should implement vnlog as is right now, and add the extra
flexibility later, when/if we decide we want it.
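To make the trade-off concrete, the knobs in Erik's list can be pictured as one settings bundle with a vnlog preset, so a later --vnlog option would only select the preset. In this sketch, only -W/--whitespace, --output-delimiter and -C/--skip-comments correspond to existing options; every other field, `VNLOG`, and `read_legend` are hypothetical. The legend rule (first '#' line that is not '##' or '#!') is assumed from the vnlog format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FormatOptions:
    whitespace: bool = False            # like -W/--whitespace
    output_delimiter: str = "\t"        # like --output-delimiter
    skip_comments: bool = False         # like -C/--skip-comments
    comment_chars: str = "#;"           # proposed: choose comment character(s)
    empty_token: Optional[str] = None   # proposed: value meaning "empty field"
    fill_output: bool = False           # proposed: write empty/missing as token
    header_prefix: str = ""             # proposed: prefix for output header
    header_from_prologue: bool = False  # proposed: read legend from prologue


VNLOG = FormatOptions(whitespace=True, output_delimiter=" ",
                      skip_comments=True, comment_chars="#",
                      empty_token="-", fill_output=True,
                      header_prefix="# ", header_from_prologue=True)


def read_legend(lines, opts: FormatOptions) -> Optional[List[str]]:
    """Return field names from the vnlog legend, or None if absent/disabled."""
    if not opts.header_from_prologue:
        return None
    for line in lines:
        s = line.strip()
        if not s.startswith("#"):
            return None  # reached data without finding a legend
        if s.startswith("##") or s.startswith("#!"):
            continue     # plain comment or shebang, not the legend
        return s[1:].split()
    return None
```

With this shape, implementing vnlog "as is" just means hard-coding `VNLOG`; exposing the individual fields as options could come later without changing the code paths.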