bug-datamash

Re: Column labels with --full


From: Erik Auerswald
Subject: Re: Column labels with --full
Date: Mon, 16 May 2022 08:47:40 +0200

Hi,

On Sun, May 15, 2022 at 01:50:46PM -0700, Dima Kogan wrote:
> Hi. In writing tests for the vnlog support I stumbled on what looks like
> a bug. Or at the very least an odd behavior. There's even a test for it,
> but the "reference" output we compare against looks wrong.
> 
> I'm looking at the hdr4 and hdr5 tests:
> 
>   https://git.savannah.gnu.org/cgit/datamash.git/tree/tests/datamash-tests.pl?h=v1.7#n541
> 
> Both process this input data:
> 
>   A 3 W
>   A 5 W
>   A 7 W
>   A 11 X
>   A 13 X
>   B 17 Y
>   B 19 Z
>   C 23 Z
> 
> Test hdr4 runs essentially "datamash --group 1 --header-out count 2"
> with this expected result:
> 
>   GroupBy(field-1) count(field-2)
>   A 5
>   B 2
>   C 1
> 
> Which is fine. The next test (hdr5) does the same thing, but adds
> "--full". I would expect the same output as in hdr4, but with the extra
> columns we just threw out being re-added. This isn't entirely what
> happens:
> 
>   field-1 field-2 field-3 count(field-2)
>   A 3 W 5
>   B 17 Y 2
>   C 23 Z 1
> 
> Note that the field label for our grouping key changed from
> "GroupBy(field-1)" to "field-1". I don't think this is right. Opinions?

I agree that printing the original field labels while omitting most of
the input lines (because of grouping and counting) is not ideal.  The
label for the grouped-by field(s) should probably be GroupBy(label), and
the other fields should probably also carry some indicator that values
have been omitted, e.g., First(label).
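
To make the suggested labels concrete, here is a rough sketch in Python.
It is only a model of the behaviour discussed above, not datamash code:
the function name full_group_count is made up for illustration, and the
First(...) label is just the proposal from this thread, not something
datamash currently prints.  It assumes headerless input that is already
sorted by the grouping column.

```python
# Illustrative model only: this is not datamash code. The First(...)
# label is a proposal from this thread, not existing datamash output.

def full_group_count(rows, group_col, count_col):
    """Model '--full --group G --header-out count C' on headerless input,
    using the proposed header labels.

    rows: list of lists of strings; columns are 1-based.
    Rows are assumed to be sorted by the grouping column already.
    """
    g = group_col - 1
    first_rows = {}  # group key -> first row seen for that group
    counts = {}      # group key -> number of rows in that group
    for row in rows:
        key = row[g]
        first_rows.setdefault(key, row)
        counts[key] = counts.get(key, 0) + 1

    # Build the header: GroupBy(...) for the key, First(...) for the rest.
    ncols = len(rows[0]) if rows else 0
    header = []
    for i in range(ncols):
        label = f"field-{i + 1}"
        header.append(f"GroupBy({label})" if i == g else f"First({label})")
    header.append(f"count(field-{count_col})")

    # One output line per group: its first input row plus the count.
    body = [first_rows[k] + [str(counts[k])] for k in first_rows]
    return [header] + body


# The input data from the hdr4/hdr5 tests quoted above.
rows = [line.split() for line in [
    "A 3 W", "A 5 W", "A 7 W", "A 11 X", "A 13 X",
    "B 17 Y", "B 19 Z", "C 23 Z",
]]

for out in full_group_count(rows, group_col=1, count_col=2):
    print(" ".join(out))
```

With the hdr5 input this yields the same data lines as the current
reference output; only the header line differs, reading
"GroupBy(field-1) First(field-2) First(field-3) count(field-2)".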

Thanks,
Erik


