From: Erik Auerswald
Subject: Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace
Date: Tue, 17 May 2022 09:04:53 +0200
Hi Dima,
On Mon, May 16, 2022 at 09:25:07AM -0700, Dima Kogan wrote:
> Erik Auerswald <auerswal@unix-ag.uni-kl.de> writes:
> > On Sun, May 15, 2022 at 06:06:21PM -0700, Dima Kogan wrote:
> >
> >> Addresses two related issues:
> >>
> >> - Comments that didn't block out a whole line weren't being properly
> >>   ignored by -C. Lines such as 'bar 5#xxx' didn't ignore the '#xxx' as
> >>   they were supposed to
> >
> > I think that would be a new feature. The --help output states:
> >
> >   -C, --skip-comments   skip comment lines (starting with '#' or ';'
> >                         and optional whitespace)
> >
> > As far as I understand the documentation, the -C, --skip-comments option
> > was intended to skip complete lines.
>
> Huh. The docs do indeed describe the observed behavior. But this
> behavior isn't how comments work anywhere else, and breaks everybody's
> expectations of how comments should be interpreted.
I beg to differ. It is not obvious to me that simple tabular data has
any notion of comments inside a data row, whether running to the end of
the data line or embedded in a data field. Comment lines, i.e., complete
lines that do not contain any data, are a simple extension to simple
tabular data.
> I think we should take the patch AND we should update the docs.
I'd suggest adding a new option to activate such extended comment
support.
> > Treating any ';' in a line as starting a comment would interfere with
> > using ';' as field separator. But using ';' as field separator is common
> > with simple CSV-like formats when the locale's decimal separator is a ','.
>
> Using the comment character as the field separator shouldn't work. Does
> anybody expect it to?
It does not seem likely that people using a semicolon-separated values
format would expect the semicolon to act as a comment character.
They might well expect '#' to act as a comment character that can be
used to add comment lines to the data. Thus I think it could be useful
to specify which character(s) shall be interpreted as starting a comment.
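As a rough sketch of what a configurable set of comment characters could mean, the whole-line skipping can be emulated with grep. The helper name skip_comment_lines is hypothetical and not part of datamash; it only mimics the documented "comment line" behaviour and assumes the given characters are safe inside a bracket expression (so not ']' or '^'):

```shell
# Hypothetical helper, NOT a datamash feature: drop every line whose first
# non-blank character is one of the characters passed in $1.
skip_comment_lines() {
    grep -v "^[[:space:]]*[$1]"
}

# Both '#' and ';' start a comment line: only the data row survives.
printf '# shell comment\n1;2;3\n; lisp comment\n' | skip_comment_lines '#;'

# Only '#' starts a comment line: '; lisp comment' is now passed on as data.
printf '# shell comment\n1;2;3\n; lisp comment\n' | skip_comment_lines '#'
```

With such a choice the user of a semicolon-separated format could keep '#' comment lines without giving ';' a second meaning.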
The following currently works:
$ printf -- '# shell comment\n1;2;3\n; lisp comment\n4;5;6\n7;8;9\n'
# shell comment
1;2;3
; lisp comment
4;5;6
7;8;9
$ printf -- '# shell comment\n1;2;3\n; lisp comment\n4;5;6\n7;8;9\n' |
datamash -C -t\; sum 1-3
12;15;18
To illustrate why using a semicolon-separated values format can
be useful:
$ echo $LC_NUMERIC
de_DE.UTF-8
$ printf -- '# shell comment\nfirst;second;third\n1,01;2,02;3,03\n; lisp comment\n4,04;5,05;6,06\n7,07;8,08;9,09\n' |
datamash -C -t\; -H sum 1-3
sum(first);sum(second);sum(third)
12,12;15,15;18,18
Extending -C, --skip-comments to interpret both '#' and ';' inside a
data line as starting a comment would break the above use cases.
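To make the breakage concrete, here is a minimal emulation with sed of "a comment runs from '#' or ';' to the end of the line". This is an assumption about how such an extension might behave, not the behaviour of the proposed patch or of datamash; the function name strip_inline_comments is made up for illustration:

```shell
# Hypothetical emulation, NOT datamash code: delete everything from the
# first '#' or ';' up to the end of each line.
strip_inline_comments() {
    sed 's/[#;].*$//'
}

# Semicolon-separated rows lose every field after the first one:
printf '1;2;3\n4;5;6\n' | strip_inline_comments
# prints:
# 1
# 4
```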
The -C -t';' combination does not work if any data line starts with an
empty field, because such a line begins with ';' and is skipped as a
comment line. A new option to set only '#' as the comment character
would make that work.
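The empty-first-field problem can be reproduced without datamash by emulating the documented line skipping with grep. Both function names below are hypothetical and exist only for this sketch:

```shell
# Emulates the documented -C behaviour: skip lines starting with '#' or ';'
# (after optional whitespace). NOT datamash itself.
skip_hash_or_semicolon() {
    grep -v '^[[:space:]]*[#;]'
}

# Emulates a hypothetical '#'-only variant.
skip_hash_only() {
    grep -v '^[[:space:]]*#'
}

# The first row has an empty first field, so it starts with ';'.
printf ';2;3\n4;5;6\n' | skip_hash_or_semicolon   # prints only: 4;5;6
printf ';2;3\n4;5;6\n' | skip_hash_only           # prints both rows
```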
Kind regards,
Erik