Re: Quotes being stripped by "--csv"


From: Ben Hoyt
Subject: Re: Quotes being stripped by "--csv"
Date: Sun, 26 Nov 2023 16:08:07 +1300

Hi Ed,

It's likely this discussion is moot, given that Arnold said he's not
planning to change Gawk further. However, a few additional thoughts.

> My post is not about input mode vs output mode, it's entirely about input
> mode - a way to leave the quotes alone or strip them when populating fields,
> that is all. Output is left entirely up to the user in either case.

Yes, I recognized that's what you were suggesting. I just don't think
that's a very helpful way of operating on CSV fields, because with the
quotes left in you can't really operate on the data -- for example, you
can't treat fields as numbers or take their sum (the leading quote would get in
the way), and you can't even really treat them as strings without stripping
the quotes (for example, to concatenate a first name field to last name).
In short, the quoted field value would only be usable if you're going to
pass it straight through to the output.

Similarly, the "csv" module in Python and the "encoding/csv" package in Go
(and I presume it's similar in other languages) give you the un-encoded
field value so that you can perform operations on it.
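
As a minimal sketch of that behaviour in Python (the sample data and the
variable names here are made up for illustration), the standard "csv" module
hands back field values with the quoting already removed, so they can be
summed or concatenated directly:

```python
import csv
import io

# Hypothetical sample input: every field is quoted, and one contains a comma.
data = (
    '"first","last","amount"\n'
    '"Ada","Lovelace, Countess","10"\n'
    '"Bob","Smith","32"\n'
)

rows = list(csv.reader(io.StringIO(data)))
header, records = rows[0], rows[1:]

# Fields arrive decoded (no surrounding quotes), so numeric conversion works.
total = sum(int(r[2]) for r in records)

# String operations work too, e.g. joining first name to last name.
full_names = [r[0] + " " + r[1] for r in records]

print(total)
print(full_names)
```

Had the quotes been left in place, int('"10"') would raise a ValueError, and
the joined names would carry stray quote characters.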

> It is 1 of the 2 possible correct behaviors, and it's the one that I expect
> will be most useful most of the time.

I suppose it's not helpful to argue over what is "correct" or not, and I
take your point that what you propose is a possible behaviour. However,
I've tried to show above that the field values wouldn't be very useful
without un-encoding the data -- except to pass it directly to the output.
So I definitely disagree with the second part of your statement. Based on
my own usage, I'm very often summing a field or similar, which wouldn't
work with your approach (without further dequoting/decoding).

To generalize, I think most data processing tends to work this way: decode
input, operate on decoded data, encode output.
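
That decode/operate/encode pattern might look roughly like the following in
Python. This is only a sketch: the function name add_total_row and the sample
input are hypothetical, and it assumes the second column is always numeric.

```python
import csv
import io


def add_total_row(csv_text):
    """Decode CSV text, sum the second column, and re-encode with a total row."""
    # Decode: csv.reader strips the quoting and un-doubles embedded quotes.
    rows = list(csv.reader(io.StringIO(csv_text)))

    # Operate: work on the plain, decoded values.
    total = sum(float(r[1]) for r in rows)
    rows.append(["total", str(total)])

    # Encode: csv.writer re-quotes only the fields that need it.
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


result = add_total_row('"say ""hi""","1.5"\n"a,b","2.5"\n')
print(result)
```

Note that the output quoting is decided by the writer, not inherited from the
input: the numeric fields come back unquoted, while the fields containing a
comma or a double quote are quoted again.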

In any case, I do think Kernighan's choice to have --csv decode the input
so that you can operate on decoded data is the more helpful choice, and
consistent with what other languages do.

-Ben
