Re: Quotes being stripped by "--csv"
From: Neil R. Ormos
Subject: Re: Quotes being stripped by "--csv"
Date: Sun, 26 Nov 2023 12:58:40 -0600 (CST)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)
Ed Morton wrote:
> [...] but not all CSV-processing applications require
> modifying fields and not all applications that do modify
> fields are allowed to produce output with different quotes
> than the input had even if they have to strip those quotes
> temporarily while modifying the fields.
> I get CSVs from multiple sources and need to
> compare/manipulate them and return them to those sources
> or send to other destinations that would otherwise receive
> the original exported CSV. Some of those CSVs are exported
> from Excel or other Windows tools, some are exported from
> various applications that run on various web sites, some
> are created by various Unix tools that have evolved over
> the years. I see various quoting styles/rules applied
> across those CSVs - quote only when needed, quote all
> fields, quote all strings but do not quote numbers, quote
> only specific columns, quote the data rows but not the
> header row, etc., etc. [...]
> [...] but people have been writing tools to parse various
> subsets of CSVs with various subsets of allowed/required
> quoting for 50+ years and CSVs are used in many varied
> applications with no 1 common standard they all follow,
> despite the existence of RFC4180, so I expect I'm not
> alone in having a need for CSV parsing that simply doesn't
> strip quotes.
I've had many use cases that are in a category similar to what Ed describes.
The producer or the ultimate consumer of the CSV file exhibits idiosyncratic
CSV-handling behavior, that behavior cannot be changed, the full extent of the
idiosyncrasy is unknown or tedious to duplicate, and the practical requirement
is that the output from awk shall be identical to the input except for specific
intended changes.
The optional behavior Ed requested, where the fields of CSV input records are
separated but otherwise unmolested, would simplify handling of these use cases,
e.g., in one-liners and similar small-scale scripts. Because I was processing
CSV files long before the --csv option was added, I already have
ways of dealing with these situations. But each user newly confronted with
these use cases would have to analyze the problem and craft a new solution.
The optional non-stripping --csv behavior would avoid that duplicative effort
for many potential users, while advantageously providing a standard,
easy-to-use facility that exhibits performance and behavior consistent with
gawk's conventional field and record processing.
I do not seek to relitigate Arnold's decision; he has to weigh and reconcile
many competing considerations, of which utility is but one. This post is
offered simply to balance the record in view of doubts expressed earlier as to
whether the requested optional --csv behavior would be "useful". Although Ed
took the initiative to make and advocate for the request, he is not the only
one confronted with this category of CSV-handling problem, and the optional
behavior Ed requested would indeed be useful to others.