bug-gawk

Re: CSV extension status


From: Andrew J. Schorr
Subject: Re: CSV extension status
Date: Wed, 19 May 2021 17:09:44 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, May 19, 2021 at 04:54:19PM -0400, Andrew J. Schorr wrote:
> In your example:
>    CSV fragment -> """Hello!"", she said"
>    Clean text -> "Hello!", she said

OK, so I tested with LibreOffice, and confirmed that your CSV fragment
is valid, ugly though it might be. :-) In a saner world, it would say
"'Hello!', she said" instead, but obviously one has no control over what
other people do.

I guess if I really had to write a C parser to do this, I'd probably create a
"clean" version with quotes removed in $0, and I'd have an optional feature to
save the original (ugly CSV fragments) in a global CSVFIELD array, or something
like that. In other words, $n would contain the cleaned-up version with quotes
removed, and CSVFIELD[n] would have the original messy version, and CSVRECORD
could contain the pristine record (if the user requested preservation of the
original ugliness).
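
To make the "clean" versus "original" distinction concrete, here is a minimal sketch in C of the unquoting step only. It assumes the usual convention that a quoted field is wrapped in double quotes and that a literal quote inside it is written as two quotes. The name csv_unquote is hypothetical; it is not taken from gawk or from the gawkextlib csv library, and a real parser would also have to split the record into fields and cope with embedded newlines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of gawk or gawkextlib.
 * Copies the raw CSV field `raw` into `clean`, dropping the enclosing
 * double quotes and collapsing each doubled quote ("") into one.
 * `clean` must have room for strlen(raw) + 1 bytes.
 * Returns the length of the cleaned text.
 */
size_t csv_unquote(const char *raw, char *clean)
{
    size_t len = strlen(raw);
    size_t out = 0;

    /* A field that is not wrapped in quotes is copied through unchanged. */
    if (len < 2 || raw[0] != '"' || raw[len - 1] != '"') {
        memcpy(clean, raw, len + 1);
        return len;
    }

    /* Walk only the text between the enclosing quotes. */
    for (size_t i = 1; i + 1 < len; i++) {
        /* A quote followed by another inner quote is an escaped pair. */
        if (raw[i] == '"' && i + 2 < len && raw[i + 1] == '"')
            i++;   /* skip the first quote of the pair */
        clean[out++] = raw[i];
    }
    clean[out] = '\0';
    return out;
}
```

Under this scheme the caller would keep `raw` for something like CSVFIELD[n] and use the contents of `clean` for $n. Applied to the fragment quoted above, """Hello!"", she said", the cleaned text comes out as "Hello!", she said.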

Maybe this is what you already did. I have never looked at the gawkextlib csv
library. But the C code still ought to be faster than an awk implementation,
unless we really screwed up the design of the parser API. And it should be
more robust in the sense that it wouldn't break if the CSV input data happened
to contain SUBSEP, which I agree is unlikely, but one never knows...

Regards,
Andy


