bug-gawk

Re: manual section 4.7.1


From: arnold
Subject: Re: manual section 4.7.1
Date: Tue, 04 Apr 2023 08:28:38 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thank you for the note.

As the documentation notes, FPAT is only a partial solution for dealing
with CSV data.

The --csv option is not yet released, although of course folks can build from
git and use the result if they wish to.

That section of the manual will be rewritten before gawk 5.3.0 is released.

Thanks,

Arnold

cph1968@proton.me wrote:

> The regex fp[2] in section 4.7.1 (below) doesn't quite cut it if the CSV file
> records end in both CR and NL [0x0D 0x0A]. I believe this is a common feature
> of Windows files.
> A simple fix, however, is to use the gawk --csv option.
>
> ❯ head -n 2 TSCAINV_022023.csv| gawk -f print-fields.awk
> >ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >F = 1 <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
> >F = 1 <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
>
> Note here that the last '>' is the first character on the next line.
>
> output using the --csv option:
> ❯ head -n 2 TSCAINV_022023.csv| gawk --csv -f print-fields.awk
> <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY>
> NF = 10 <ID><CASRN><casregno><UID><EXP><ChemName><DEF><UVCB><FLAG><ACTIVITY>
> <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE>
> NF = 10 <1><50-00-0><50000><><><Formaldehyde><><><><ACTIVE>
>
> much better :-)
>
> ❯ cat print-fields.awk
> {
>     print "<" $0 ">"
>     printf("NF = %s ", NF)
>     for (i = 1; i <= NF; i++) {
>         printf("<%s>", $i)
>     }
>     print ""
> }
>
>
> from section 4.7.1:
> BEGIN {
>      fp[0] = "([^,]+)|(\"[^\"]+\")"
>      fp[1] = "([^,]*)|(\"[^\"]+\")"
>      fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
>      FPAT = fp[fpat+0]
> }
>
>
>
> kind regards,
>
> cph1968
>
> Sent with Proton Mail secure email.
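
Where --csv is not available, one possible workaround is to remove the trailing CR from each record before the fields are examined. This is not something proposed in the thread, only a minimal sketch under stated assumptions: the helper name strip_cr_fields is hypothetical, and it splits on plain commas with -F rather than with FPAT, so it does not handle quoted fields.

```shell
# Hypothetical helper (not from the thread): strip a trailing CR from each
# record, then print the field count and the last field in angle brackets.
# Assumption: plain comma splitting via -F, so quoted fields are NOT handled.
strip_cr_fields() {
    awk -F, '
        { sub(/\r$/, "") }                      # drop the CR left by CRLF line endings
        { printf("NF = %d <%s>\n", NF, $NF) }   # last field no longer carries a CR
    '
}

# A CRLF-terminated record shaped like the one in the report above.
printf '1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE\r\n' | strip_cr_fields
# prints: NF = 10 <ACTIVE>
```

In gawk, the same sub() rule can be placed ahead of FPAT-based rules, since modifying $0 causes the record to be split into fields again.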


