Re: Insertion of extra OFS character into output string

help-gawk

From:	H
Subject:	Re: Insertion of extra OFS character into output string
Date:	Tue, 14 Mar 2023 15:09:56 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 03/14/2023 02:41 AM, david kerns wrote:
>
>
> On Mon, Mar 13, 2023 at 5:59 PM H <agents@meddatainc.com> wrote:
>
>     On March 14, 2023 12:41:16 AM GMT+01:00, "Neil R. Ormos" <ormos-gnulists17@ormos.org> wrote:
>     >H wrote:
>     >
>     >> I am a newcomer to awk and have run into an
>     >> issue I have not figured out yet... My platform
>     >> is CentOS 7 running awk 4.0.2, the default
>     >> version.
>     >
>     >> The following awk statement generates an extra
>     >> tab character between fields 1 and 2, regardless
>     >> of the data in the file:
>     >
>     >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
>     >
>     >> If I change the statement to:
>     >
>     >> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
>     >
>     >> an extra OFS character is inserted between
>     >> fields two and three. I can add that removing
>     >> the gsub() in either of the two examples does
>     >> not affect the results.
>     >
>     >> Might this be a bug in 4.0.2 or a feature I have
>     >> not yet understood?
>     >
>     >I don't have 4.0.2 available to test, but I tested with older and newer
>     >versions.
>     >
>     >When I test, I get the result I think I expect from the code you
>     >posted.
>     >
>     >Also, setting FPAT overrides the effect of having earlier set FS.  (I
>     >believe that the most-recently set one among FS, FPAT, and FIELDWIDTHS
>     >controls the field splitting operation.)
>     >
>     >echo "1,2" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; print}' | hexdump -c
>     >0000000   1  \t   2  \n
>     >0000004
>     >
>     >It would be easier to help if you would please provide:
>     >
>     >  the simplest input line that reproduces the problem;
>     >
>     >  the output you expect; and
>     >
>     >  the output you are getting.
>
>     I am not on my computer but typing this on my phone. With that caveat, a /minimal/ example would be:
>     echo "Alpha,Beta,Charlie,Delta" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}'
>
>     I would expect to see:
>     Alpha<TAB>Beta<TAB>Charlie<TAB>Delta
>     but instead see
>     Alpha<TAB><TAB>Beta<TAB>Charlie<TAB>Delta
>
>     If you change $1=$1 to $2=$2 you will find that the extra tab character then moves to the next field.
>
>     I believe I had also tried without the definition of FS with the same result.
>
>     Finally, note that the FPAT expression comes from the awk documentation and is thus expected to work.
>
>     Can anyone try this with the most recent version of awk?
>
>
> I think there is a bug here (I fixed your FPAT, but that issue is unrelated to what you're reporting):
> $ cat somefile.csv
> 1,"this field, has a comma",3,4
> $ cat p11
>  gawk 'BEGIN {
>         FPAT="[^,]*|[\"][^\"]+[\"]"
>         OFS="\t"
> }
>         {
>         for (i = 1; i <= NF; i++) x=$i # if you comment this line out, you will get the extra tab on output
>         $1=$1;
>         gsub(/"/, "");
>         print
> }' somefile.csv
> $ bash p11 | xxd
> 0000000: 3109 7468 6973 2066 6965 6c64 2c20 6861  1.this field, ha
> 0000010: 7320 6120 636f 6d6d 6109 3309 340a       s a comma.3.4.
>
> However, it does seem to be fixed in 5.2.1.
>
>
Why the need to "fix" my FPAT? As I stated earlier, the FPAT I used is from the 
awk documentation.
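
For anyone who wants to compare the two FPAT values side by side, here is
a rough sketch. It assumes gawk is on PATH (FPAT is a gawk extension); the
helper name show_fields and the variable names are made up for this
example. The two patterns differ only in grouping parentheses and in
writing the quote as ["] rather than \", so they should split a record the
same way:

```shell
#!/bin/sh
# Assumption: gawk is installed; skip quietly if it is not.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found; skipping" >&2; exit 0; }

line='1,"this field, has a comma",3,4'

# Hypothetical helper: print each field on its own line, using the FPAT
# value passed as the first argument.
show_fields() {
    printf '%s\n' "$line" |
        gawk -v FPAT="$1" '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }'
}

doc_fpat='([^,]*)|("[^"]+")'    # pattern used in the original command
alt_fpat='[^,]*|["][^"]+["]'    # pattern used in the modified script

show_fields "$doc_fpat"
show_fields "$alt_fpat"
```

If the two listings match, the choice of FPAT is not what causes the extra
tab.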

Also, it is better to keep this discussion on the mailing list, where it
belongs; there is no need to pollute my personal email.
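
Until an upgrade is possible, the observation in the quoted script (reading
every field before assigning $1) suggests a workaround. A minimal sketch,
not a confirmed fix: the function name to_tsv is made up here, and FS=","
is kept from my original command even though gawk should ignore it once
FPAT is set afterwards.

```shell
#!/bin/sh
# Workaround sketch: touch every field before the $1 = $1 assignment, as in
# the quoted script, so the whole record is split before it is rebuilt.
to_tsv() {
    awk 'BEGIN { FS = ","; FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "\t" }
    {
        for (i = 1; i <= NF; i++) x = $i   # force a full field split first
        $1 = $1                            # rebuild the record with OFS
        gsub(/"/, "")                      # strip the double quotes
        print
    }'
}

# Dump the result byte by byte so that any doubled tab is easy to spot.
printf '%s\n' 'Alpha,Beta,Charlie,Delta' | to_tsv | od -c
```

A doubled \t in the od output would mean the workaround does not help on
that version.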
