Re: [bug-gawk] Field separators in awk
From: Andrew J. Schorr
Subject: Re: [bug-gawk] Field separators in awk
Date: Tue, 31 Dec 2013 09:47:42 -0500
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
On Tue, Dec 31, 2013 at 07:40:28AM +0100, address@hidden wrote:
> However, it seems inefficient to call split() on $0 just to obtain
> the matched text. (Since the field splitting has already been done
> by awk).
I believe the field variables ($1, $2, ...) are parsed lazily. In field.c,
there is this comment on the set_record function:
/*
* set_record:
* setup $0, but defer parsing rest of line until reference is made to $(>0)
* or to NF. At that point, parse only as much as necessary.
*
* Manage a private buffer for the contents of $0. Doing so keeps us safe
* if `getline var' decides to rearrange the contents of the IOBUF that
* $0 might have been pointing into. The cost is the copying of the buffer;
* but better correct than fast.
*/
You can also look in field.c for the "parse_high_water" variable:
static long parse_high_water = 0; /* field number that we have parsed so far */
So I don't think there should be any performance penalty unless the script
references NF or a field $n with n > 0, and even then gawk parses only as
far as it needs to.
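To make the lazy scheme concrete, here is a minimal sketch in C of a record that copies $0 up front and splits fields only on demand, tracking its progress in a high-water mark. It is hypothetical and much simplified, not gawk's field.c: `struct record`, `rec_set`, `rec_field`, `rec_nf`, and `MAX_FIELDS` are made-up names, and the sketch assumes default-FS rules (runs of blanks and tabs separate fields; leading and trailing blanks are ignored) with fixed-size buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of lazy field splitting; NOT gawk's field.c.
 * Assumes default-FS rules: runs of blanks/tabs separate fields and
 * leading/trailing blanks are ignored.  Fixed sizes keep it short. */
#define MAX_FIELDS 64

struct record {
    char buf[256];                      /* private copy of $0 */
    size_t len;                         /* length of $0 in buf */
    size_t scan_pos;                    /* where the next scan resumes */
    const char *start[MAX_FIELDS + 1];  /* start[i] points at $i (1-based) */
    size_t flen[MAX_FIELDS + 1];        /* flen[i] is the length of $i */
    long parse_high_water;              /* number of fields parsed so far */
    int fully_parsed;                   /* nonzero once the end of $0 is seen */
};

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Copy $0 into the private buffer but do not split it yet. */
void rec_set(struct record *r, const char *line)
{
    r->len = strlen(line);
    if (r->len >= sizeof r->buf)
        r->len = sizeof r->buf - 1;     /* toy limit: truncate long lines */
    memcpy(r->buf, line, r->len);
    r->buf[r->len] = '\0';
    r->scan_pos = 0;
    r->parse_high_water = 0;
    r->fully_parsed = 0;
}

/* Split only until `upto` fields exist or the record is exhausted. */
static void rec_parse(struct record *r, long upto)
{
    while (!r->fully_parsed && r->parse_high_water < upto) {
        size_t i = r->scan_pos;
        while (i < r->len && is_blank(r->buf[i]))
            i++;
        if (i == r->len || r->parse_high_water == MAX_FIELDS) {
            r->fully_parsed = 1;        /* no more fields (or toy limit hit) */
            break;
        }
        size_t begin = i;
        while (i < r->len && !is_blank(r->buf[i]))
            i++;
        r->parse_high_water++;
        r->start[r->parse_high_water] = r->buf + begin;
        r->flen[r->parse_high_water] = i - begin;
        r->scan_pos = i;
    }
}

/* $n: n == 0 never splits; n > 0 splits only as far as field n. */
const char *rec_field(struct record *r, long n, char *out, size_t outsz)
{
    assert(n >= 0 && outsz > 0);
    if (n == 0) {
        snprintf(out, outsz, "%s", r->buf);
        return out;
    }
    rec_parse(r, n);
    if (n > r->parse_high_water)
        out[0] = '\0';                  /* a nonexistent field is "" */
    else
        snprintf(out, outsz, "%.*s", (int) r->flen[n], r->start[n]);
    return out;
}

/* NF: the one reference that forces a scan to the end of $0. */
long rec_nf(struct record *r)
{
    rec_parse(r, MAX_FIELDS + 1L);
    return r->parse_high_water;
}
```

In this sketch, reading $0 leaves `parse_high_water` at 0, asking for $2 splits exactly two fields, and only `rec_nf()` walks to the end of the record, which mirrors the "parse only as much as necessary" behavior described in the comment.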
> I would propose either to add a new builtin variable, for instance
> it could be called "FT", that contains the separators, or if this is
> inefficient (for the common case, where that variable is not used by
> the user) to have an option (like I have proposed in a previous
> mail) to set FIELDWIDTHS="0" which should make awk skip the field
> splitting process all together, and just assign the whole line to
> $1, and set NF=1. Then it is up to the user to call split() on $0 if
> he likes to.
Also, what is wrong with saying something like:
gawk '-F\n' '...'
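That sets the field separator to the newline character (i.e., the same as the
default RS). Since a record never contains its own terminator, each line
becomes a single field: $1 is the whole line and NF is 1.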
bash-4.2$ awk '-F\n' 'BEGIN {print FS; print RS}' | od -c
0000000 \n \n \n \n
0000004
bash-4.2$ echo 'this is a test' | awk '-F\n' '{print NF; print $1}'
1
this is a test
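Why NF comes out as 1 can be sketched in C. This is a hypothetical illustration, not gawk's implementation: `count_fields_single_char` and `max_nf_when_fs_is_newline` are made-up names, and the sketch assumes the POSIX rule that a single-character FS other than a space splits at every occurrence of that character. With the default RS of newline, the newline ending each line is consumed as the record terminator, so no record contains it and FS has nothing to split on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not gawk's code.  Assumes the POSIX rule that
 * a single-character FS other than a space splits at every occurrence of
 * that character, so NF is one more than the number of separators
 * (and 0 for an empty record). */
long count_fields_single_char(const char *record, char fs)
{
    assert(record != NULL);
    if (record[0] == '\0')
        return 0;
    long nf = 1;
    for (const char *p = record; *p != '\0'; p++)
        if (*p == fs)
            nf++;
    return nf;
}

/* Split `input` into records at the default RS ('\n'), dropping the
 * terminator as awk does, and return the largest NF seen with FS = "\n". */
long max_nf_when_fs_is_newline(const char *input)
{
    char rec[256];
    long max_nf = 0;
    const char *p = input;
    while (*p != '\0') {
        const char *end = strchr(p, '\n');
        size_t full = end ? (size_t) (end - p) : strlen(p);
        size_t len = full < sizeof rec ? full : sizeof rec - 1;
        memcpy(rec, p, len);
        rec[len] = '\0';                /* the record never includes RS */
        long nf = count_fields_single_char(rec, '\n');
        if (nf > max_nf)
            max_nf = nf;
        p += full + (end ? 1 : 0);      /* skip past the record and its RS */
    }
    return max_nf;
}
```

Counting with ':' shows the general rule ("a:b::c" has four fields, one of them empty); with '\n' there is no separator left in any record, which matches the `print NF` output of 1 in the session above.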
Regards,
Andy