bug-gnu-utils

Re: New built-in variable


From: Aharon Robbins
Subject: Re: New built-in variable
Date: Mon, 15 Dec 2008 23:09:55 +0200

Hi. Re this:

> Date: Mon, 15 Dec 2008 09:51:30 +0100
> From: Jean-Michel ELYN <address@hidden>
> To: <address@hidden>
> Subject: New built-in variable
>
> Hello,
>
> It seems previous mail has arrived empty. Going to try again...

I got the first one, although it looked a bit corrupted. I just
didn't get to reply yet.

> I often use AWK (actually GAWK on Linux) because it is really powerful. 

Great!

> ....
> Create a built-in array variable named FIELDS (for example) that 
> actually refers to $1, $2, .... The solution would become:
> gawk '{asort(FIELDS); print}' file_in > file_out
> This new built-in variable would much improve all work on small changes 
> such as field order.

Thank you for your suggestion.  Suggestions similar to this have been
made before (cf. the recent discussion about accessing field separators
in comp.lang.awk).

I don't wish to do this, for one reason: creating such an array for every
record would be very expensive.  Gawk currently does lazy parsing of
the fields; if you access $3 it only parses the record up to $3 and not
the whole way.  Setting the FIELDS array would require parsing the full
record every time, as well as require doing a large number of dynamic
memory frees and allocations, *for every record*.

Incurring these expenses for a feature that is unlikely to be used a
lot is not something I wish to do; users who need a feature should pay
for it, and not the whole world.  Thus, using split to create an array is
the right way to go.  It is also the only portable way to do things.

Since gawk uses the same code for split as for building the fields,
and it uses lazy parsing, it's no more expensive anyway, since in
effect the record is only parsed once.

(Lest you argue that asort() isn't portable, an awk function could be
written to do the same thing, so it's not as unportable as might
appear.)
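
One way such a function might look, as a sketch only: a plain insertion
sort written in portable awk, standing in for asort().  The names isort
and sort_fields_portable are hypothetical, and insertion sort is just the
simplest choice here, not what gawk does internally.

```sh
# Hypothetical wrapper: sort the fields of every record without asort().
sort_fields_portable() {
    awk '
    # isort (illustrative helper): sort a[1..n] in place, ascending.
    # Extra parameters after n are the usual awk idiom for local variables.
    function isort(a, n,    i, j, tmp) {
        for (i = 2; i <= n; i++) {
            tmp = a[i]
            # shift larger elements one slot to the right
            for (j = i - 1; j >= 1 && a[j] > tmp; j--)
                a[j + 1] = a[j]
            a[j + 1] = tmp
        }
    }
    {
        n = split($0, f)
        isort(f, n)
        line = ""
        for (i = 1; i <= n; i++)
            line = line (i > 1 ? OFS : "") f[i]
        print line
    }' "$@"
}
```

Elements are compared with the ordinary awk > operator, so fields that
look like numbers should compare numerically and everything else as
strings.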

FWIW, I speak from experience; when I first added RS as a regexp and the
RT variable to gawk, I did the simple thing and freed and then set RT
for every record.  This was very expensive, particularly since, in the
normal case, the value is always "\n".  Once I fixed gawk to only change RT
if there was a change, I got a big speedup.

Thanks,

Arnold



