Re: New built-in variable

bug-gnu-utils

From:	Aharon Robbins
Subject:	Re: New built-in variable
Date:	Thu, 18 Dec 2008 04:52:34 +0200

Hello Jean-Michel,

> Date: Tue, 16 Dec 2008 10:01:13 +0100
> From: Jean-Michel ELYN <address@hidden>
> To: <address@hidden>
> Subject: Re: New built-in variable
>
> Hello Aharon,
>
> First, I want to thank you for your detailed answer. I now understand a
> bit better how awk works. I also know that my request has been heard and
> what the outcome will be. However, I would like to provide some
> additional details about what I want; I'm not sure you understood what
> I'm looking for, because English is not my mother tongue.

I think I understood pretty well the first time. :-)

> So, I wrote that I would like you to create a new built-in variable,
> FIELDS. This is not quite accurate, and a new variable would obviously
> have a non-negligible cost. To be more precise, I would like a reference
> to the record, not a new variable. It would be great to access $1, $2...
> as FIELDS[1], FIELDS[2]... so that awk functions that take arrays, such
> as asort() or match(), could be called with direct access to the current
> record. This way there is no additional memory to allocate, just a new
> way to refer to the fields... as long as $1, $2... are actually stored
> as an array. To put it another way, $1 and FIELDS[1] would access the
> same memory block. The only difference is that it would be possible to
> refer to the whole array as FIELDS. There is no way to do that with $i.

You are missing the point, I'm afraid. The costs here are twofold.
First, gawk would be required to split the record, every time, even if
the FIELDS array is not used.  Right now, gawk splits the record only
when it has to, and only as far as it has to.

Second, given the way arrays in awk are defined, there is a non-negligible
cost in clearing and rebuilding the array for each record.

For large files, these overheads add up and would slow gawk down
significantly.  And these days people pump lots and lots of data
through gawk.

I do not want every user to have to pay these costs.

There is also added complexity in making sure the semantics are right.
For example, what if someone does

        delete FIELDS[2]

Is that the same as $2 = "" ?  Does $0 get rebuilt? What about

        delete FIELDS
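
To make the question concrete, here is a small sketch of how a plain field
assignment already behaves in awk today. It says nothing about how a
hypothetical FIELDS array would behave; the shell wrapper and the name
show_field_clear are invented for this illustration only.

```shell
# Illustration only: show_field_clear is a made-up helper name.
show_field_clear() {
    printf 'a b c\n' | awk '{
        $2 = ""      # assigning to a field rebuilds $0 using OFS
        print        # the emptied field leaves two adjacent separators
        print NF     # NF is unchanged: the field still exists, it is just empty
    }'
}
```

Whether deleting an array element should mirror this (an empty field, NF
unchanged) or should remove the field and shrink NF is exactly the kind of
ambiguity that would have to be settled.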

I have learned, often to my dismay, that every "little" new feature
usually ends up with hidden costs or semantic "gotchas", and I don't
see enough value in this feature to justify them.

If you need an array, you can get one with split; this is portable and
it's one additional line of code to create the array.  If you need to
put the fields back together into a single line, that's a function of
less than 10 lines.
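
As a rough sketch of that approach (the names join_fields, join, and fields
are made up for this example, and the "-" separator is an arbitrary choice),
the array comes from one call to split() and the record is put back together
by a short user-defined function:

```shell
# Illustration only: join_fields, join(), and fields are hypothetical names.
join_fields() {
    awk '
    # Join arr[1..n] into one string, separated by sep.
    function join(arr, n, sep,    i, s) {
        s = ""
        for (i = 1; i <= n; i++)
            s = s (i > 1 ? sep : "") arr[i]
        return s
    }
    {
        n = split($0, fields)    # copy the fields of the current record into an array
        print join(fields, n, "-")
    }'
}
```

For example, `printf 'a b c\n' | join_fields` prints `a-b-c`. In gawk, the
fields array could also be handed to asort() before joining; that function
is a gawk extension, so it is left out here to keep the sketch portable.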

So, while I sincerely appreciate your input and interest, I have to
decline to put this in.  Since gawk is Free Software, you are,
of course, free to add the feature yourself, or to find a consultant
willing to do so.

> PS: I don't know how to post this mail in the thread I started. I'm
> afraid of creating a new thread by simply sending it to
> address@hidden. Sorry if I've made a mistake.

Mailing to the mailing list is fine.

Thanks!

Arnold

