Re: CSV extension status

bug-gawk



From:	Andrew J. Schorr
Subject:	Re: CSV extension status
Date:	Wed, 19 May 2021 16:36:27 -0400
User-agent:	Mutt/1.5.21 (2010-09-15)

On Wed, May 19, 2021 at 08:01:30PM +0200, Manuel Collado wrote:
> On 19/05/2021 at 15:44, Andrew J. Schorr wrote:
> >On Wed, May 19, 2021 at 09:03:36AM -0400, Andrew J. Schorr wrote:
> >>         _csv_nf = csvsplit($0, _csv_ff)
> >>         _csv_record = ""
> >>         _csv_sep = ""
> >>         for (k=1; k in _csv_ff; k++) {
> >>             _csv_record = _csv_record _csv_sep csvunquote(_csv_ff[k])
> >>             _csv_sep = OFS
> >>         }
> >>         $0 = _csv_record
> >
> >Why is this written as:
> >
> >        for (k=1; k in _csv_ff; k++) {
> >
> >instead of:
> >
> >        for (k=1; k <= _csv_nf; k++) {
> >
> >Unless I'm confused, the latter ought to be faster (simple arithmetic
> >comparison instead of a hash lookup).
> 
> Right. It is a matter of style. The former can be used when the
> number of elements is not explicitly stated. Well, it would suffice
> to call length(array). Not available in older awks.

Yes, but in this case you know the number of elements, since you just set
_csv_nf to the value returned by csvsplit. When you already have the count,
the loop will run faster with a simple numeric comparison: the "in" operator
does a hash lookup on every iteration, which is certainly slower than comparing
two numbers. And you're right that even calling length() once up front to get
the number of elements is almost certainly faster than the "in" test on each
pass.
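A minimal sketch of the three loop styles under discussion. It assumes plain split() on commas as a stand-in for csvsplit() (which belongs to the CSV extension and is not used here), and wraps the awk program in a hypothetical shell function, walk_fields, so it can be run directly. All three loops visit the same elements; they differ only in how each decides to stop: a hash lookup per iteration, a comparison against the count split() returned, or a comparison against length(f), which older awks lack.

```sh
#!/bin/sh
# Hypothetical helper: prints the fields of each comma-separated input
# line three times, once per loop style. split() stands in for csvsplit().
walk_fields() {
    awk '
    {
        n = split($0, f, ",")   # split() returns the number of elements

        # 1. Membership test: a hash lookup of (k in f) on every pass.
        out = ""
        for (k = 1; k in f; k++)
            out = out (k > 1 ? "|" : "") f[k]
        print "in: " out

        # 2. Bound taken from the count split() returned: plain comparison.
        out = ""
        for (k = 1; k <= n; k++)
            out = out (k > 1 ? "|" : "") f[k]
        print "count: " out

        # 3. Count fetched once with length(array) (not in older awks).
        m = length(f)
        out = ""
        for (k = 1; k <= m; k++)
            out = out (k > 1 ? "|" : "") f[k]
        print "length: " out
    }'
}

echo 'a,b,c,d' | walk_fields
```

Forms 2 and 3 perform the same per-iteration work; form 3 only matters when the element count was not kept from the split call.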

Regards,
Andy
