Re: [bug-gawk] speed regression when doing math with gawk 5


From: arnold
Subject: Re: [bug-gawk] speed regression when doing math with gawk 5
Date: Sun, 21 Apr 2019 09:27:20 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the report, the analysis, and the patch.

I have applied Andy Schorr's patch to the code base and pushed to Git.

Thanks!

Arnold

Tom Gray <address@hidden> wrote:

> Hi all,
>
> I noticed a big speed regression in some of my programs when I upgraded
> from gawk 4.2.1 to 5.0.
>
> I traced the issue to commit c1f670b26, a fix for "Numeric assignment to $0".
> See the discussion here:
> http://lists.gnu.org/archive/html/bug-gawk/2018-07/msg00042.html
>
> The speed hit shows up when you do a lot of numeric computation.
> The "fix" in commit c1f670b26 adds a call to force_string() (line 46,
> interpret.h), which punishes every assignment with a number-to-string
> conversion.
>
> In the examples here,
> gawk5 is built from the current master branch, and
> gawk5n is the same build with the force_string() call removed.
>
> $ time ~/src/gawk/gawk5 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real    0m3.980s
> user    0m3.962s
> sys     0m0.015s
>
>
> $ time ~/src/gawk/gawk5n 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real    0m0.130s
> user    0m0.124s
> sys     0m0.000s
>
>
> The original problem was a numeric assignment to $0 followed by the output
> of $0. Numeric assignment does not automagically trigger generation of the
> string representation required for output. A better way to do that is
> lazily, like all other conversions: the string is not needed until $0 gets
> sent to the output pipe.
>
> Adding force_string() in do_print_rec() gets the job done.
>
> The new version, with both changes applied, fixes the original problem and
> keeps numeric assignments fast:
>
> $ cat abc.txt
> a b c
> a b c
> a b c
> a b c
> a b c
>
> $ ~/src/gawk/gawk '{$0 = ++i;} 1' < abc.txt
> 1
> 2
> 3
> 4
> 5
>
> $ time ~/src/gawk/gawk 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real    0m0.130s
> user    0m0.109s
> sys     0m0.015s
>
>
> $ git diff interpret.h
> diff --git a/interpret.h b/interpret.h
> index cc6e02af..4044b808 100644
> --- a/interpret.h
> +++ b/interpret.h
> @@ -46,7 +46,6 @@ unfield(NODE **l, NODE **r)
>                 (*l) = dupnode(*r);
>                 DEREF(*r);
>         }
> -       force_string(*l);
> }
>
> #define UNFIELD(l, r)  unfield(& (l), & (r))
>
>
>
> diff --git a/builtin.c b/builtin.c
> index 491a96b3..4f461650 100644
> --- a/builtin.c
> +++ b/builtin.c
> @@ -2340,6 +2340,7 @@ do_print_rec(int nargs, int redirtype)
>                 (void) get_field(0L, NULL);
>
>         f0 = fields_arr[0];
> +    force_string(f0);
>
>         if (do_lint && (f0->flags & NULL_FIELD) != 0)
>                 lintwarn(_("reference to uninitialized field `$%d'"), 0);
>

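The lazy conversion described in the quoted message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not gawk's actual code: the `value` struct and the `assign_number()` and `to_string()` functions are hypothetical stand-ins for gawk's NODE, its assignment path, and force_string(). The "%.6g" format matches awk's default CONVFMT; the integer special-casing that awk applies is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a gawk value; NOT the real NODE layout. */
typedef struct {
    double num;         /* numeric value, always current */
    char   str[32];     /* cached string form, valid only if has_str != 0 */
    int    has_str;     /* nonzero once the string has been built */
    int    conversions; /* how many number-to-string conversions ran */
} value;

/* Numeric assignment: store the number and drop any stale string.
 * No formatting happens here, so a tight numeric loop stays cheap. */
void assign_number(value *v, double d)
{
    assert(v != NULL);
    v->num = d;
    v->has_str = 0;
}

/* Lazy conversion: build the string the first time it is needed
 * (for example, right before output) and reuse it afterwards. */
const char *to_string(value *v)
{
    assert(v != NULL);
    if (!v->has_str) {
        snprintf(v->str, sizeof v->str, "%.6g", v->num);
        v->has_str = 1;
        v->conversions++;
    }
    return v->str;
}
```

With this shape, a million calls to `assign_number()` perform no formatting at all, and a single `to_string()` call at print time performs exactly one conversion. That is the same trade the patch makes by moving force_string() out of unfield() and into do_print_rec().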