Re: [bug-gawk] speed regression when doing math with gawk 5
From: arnold
Subject: Re: [bug-gawk] speed regression when doing math with gawk 5
Date: Sun, 21 Apr 2019 09:27:20 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.
Thanks for the report, the analysis, and the patch.
I have applied Andy Schorr's patch to the code base and pushed to Git.
Thanks!
Arnold
Tom Gray <address@hidden> wrote:
> Hi all,
>
> I noticed a big speed regression in some of my programs when I upgraded
> from 4.2.1 to 5.0.
>
> I traced the issue to commit c1f670b26, a fix for "Numeric assignment to $0".
> See the discussion here:
> http://lists.gnu.org/archive/html/bug-gawk/2018-07/msg00042.html
>
> The speed hit shows up when you do a lot of numeric computation.
> The "fix" in commit c1f670b26 adds a call to force_string() (line 46,
> interpret.h), which punishes every assignment with a number-to-string
> conversion.
>
> In the examples here:
>   gawk5  is built from the current master branch
>   gawk5n has force_string() removed
>
> $ time ~/src/gawk/gawk5 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real 0m3.980s
> user 0m3.962s
> sys 0m0.015s
>
>
> $ time ~/src/gawk/gawk5n 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real 0m0.130s
> user 0m0.124s
> sys 0m0.000s
>
>
> The original problem was a numeric assignment to $0 followed by the output
> of $0. Numeric assignment does not automagically trigger generation of the
> string representation required for output. A better way to generate it is
> lazily, like all other conversions: the string is not needed until $0 gets
> sent to the output pipe.
>
> Adding force_string() in do_print_rec() gets the job done.
>
> The new version, with both changes applied, fixes the original problem and
> keeps numeric assignments fast:
>
> $ cat abc.txt
> a b c
> a b c
> a b c
> a b c
> a b c
>
> $ ~/src/gawk/gawk '{$0 = ++i;} 1' < abc.txt
> 1
> 2
> 3
> 4
> 5
>
> $ time ~/src/gawk/gawk 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real 0m0.130s
> user 0m0.109s
> sys 0m0.015s
>
>
> $ git diff interpret.h
> diff --git a/interpret.h b/interpret.h
> index cc6e02af..4044b808 100644
> --- a/interpret.h
> +++ b/interpret.h
> @@ -46,7 +46,6 @@ unfield(NODE **l, NODE **r)
> (*l) = dupnode(*r);
> DEREF(*r);
> }
> - force_string(*l);
> }
>
> #define UNFIELD(l, r) unfield(& (l), & (r))
>
>
>
> diff --git a/builtin.c b/builtin.c
> index 491a96b3..4f461650 100644
> --- a/builtin.c
> +++ b/builtin.c
> @@ -2340,6 +2340,7 @@ do_print_rec(int nargs, int redirtype)
> (void) get_field(0L, NULL);
>
> f0 = fields_arr[0];
> + force_string(f0);
>
> if (do_lint && (f0->flags & NULL_FIELD) != 0)
> lintwarn(_("reference to uninitialized field `$%d'"), 0);
>