[bug-gawk] speed regression when doing math with gawk 5
From: Tom Gray
Subject: [bug-gawk] speed regression when doing math with gawk 5
Date: Sat, 20 Apr 2019 00:55:53 +0000
Hi all,
I noticed a big speed regression in some of my programs when I upgraded from
4.2.1 to 5.0.
I traced the issue to commit c1f670b26, a fix for "Numeric assignment to $0".
See the discussion here:
http://lists.gnu.org/archive/html/bug-gawk/2018-07/msg00042.html
The speed hit shows up when you do a lot of numeric computation.
The "fix" in commit c1f670b26 adds a call to force_string() (line 46,
interpret.h), which punishes every assignment with a number-to-string
conversion.
In the examples here:
  gawk5  is built from the current master branch
  gawk5n has force_string() removed
$ time ~/src/gawk/gawk5 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
real 0m3.980s
user 0m3.962s
sys 0m0.015s
$ time ~/src/gawk/gawk5n 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
real 0m0.130s
user 0m0.124s
sys 0m0.000s
The original problem was a numeric assignment to $0 followed by output of $0.
Numeric assignment does not automagically trigger generation of the string
representation required for output. A better way to handle that is lazily,
like all other conversions: the string is not needed until $0 gets sent to
the output pipe.
Adding force_string() in do_print_rec() gets the job done.
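
To illustrate the difference between converting on every assignment and
converting only at output time, here is a minimal standalone sketch in C. It
is not gawk code: toy_value, toy_force_string(), toy_assign_eager(),
toy_assign_lazy(), toy_print_value() and toy_run() are hypothetical names made
up for this example, and the "%.6g" format is just an assumed default. It only
counts how many number-to-string conversions each strategy performs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy value; this is NOT gawk's NODE structure. */
typedef struct {
    double num;      /* numeric value, always valid */
    char   str[32];  /* cached string form, valid only when has_str != 0 */
    int    has_str;
} toy_value;

/* Counts number-to-string conversions actually performed. */
static unsigned long conversions;

/* Build the string form on demand, at most once per assigned value. */
const char *toy_force_string(toy_value *v)
{
    if (!v->has_str) {
        snprintf(v->str, sizeof v->str, "%.6g", v->num);
        v->has_str = 1;
        conversions++;
    }
    return v->str;
}

/* Eager strategy: every assignment also pays for a string conversion. */
void toy_assign_eager(toy_value *v, double x)
{
    v->num = x;
    v->has_str = 0;
    (void) toy_force_string(v);
}

/* Lazy strategy: assignment stores the number and drops the stale string. */
void toy_assign_lazy(toy_value *v, double x)
{
    v->num = x;
    v->has_str = 0;
}

/* Output path: the only place where the string form is really needed. */
const char *toy_print_value(toy_value *v)
{
    return toy_force_string(v);
}

/* Assign n values, output once, and report how many conversions ran. */
unsigned long toy_run(int eager, int n)
{
    toy_value v = { 0.0, "", 0 };
    int i;

    assert(n >= 0);
    conversions = 0;
    for (i = 0; i < n; i++) {
        if (eager)
            toy_assign_eager(&v, (double) i);
        else
            toy_assign_lazy(&v, (double) i);
    }
    (void) toy_print_value(&v);
    return conversions;
}
```

In this toy model, toy_run(1, n) performs n conversions for n assignments,
while toy_run(0, n) performs exactly one, at output time, however many
assignments came before it.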
The new version, with both changes applied, fixes the original problem and
keeps numeric assignments fast:
$ cat abc.txt
a b c
a b c
a b c
a b c
a b c
$ ~/src/gawk/gawk '{$0 = ++i;} 1' < abc.txt
1
2
3
4
5
$ time ~/src/gawk/gawk 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
real 0m0.130s
user 0m0.109s
sys 0m0.015s
$ git diff interpret.h
diff --git a/interpret.h b/interpret.h
index cc6e02af..4044b808 100644
--- a/interpret.h
+++ b/interpret.h
@@ -46,7 +46,6 @@ unfield(NODE **l, NODE **r)
(*l) = dupnode(*r);
DEREF(*r);
}
- force_string(*l);
}
#define UNFIELD(l, r) unfield(& (l), & (r))
diff --git a/builtin.c b/builtin.c
index 491a96b3..4f461650 100644
--- a/builtin.c
+++ b/builtin.c
@@ -2340,6 +2340,7 @@ do_print_rec(int nargs, int redirtype)
(void) get_field(0L, NULL);
f0 = fields_arr[0];
+ force_string(f0);
if (do_lint && (f0->flags & NULL_FIELD) != 0)
lintwarn(_("reference to uninitialized field `$%d'"), 0);