bug-gawk

[bug-gawk] speed regression when doing math with gawk 5


From: Tom Gray
Subject: [bug-gawk] speed regression when doing math with gawk 5
Date: Sat, 20 Apr 2019 00:55:53 +0000

Hi all,

I noticed a big speed regression in some of my programs when I upgraded from
gawk 4.2.1 to 5.0.

I traced the issue to commit c1f670b26, a fix for "Numeric assignment to $0".
See the discussion here:
http://lists.gnu.org/archive/html/bug-gawk/2018-07/msg00042.html

The speed hit shows up when you do a lot of numeric computation.
The "fix" in commit c1f670b26 adds a call to force_string() in unfield()
(interpret.h, line 46), which punishes every assignment with a
number-to-string conversion.
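To make the cost concrete, here is a minimal C sketch. It is not gawk's code: `struct value`, `assign_eager()`, `assign_lazy()` and `count_conversions()` are hypothetical names. The eager variant formats the number on every assignment, the way the extra force_string() in unfield() does; the lazy variant only marks the cached string as stale. The "%.6g" format is an assumption that approximates gawk's default CONVFMT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an awk value: a number plus a cached string.
 * This is NOT gawk's NODE; all names are illustrative. */
struct value {
    double num;
    char str[32];
    int str_current;   /* nonzero when str matches num */
};

static long conversions;   /* number-to-string conversions performed */

/* Format num into str, approximating gawk's default CONVFMT ("%.6g"). */
static void to_string(struct value *v)
{
    int len = snprintf(v->str, sizeof v->str, "%.6g", v->num);
    assert(len > 0 && (size_t) len < sizeof v->str);
    v->str_current = 1;
    conversions++;
}

/* Eager assignment: convert right away, like the added force_string(). */
static void assign_eager(struct value *v, double d)
{
    v->num = d;
    to_string(v);
}

/* Lazy assignment: store the number and just invalidate the string. */
static void assign_lazy(struct value *v, double d)
{
    v->num = d;
    v->str_current = 0;
}

/* Mimic `for (i = 0; i < n; i++) y = <number>` and report how many
 * conversions the loop caused. */
static long count_conversions(long n, int eager)
{
    struct value y = { 0.0, "", 0 };

    conversions = 0;
    for (long i = 0; i < n; i++) {
        double d = i / 7.0;   /* any non-trivial numeric result */
        if (eager)
            assign_eager(&y, d);
        else
            assign_lazy(&y, d);
    }
    return conversions;
}
```

With n = 1000000, the eager variant performs one conversion per assignment (1,000,000 in total). The lazy variant performs none, because `y` is never printed.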

In the examples here:
  gawk5  is built from the current master branch
  gawk5n has force_string() removed

$ time ~/src/gawk/gawk5 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'

real    0m3.980s
user    0m3.962s
sys     0m0.015s


$ time ~/src/gawk/gawk5n 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'

real    0m0.130s
user    0m0.124s
sys     0m0.000s


The original problem was a numeric assignment to $0 followed by output of $0.
Numeric assignment does not automagically generate the string representation
required for output. A better way to produce it is lazily, like all other
conversions: the string is not needed until $0 is written to the output.

Adding a force_string() call in do_print_rec() (builtin.c) gets the job done.
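The lazy scheme can be sketched the same way. In this hypothetical model, which is not gawk's NODE or its flag handling, a numeric assignment to the record only marks its string as stale. `record_force_string()` plays the role of force_string(): it converts only when needed. `record_print()` stands in for do_print_rec() and materializes the string just before output. The "%.6g" format is again an assumed approximation of CONVFMT. For small integers such as 1..5 it gives the same text as gawk's integer output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of $0: a number plus a lazily built string.
 * Illustration only; not gawk's data structures. */
struct record {
    double num;
    char str[32];
    int str_current;   /* nonzero when str matches num */
};

static long record_conversions;   /* conversions actually performed */

/* Numeric assignment such as `$0 = ++i`: store the number and mark the
 * string stale instead of formatting it now. */
static void record_assign_number(struct record *r, double d)
{
    r->num = d;
    r->str_current = 0;
}

/* Counterpart of force_string(): build the string only if it is stale,
 * otherwise reuse the cached copy. */
static const char *record_force_string(struct record *r)
{
    if (!r->str_current) {
        int len = snprintf(r->str, sizeof r->str, "%.6g", r->num);
        assert(len > 0 && (size_t) len < sizeof r->str);
        r->str_current = 1;
        record_conversions++;
    }
    return r->str;
}

/* Counterpart of do_print_rec(): force the string just before writing
 * the record followed by a newline. */
static void record_print(struct record *r, FILE *out)
{
    fprintf(out, "%s\n", record_force_string(r));
}
```

Calling `record_assign_number(&r, i)` followed by `record_print(&r, stdout)` for i = 1 through 5 prints 1 to 5, as in the abc.txt example. It costs exactly one conversion per printed record, and assignments that are never printed cost nothing.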

The new version, with both changes applied, fixes the original problem and keeps
numeric assignments fast:

$ cat abc.txt
a b c
a b c
a b c
a b c
a b c

$ ~/src/gawk/gawk '{$0 = ++i;} 1' < abc.txt
1
2
3
4
5

$ time ~/src/gawk/gawk 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'

real    0m0.130s
user    0m0.109s
sys     0m0.015s


$ git diff interpret.h
diff --git a/interpret.h b/interpret.h
index cc6e02af..4044b808 100644
--- a/interpret.h
+++ b/interpret.h
@@ -46,7 +46,6 @@ unfield(NODE **l, NODE **r)
                (*l) = dupnode(*r);
                DEREF(*r);
        }
-       force_string(*l);
}

#define UNFIELD(l, r)  unfield(& (l), & (r))



diff --git a/builtin.c b/builtin.c
index 491a96b3..4f461650 100644
--- a/builtin.c
+++ b/builtin.c
@@ -2340,6 +2340,7 @@ do_print_rec(int nargs, int redirtype)
                (void) get_field(0L, NULL);

        f0 = fields_arr[0];
+       force_string(f0);

        if (do_lint && (f0->flags & NULL_FIELD) != 0)
                lintwarn(_("reference to uninitialized field `$%d'"), 0);


