bug-gawk

From: Ed Morton
Subject: gawk -i inplace is an order of magnitude faster when also redirecting stdout
Date: Thu, 29 Feb 2024 06:51:31 -0600
User-agent: Mozilla Thunderbird

Someone on StackExchange was asking about their gawk script being slow, and someone else (https://unix.stackexchange.com/a/771263/133219) pointed out that `-i inplace` is an order of magnitude slower if you don't also redirect stdout, which seems unintuitive at best.

For example, given a 1-million-line input file created by:

   $ seq 1000000 > file1m

and using:

   $ awk --version
   GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)

If we just reproduce the file as-is using `-i inplace`, the timing is:

   $ time { awk -i inplace '1' file1m; }

   real    0m2.544s
   user    0m0.265s
   sys     0m1.843s

whereas if we redirect stdout, even though nothing is written to it, the timing is:

   $ time { awk -i inplace '1' file1m >/dev/null; }

   real    0m0.236s
   user    0m0.187s
   sys     0m0.000s

As you can see, the second execution, with stdout redirected, ran an order of magnitude faster (0.236s vs. 2.544s real time, with sys time dropping from 1.843s to 0.000s). The person who investigated thinks this is because the first execution is considered "interactive", since stdout isn't technically being redirected, and so uses line buffering, whereas the second execution is considered "non-interactive", since stdout is redirected, and so uses a larger buffer.
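
To make the terminal-versus-redirected distinction concrete, here is a minimal shell sketch of how a program can tell the two cases apart. It uses the POSIX `test -t 1` check on file descriptor 1. The function name `stdout_kind` is made up for illustration, and it is only an assumption that gawk makes a comparable decision internally; I haven't confirmed that against the gawk source.

```shell
#!/bin/sh
# Hypothetical helper (not part of gawk): report whether this
# process's stdout (file descriptor 1) is attached to a terminal.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"      # the case described above as "interactive"
    else
        echo "redirected"    # pipe, file, /dev/null, ...
    fi
}

stdout_kind          # result depends on how the script itself was started
stdout_kind | cat    # stdout is a pipe here, so this reports "redirected"
```

If the theory above is right, the first `awk -i inplace` run corresponds to the "terminal" case and the run with `>/dev/null` to the "redirected" case, even though in both runs the actual output goes to the file being edited.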

If that is the case, could gawk be updated to consider "inplace" editing as non-interactive? If not, I think it'd be worth a statement in the manual about this difference in performance between the two.

    Ed.