bug-gawk

From: david kerns
Subject: Re: gawk -i inplace is an order of magnitude faster when also redirecting stdout
Date: Thu, 29 Feb 2024 06:35:01 -0700

Swap the order (run the redirected one first). I suspect the input file was
still cached for the second run.


On Thu, Feb 29, 2024 at 5:52 AM Ed Morton <mortoneccc@comcast.net> wrote:

> Someone on StackExchange was asking about their gawk script being slow
> and someone else (https://unix.stackexchange.com/a/771263/133219)
> pointed out that using `-i inplace` is an order of magnitude slower if
> you don't also redirect stdout, which seems unintuitive at best.
>
> For example given a 1 million line input file created by:
>
>     $ seq 1000000 > file1m
>
> and using:
>
>     $ awk --version
>     GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
>
> If we just reproduce it as-is using `-i inplace` the timing is:
>
>     $ time { awk -i inplace '1' file1m; }
>
>     real    0m2.544s
>     user    0m0.265s
>     sys     0m1.843s
>
> whereas if we redirect stdout even though there is no stdout produced:
>
>     $ time { awk -i inplace '1' file1m >/dev/null; }
>
>     real    0m0.236s
>     user    0m0.187s
>     sys     0m0.000s
>
> As you can see, the second execution, with stdout redirected, ran an
> order of magnitude faster. The person who investigated thinks this is
> because the first execution is considered "interactive", since stdout
> isn't technically being redirected, and so uses line buffering, whereas
> the second execution is "non-interactive", since stdout is redirected,
> and so uses a larger buffer.
>
> If that is the case, could gawk be updated to treat "inplace" editing
> as non-interactive? If not, I think it'd be worth a statement in the
> manual about this difference in performance between the two.
>
>      Ed.
>
>
>

