[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#20029: 'yes' surprisingly slow
From: |
Giuseppe Scrivano |
Subject: |
bug#20029: 'yes' surprisingly slow |
Date: |
Tue, 10 Mar 2015 01:31:36 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux) |
Pádraig Brady <address@hidden> writes:
> The attached should make things more efficient here.
>
> thanks,
> Pádraig.
>
>
> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <address@hidden>
> Date: Mon, 9 Mar 2015 19:27:32 +0000
> Subject: [PATCH] yes: output data more efficiently
>
> yes(1) may be used to generate repeating patterns of text
> for test inputs etc., so adjust to be more efficient.
>
> Profiling the case where yes(1) is outputting small items
> through stdio (which was the default case), shows the overhead
> continuously processing small items in main() and in stdio:
>
> $ yes >/dev/null & perf top -p $!
> 31.02% yes [.] main
> 27.36% libc-2.20.so [.] _IO_file_xsputn@@GLIBC_2.2.5
> 14.51% libc-2.20.so [.] fputs_unlocked
> 13.50% libc-2.20.so [.] strlen
> 10.66% libc-2.20.so [.] __GI___mempcpy
> 1.98% yes [.] address@hidden
>
> Sending more data per stdio call improves the situation,
> but still, there is significant stdio overhead due to memory copies,
> and the repeated string length checking:
>
> $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
> 42.26% libc-2.20.so [.] __GI___mempcpy
> 17.38% libc-2.20.so [.] strlen
> 5.21% [kernel] [k] __srcu_read_lock
> 4.58% [kernel] [k] __srcu_read_unlock
> 4.27% libc-2.20.so [.] _IO_file_xsputn@@GLIBC_2.2.5
> 2.50% libc-2.20.so [.] __GI___libc_write
> 2.45% [kernel] [k] system_call
> 2.40% [kernel] [k] system_call_after_swapgs
> 2.27% [kernel] [k] vfs_write
> 2.09% libc-2.20.so [.] _IO_do_write@@GLIBC_2.2.5
> 2.01% [kernel] [k] fsnotify
> 1.95% libc-2.20.so [.] _IO_file_write@@GLIBC_2.2.5
> 1.44% yes [.] main
>
> We can avoid all stdio overhead by building up the buffer
> _once_ and outputting that, and the profile below shows
> the bottleneck moved to the kernel:
>
> $ src/yes >/dev/null & perf top -p $!
> 15.42% [kernel] [k] __srcu_read_lock
> 12.98% [kernel] [k] __srcu_read_unlock
> 9.41% libc-2.20.so [.] __GI___libc_write
> 9.11% [kernel] [k] vfs_write
> 8.35% [kernel] [k] fsnotify
> 8.02% [kernel] [k] system_call
> 5.84% [kernel] [k] system_call_after_swapgs
> 4.54% [kernel] [k] __fget_light
> 3.98% [kernel] [k] sys_write
> 3.65% [kernel] [k] selinux_file_permission
> 3.44% [kernel] [k] rw_verify_area
> 2.94% [kernel] [k] __fsnotify_parent
> 2.76% [kernel] [k] security_file_permission
> 2.39% yes [.] main
> 2.17% [kernel] [k] __fdget_pos
> 2.13% [kernel] [k] sysret_check
> 0.81% [kernel] [k] write_null
> 0.36% yes [.] address@hidden
>
> Note this change also ensures that yes will only write complete lines
> for lines softer than BUFSIZ.
>
> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
> and output that, rather than having stdio process each item.
> * tests/misc/yes.sh: Add a new test for various buffer sizes.
> * tests/local.mk: Reference the new test.
> Fixes http://bugs.gnu.org/20029
> ---
> src/yes.c | 43 +++++++++++++++++++++++++++++++++++++++++--
> tests/local.mk | 1 +
> tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
> 3 files changed, 70 insertions(+), 2 deletions(-)
> create mode 100755 tests/misc/yes.sh
>
> diff --git a/src/yes.c b/src/yes.c
> index b35b13f..91dea11 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or
> 'y'.\n\
> int
> main (int argc, char **argv)
> {
> + char buf[BUFSIZ];
> + char *pbuf = buf;
> + int i;
> +
> initialize_main (&argc, &argv);
> set_program_name (argv[0]);
> setlocale (LC_ALL, "");
> @@ -77,9 +81,44 @@ main (int argc, char **argv)
> argv[argc++] = bad_cast ("y");
> }
>
> - while (true)
> + /* Buffer data locally once, rather than having the
> + large overhead of stdio buffering each item. */
> + for (i = optind; i < argc; i++)
> + {
> + size_t len = strlen (argv[i]);
> + if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
> + break;
> + memcpy (pbuf, argv[i], len);
> + pbuf += len;
> + *pbuf++ = i == argc - 1 ? '\n' : ' ';
> + }
> + if (i < argc)
> + pbuf = NULL;
since the buffer is partly filled, wouldn't be better to reuse it?
Something like this (barely tested):
diff --git a/src/yes.c b/src/yes.c
index 91dea11..ac690ce 100644
--- a/src/yes.c
+++ b/src/yes.c
@@ -92,9 +92,7 @@ main (int argc, char **argv)
pbuf += len;
*pbuf++ = i == argc - 1 ? '\n' : ' ';
}
- if (i < argc)
- pbuf = NULL;
- else
+ if (i == argc)
{
size_t line_len = pbuf - buf;
size_t lines = BUFSIZ / line_len;
@@ -106,7 +104,7 @@ main (int argc, char **argv)
}
/* The normal case is to continuously output the local buffer. */
- while (pbuf)
+ while (i == argc)
{
if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
{
@@ -117,11 +115,17 @@ main (int argc, char **argv)
/* If the data doesn't fit in BUFSIZ then it's large
and not too onerous to output using stdio in any case. */
- while (! pbuf)
+ while (i != argc)
{
- for (i = optind; i < argc; i++)
- if (fputs (argv[i], stdout) == EOF
- || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
+ int j;
+ if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
+ {
+ error (0, errno, _("standard output"));
+ return EXIT_FAILURE;
+ }
+ for (j = i; j < argc; j++)
+ if (fputs (argv[j], stdout) == EOF
+ || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
{
error (0, errno, _("standard output"));
return EXIT_FAILURE;
The difference seems to be (without the patch):
$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686 -
real 0m7.994s
user 0m11.093s
sys 0m2.953s
versus (with the patch):
$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686 -
real 0m3.534s
user 0m4.164s
sys 0m1.803s
Regards,
Giuseppe