Re: How to Generate a Long String of the Same Character
From: Bob Proulx
Subject: Re: How to Generate a Long String of the Same Character
Date: Sun, 18 Jul 2021 22:59:53 -0600
Neil R. Ormos wrote:
> In a message on the bug-gawk list, Ed Mortin wrote:
> That should have been "Ed Morton".
> > On an online forum someone asked how to generate a
> > string of 100,000,000 "x"s. They had tried this in
> > a BEGIN section:
> >
> > for(i=1;i<=100000000;i++) s = s "x"
>...
> Building a big string by iterating in tiny chunks
> would seem to invite poor performance.
Agreed. Growing by one character at a time definitely seems
inefficient.
> Instead, why not append the string to itself,
> doubling its size with each iteration? For
> example:
>
> time ~/.local/bin/gawk-5.1.0 \
> 'BEGIN{sizelim=100000000; a="x"; while (length(a) < sizelim) {a=a a};
> a=substr(a, 1, sizelim); print length(a);}'
I think that is probably one of the best ways with awk.
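For comparison, another awk-only approach that sometimes comes up is to
pad an empty string to the target width with sprintf and then replace
the padding with gsub. This is only a sketch and was not part of the
timings in this thread. The helper name rep_pad is made up for
illustration, and whether a given awk accepts a field width as large as
100 million is an assumption to check on your own implementation.

```shell
# rep_pad is a hypothetical helper name, not something from the thread.
# Prints N copies of "x" by padding "" to width N with spaces and then
# turning every space into an "x".
rep_pad() {
    awk -v n="$1" 'BEGIN {
        s = sprintf("%" n "s", "")   # format becomes e.g. "%10s": n spaces
        gsub(/ /, "x", s)            # replace each space with x
        printf "%s", s               # no trailing newline
    }'
}

rep_pad 10; echo
```

It would be worth timing this against the doubling loop before relying
on it for large sizes.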
My first thought was that it would be better to produce a file that
contained 100 million "x"s and then read it into awk.
awk '{print length($0)}' < bigfileofx
Of course that simply changes the problem around to creating that
file! This is rather a silly response but it's fun just the same.
Well... There are certainly many ways to do it. I would use dd for
creating a byte stream of the right size. There seems to be no way to
make dd itself produce "x" characters, but it can read /dev/zero, and
tr can translate the zero bytes to another character such as an "x".
$ dd status=none if=/dev/zero bs=1 count=10 | tr "\0" "x"; echo
xxxxxxxxxx
$ dd status=none if=/dev/zero bs=1 count=10 | tr "\0" "x" | wc -c
10
That looks promising. Let's fire it up for the requested 100 million
size.
$ time dd status=none if=/dev/zero bs=1M count=100 | tr "\0" "x" | wc -c
104857600
real 0m0.179s
user 0m0.126s
sys 0m0.167s
That is 100 MiB (104857600 bytes), a little more than the requested
100 million, but close enough for a timing test. Let's get it into awk.
$ time dd status=none if=/dev/zero bs=1M count=100 | tr "\0" "x" |
    awk '{print length($0)}'
104857600
real 0m0.624s
user 0m0.451s
sys 0m0.398s
That's looking pretty good. Let's compare it against the reference
above so one can see how slow my machine is about such things.
$ time awk 'BEGIN{sizelim=100000000; a="x"; while (length(a) < sizelim)
{a=a a}; a=substr(a, 1, sizelim); print length(a);}'
100000000
real 0m1.469s
user 0m0.815s
sys 0m0.654s
I am running this on an older Intel Core i5 CPU 750 2.67GHz.
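Going back to the dd pipeline: bs=1M count=100 is 100 * 1048576 =
104857600 bytes, not exactly 100000000. A minimal sketch for hitting
the exact size is to use a decimal block size instead. The helper name
gen_x is made up for illustration, and 2>/dev/null is used here as a
portable stand-in for GNU dd's status=none.

```shell
# gen_x is a hypothetical helper name, not something from the thread.
# Usage: gen_x BS COUNT
# Emits exactly BS*COUNT "x" bytes: dd reads COUNT blocks of BS zero
# bytes from /dev/zero and tr maps each NUL byte to "x".
gen_x() {
    dd if=/dev/zero bs="$1" count="$2" 2>/dev/null | tr '\0' 'x'
}

# 100 blocks of 1,000,000 bytes each = 100,000,000 bytes.
gen_x 1000000 100 | wc -c
```

The same stream can be piped into awk '{print length($0)}' as before.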
> On my not-very-fast machine, according to the time
> built-in, that takes 0.17 seconds of elapsed time.
Faster than my daily driving desktop! :-)
> Yes, worst-case, if the intended string has length
> (2^N)+1, you wastefully build a string of size
> 2^(N+1) and trim off almost half. So maybe on
> some machines, building the string in
> single-character units would work but the doubling
> would not.
Fun stuff! It also illustrates the usefulness of benchmarking to
collect data.
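On the worst-case overshoot mentioned above: one way to avoid ever
building a string longer than the target is to assemble it from the
binary digits of the length, appending the current chunk only when the
corresponding bit is set. This is a sketch, not something that was
timed in this thread, and the names repeat_str and rep are made up for
illustration.

```shell
# repeat_str and rep are hypothetical names, not from the thread.
# Usage: repeat_str STRING COUNT
# Prints COUNT copies of STRING with no trailing newline.
repeat_str() {
    awk -v s="$1" -v n="$2" '
    function rep(chunk, count,    out) {
        out = ""
        while (count > 0) {
            if (count % 2 == 1)
                out = out chunk        # this bit of count is set
            count = int(count / 2)
            if (count > 0)
                chunk = chunk chunk    # double for the next bit
        }
        return out
    }
    BEGIN { printf "%s", rep(s, n) }'
}

repeat_str x 10; echo
```

Neither the doubled chunk nor the result ever grows past the target
length, so there is nothing to trim with substr. The two strings
together can still approach twice the target size, so whether this
helps on a memory-constrained machine is something to measure.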
Bob