bug-gnu-utils

Re: sprintf() of gawk in cvs works very slow under ja_JP.UTF8


From: Aharon Robbins
Subject: Re: sprintf() of gawk in cvs works very slow under ja_JP.UTF8
Date: Tue, 03 Feb 2009 05:31:31 +0200

Hi.

Is this an issue for a real application, or is it something that
you just happened to notice?

Gawk is smarter than the average awk, and it understands multibyte
characters for things like %.5s in format strings. But to do this it
has to parse the raw string, and that is not without cost.  There's
no free lunch here. :-(
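
To illustrate where the cost comes from, here is a minimal sketch of
counting characters in a multibyte string with the standard mbrlen()
function. It is not gawk's actual mbc_char_count(); the name
count_chars and its handling of invalid bytes are assumptions made
for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (NOT gawk's mbc_char_count): count the characters
 * in the first len bytes of s, using the current locale's encoding.
 * Every byte has to be visited, so the cost is linear in len.
 */
size_t count_chars(const char *s, size_t len)
{
    mbstate_t state;
    size_t count = 0;
    size_t i = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);

    while (i < len) {
        size_t n = mbrlen(s + i, len - i, &state);

        if (n == (size_t) -1 || n == (size_t) -2 || n == 0) {
            /* Invalid sequence, truncated sequence, or NUL byte:
               assumption for this sketch, count it as one byte. */
            n = 1;
            memset(&state, 0, sizeof state);
        }
        i += n;
        count++;
    }
    return count;
}
```

In the default C locale every byte of an ASCII string counts as one
character. In a UTF-8 locale (after calling setlocale(LC_ALL, "")) a
multibyte sequence counts as a single character. Doing a scan like this
on a string that grows at every iteration, as in the test script, makes
the total work grow with the square of the iteration count.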

If this is really an issue for an application, I will revisit this;
otherwise I am reluctant to do so because it means the code will probably
have to get messier than it already is.

Thanks,

Arnold

> Date: Mon, 2 Feb 2009 00:54:38 +0900
> From: Hirofumi Saito <address@hidden>
> To: Aharon Robbins <address@hidden>
> Cc: address@hidden
> Subject: Re: sprintf() of gawk in cvs works very slow under ja_JP.UTF8
>
> On Sat, 31 Jan 2009 23:18:25 +0200
> Aharon Robbins <address@hidden> wrote:
>
> > Hi. Re the below.
> > 
> > Thanks for the report. A patch is attached. It will show up in the CVS
> > in the next few days.
>
> Thank you for your patch.
> Your patch works very well.
> But if I change the format argument of sprintf() from "%s" to "%10s",
> gawk becomes slow again.
>
> BEGIN {
>     str = "abc";
>     for (i = 1; i <= 10000; i++) {
>         result = sprintf("%10s%s", result, str);
>     }
>     print result;
> }
>
> regards,
>
> Hirofumi Saito.
>
> > ---------------------------
> > > Date: Sun, 1 Feb 2009 03:12:25 +0900
> > > From: Hirofumi Saito <address@hidden>
> > > To: address@hidden
> > > Subject: sprintf() of gawk in cvs works very slow under ja_JP.UTF8
> > >
> > > Hello,
> > >
> > > The sprintf() function of the latest gawk in CVS is very, very slow.
> > > I tested it with the script below.
> > >
> > > $ cat test.awk 
> > > BEGIN {
> > >     str = "abc";
> > >     for (i = 1; i <= 10000; i++) {
> > >         result = sprintf("%s%s", result, str);
> > >     }
> > >     print result;
> > > }
> > >
> > > $ time gawk -f test.awk > /dev/null 
> > > gawk -f test.awk > /dev/null  7.61s user 0.01s system 99% cpu 7.674 total
> > >
> > > But under C locale, it works very fast (over 100 times faster!!).
> > >
> > > $ time LC_ALL=C gawk -f test.awk > /dev/null
> > > LC_ALL=C gawk -f test.awk > /dev/null  0.05s user 0.00s system 18% cpu 0.301 total
> > >
> > > The release version of gawk-3.1.6 works fast under ja_JP.UTF8.
> > >
> > > $ time ../gawk-3.1.6/gawk -f test.awk > /dev/null 
> > > ../gawk-3.1.6/gawk -f test.awk > /dev/null  0.05s user 0.00s system 95% cpu 0.056 total
> > >
> > > regards,
> > >
> > > Hirofumi Saito.
> > -------------------------------------
> > Sat Jan 31 23:14:00 2009  Arnold D. Robbins  <address@hidden>
> > 
> >     * builtin.c (format_tree): For '%s', don't count the multibyte
> >     characters if we are just copying all the characters. Gives
> >     big speedup. Thanks to  Hirofumi Saito <address@hidden>
> >     for reporting the problem.
> > 
> > Index: builtin.c
> > ===================================================================
> > RCS file: /d/mongo/cvsrep/gawk-stable/builtin.c,v
> > retrieving revision 1.29
> > diff -u -r1.29 builtin.c
> > --- builtin.c       29 Dec 2008 20:44:02 -0000      1.29
> > +++ builtin.c       31 Jan 2009 21:13:35 -0000
> > @@ -984,11 +984,13 @@
> >                             fill = zero_string;
> >                     parse_next_arg();
> >                     arg = force_string(arg);
> > -                   char_count = mbc_char_count(arg->stptr, arg->stlen);
> >                     if (fw == 0 && ! have_prec)
> >                             prec = arg->stlen;
> > -                   else if (! have_prec || prec > char_count)
> > -                           prec = char_count;
> > +                   else {
> > +                           char_count = mbc_char_count(arg->stptr, arg->stlen);
> > +                           if (! have_prec || prec > char_count)
> > +                                   prec = char_count;
> > +                   }
> >                     cp = arg->stptr;
> >                     goto pr_tail;
> >             case 'd':
> > 
> > 
>
>
> -- 
> ----------
> 日本 GNU AWK ユーザー会
> Hirofumi Saito (address@hidden)
> address@hidden
>



