bug-gnu-utils
Re: gawk-stable: truncated printf output


From: Aharon Robbins
Subject: Re: gawk-stable: truncated printf output
Date: Mon, 29 Dec 2008 22:13:30 +0200

Greetings.  Re this:

> Date: Tue, 16 Sep 2008 16:35:37 +0200

Sorry it's taken me so long to get to this.  The usual excuses
(job, family) apply. :-)

> From: "Hermann Peifer" <address@hidden>
> Subject: gawk-stable: truncated printf output
> To: address@hidden
>
> Hi Arnold,
>
> I might be doing something terribly wrong, but it looks to me that gawk
> printf (gawk-stable from CVS) doesn't like the micro sign in my ISO-8859-1
> encoded text files. Printf output is silently truncated, see below.
>
> LANG=C seems to fix the issue, but this shouldn't be necessary, should it? 
>
> Regards, Hermann
>
>
> address@hidden:konti]> uname -a
> Linux whitefish.eea.eu.int 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:55:12 EDT 2007 i686 i686 i386 GNU/Linux
>
> address@hidden:konti]> locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
>
> address@hidden:konti]> file DEMSLT07.PMT
> DEMSLT07.PMT: ISO-8859 text, with CRLF line terminators
>
> address@hidden:konti]> gawk-versions/gawk-stable/gawk 'FNR==1{print $0}' DEMSLT07.PMT
> COMPONENT suspended particulates <10 µm (air), day
>
> address@hidden:konti]> gawk-versions/gawk-stable/gawk 'FNR==1{printf "%s\n", $0}' DEMSLT07.PMT
> COMPONENT suspended particulates <10
>
> address@hidden:konti]> LANG=C gawk-versions/gawk-stable/gawk 'FNR==1{printf "%s\n", $0}' DEMSLT07.PMT
> COMPONENT suspended particulates <10 µm (air), day
>
> address@hidden:konti]> gawk-versions/gawk-3.1.6/gawk 'FNR==1{printf "%s\n", $0}' DEMSLT07.PMT
> COMPONENT suspended particulates <10 µm (air), day
>
> address@hidden:konti]> gawk-versions/gawk-3.1.5/gawk 'FNR==1{printf "%s\n", $0}' DEMSLT07.PMT
> COMPONENT suspended particulates <10 µm (air), day

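The problem stems from having ISO-8859 data in a UTF-8 locale.  The micro
sign is a single byte (0xB5) in ISO-8859-1, and that byte on its own is
not a valid character in UTF-8.  As such, gawk gets confused as to the
actual length of your string.  You can see this: if you print length($0),
gawk thinks you have 37 characters, which matches the point where the
printf output is cut off.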
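
To illustrate where the 37 comes from, here is a minimal sketch of a
character counter that gives up at the first byte sequence that is not
well-formed UTF-8.  The function name utf8_valid_prefix_chars is
hypothetical, and this is a simplified stand-in written for this
message, not the code behind gawk's mbc_char_count (which I assume
relies on the C library's multibyte routines).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT gawk's mbc_char_count(): count the UTF-8
 * characters at the start of buf, stopping at the first byte sequence
 * that is not well-formed.  Simplified: overlong encodings and
 * surrogate code points are not rejected.
 */
size_t utf8_valid_prefix_chars(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    size_t i = 0, chars = 0;

    assert(buf != NULL);
    while (i < len) {
        size_t need;                  /* continuation bytes expected */

        if (p[i] < 0x80)
            need = 0;                 /* plain ASCII */
        else if ((p[i] & 0xE0) == 0xC0)
            need = 1;                 /* two-byte sequence */
        else if ((p[i] & 0xF0) == 0xE0)
            need = 2;                 /* three-byte sequence */
        else if ((p[i] & 0xF8) == 0xF0)
            need = 3;                 /* four-byte sequence */
        else
            break;                    /* e.g. 0xB5: stray continuation byte */

        if (need > len - i - 1)
            break;                    /* sequence runs past end of buffer */
        for (size_t k = 1; k <= need; k++)
            if ((p[i + k] & 0xC0) != 0x80)
                return chars;         /* malformed continuation byte */
        i += need + 1;
        chars++;
    }
    return chars;
}
```

Called on the ISO-8859-1 bytes of the first record, where the micro sign
is the single byte 0xB5, this returns 37: the count stops right after
"<10 ".  With the same text encoded as UTF-8 (micro sign as 0xC2 0xB5)
it counts all 50 characters of the visible text.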

If you can switch to an ISO-8859-1 locale, things will likely work correctly.

The following patch makes gawk a bit smarter: for %s without a field width
or precision, it always prints out the whole string.  This fixes your
particular problem; I'm not sure that it's a general fix, or whether a
completely general one even exists.
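
As a rough model of the decision the patch makes in the 's' case, the
logic can be sketched as a stand-alone function.  The name string_prec
and its signature are hypothetical; the parameters only mirror the
variables in the patch (fw, have_prec, prec, arg->stlen, char_count),
and this is not the code in builtin.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the patched 's' case, NOT gawk code.
 * Returns the value that would end up in prec:
 *   - a byte count when neither field width nor precision is given
 *     (the whole string is copied, valid multibyte data or not);
 *   - otherwise a character count, clamped to the characters counted.
 */
size_t string_prec(size_t fw, bool have_prec, size_t prec,
                   size_t byte_len, size_t char_count)
{
    if (fw == 0 && !have_prec)
        return byte_len;        /* plain %s: take every byte */
    if (!have_prec || prec > char_count)
        return char_count;      /* old behaviour: clamp to char count */
    return prec;                /* explicit precision that fits */
}
```

For the reported record, a plain "%s" now yields the full byte length
instead of the 37 characters counted before the invalid byte.  A format
with a field width or precision still goes through the character count,
which is why this may not be a general fix.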

This will show up in the stable CVS in the next day or three.

Thanks for the report,

Arnold
-----------------------------------------------------------
Mon Dec 29 22:04:57 2008  Arnold D. Robbins  <address@hidden>

        * builtin.c (format_tree): Case for 's', improve logic for setting
        the number of characters to copy, also at pr_tail. Based on
        bug report by Hermann Peifer <address@hidden>.

Index: builtin.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/builtin.c,v
retrieving revision 1.28
diff -u -r1.28 builtin.c
--- builtin.c   11 Dec 2008 19:38:27 -0000      1.28
+++ builtin.c   29 Dec 2008 20:02:54 -0000
@@ -985,7 +985,9 @@
                        parse_next_arg();
                        arg = force_string(arg);
                        char_count = mbc_char_count(arg->stptr, arg->stlen);
-                       if (! have_prec || prec > char_count)
+                       if (fw == 0 && ! have_prec)
+                               prec = arg->stlen;
+                       else if (! have_prec || prec > char_count)
                                prec = char_count;
                        cp = arg->stptr;
                        goto pr_tail;
@@ -1196,7 +1198,9 @@
                                }
                        }
                        copy_count = prec;
-                       if (gawk_mb_cur_max > 1 && (cs1 == 's' || cs1 == 'c')) {
+                       if (fw == 0 && ! have_prec)
+                               ;
+                       else if (gawk_mb_cur_max > 1 && (cs1 == 's' || cs1 == 'c')) {
                                assert(cp == arg->stptr || cp == cpbuf);
                                copy_count = mbc_byte_count(arg->stptr,
                                                cs1 == 's' ? arg->stlen : 1);




