bug-gnu-utils

Re: FIELDWIDTHS broken in awk 3.1.5


From: Glenn Zazulia
Subject: Re: FIELDWIDTHS broken in awk 3.1.5
Date: Wed, 25 Jan 2006 20:01:18 -0700
User-agent: Thunderbird 1.5 (X11/20051201)

Aharon,

I found your FIELDWIDTHS patch which fixes a problem that I also noticed with the 3.1.5 gawk release.

There is a second, related possible problem with FIELDWIDTHS in the gawk 3.1.5 release. In gawk 3.1.4 and earlier, the statement

   FIELDWIDTHS = ""

used to work -- that is, it was executed without error and a subsequent "print FIELDWIDTHS" statement showed that any previous value was unset. Running that statement through gawk 3.1.5 produces the following fatal error:

   gawk: (FILENAME=- FNR=1) fatal: invalid FIELDWIDTHS value, near `'

Now, one might question what that statement should do, and one might argue that instead of being a new bug, this might be a fix of previously improper behavior. I don't think so. Since the FIELDWIDTHS variable is a gawk extension and since the documentation doesn't state what the behavior should be in this case, it's difficult for me to argue one way or the other. However, this change in behavior in 3.1.5 breaks existing gawk scripts, which is a problem.

Let me illustrate with an existing script segment:

   ...
   line = $0;
   FIELDWIDTHS = "2 2 2 2";
   $0 = hexmask;
   ddecmask = "";
   for (i = 1; i < NF; i++)
           ddecmask = ddecmask strtonum("0x" $i) ".";
   ddecmask = ddecmask strtonum("0x" $NF);
   FIELDWIDTHS = "";
   FS = FS;
   $0 = line;
   ...

Notice that this script temporarily switches to fixed-width field mode to process a particular value, and then switches back to regular field-separator mode for further input processing. Granted, there are equivalent alternatives that avoid the FIELDWIDTHS mechanism entirely, but that's beside the point. This code segment worked properly in previous gawk releases and now breaks.
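For reference, one such alternative might look like the sketch below. It is not the script in question: it assumes hexmask holds exactly eight hex digits (for example ffffff00), the helper name hexmask_to_dotted is made up for illustration, and it does no input validation. It uses substr() and index() rather than FIELDWIDTHS and strtonum(), so it should run under any POSIX awk.

```shell
# Hypothetical helper: convert an 8-digit hex mask to dotted decimal
# without touching FIELDWIDTHS. Assumes exactly eight valid hex digits.
hexmask_to_dotted() {
    awk -v hexmask="$1" 'BEGIN {
        digits = "0123456789abcdef"
        mask = tolower(hexmask)
        out = ""
        # Walk the mask two hex digits (one octet) at a time.
        for (i = 1; i <= 7; i += 2) {
            # index() is 1-based, so subtract 1 to get the digit value.
            hi = index(digits, substr(mask, i, 1)) - 1
            lo = index(digits, substr(mask, i + 1, 1)) - 1
            out = out (i > 1 ? "." : "") (hi * 16 + lo)
        }
        print out
    }'
}

hexmask_to_dotted ffffff00
```

Because it never changes the splitting mode, this version has nothing to restore afterwards, at the cost of doing the hex conversion by hand.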

Honestly, the 'FIELDWIDTHS = ""' statement is unimportant in this case, since the statement that actually restores the default record-splitting behavior is the subsequent "FS = FS" statement. So, for the time being, I have removed the statement that clears the FIELDWIDTHS variable so that the script works with the latest gawk release. However, I had originally cleared it as an extra precaution and to avoid confusion.
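A self-contained sketch of that workaround follows. The wrapper function name restore_demo, the sample input line, and the hexmask value are all assumptions added for illustration; only the statements between saving and restoring the record come from the script segment above, minus the 'FIELDWIDTHS = ""' line. It requires gawk, since FIELDWIDTHS and strtonum() are gawk extensions.

```shell
# Hypothetical wrapper around the script segment, with the
# 'FIELDWIDTHS = ""' statement removed. Requires gawk.
restore_demo() {
    echo 'ab cd ef' | gawk '{
        line = $0
        hexmask = "ffffff00"        # assumed sample value
        FIELDWIDTHS = "2 2 2 2"     # switch to fixed-width splitting
        $0 = hexmask
        ddecmask = ""
        for (i = 1; i < NF; i++)
            ddecmask = ddecmask strtonum("0x" $i) "."
        ddecmask = ddecmask strtonum("0x" $NF)
        FS = FS                     # switch back to FS-based splitting
        $0 = line                   # re-split the saved record
        print ddecmask, NF, $2
    }'
}
```

If "FS = FS" alone restores separator-based splitting, calling restore_demo should print the converted mask followed by the field count and second field of the original line, that is, 3 and cd.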

[The fact that FS and FIELDWIDTHS could both be set simultaneously is the confusion that I was trying to avoid. I think this issue stems from the design decision that setting either variable both defines the field separator and (possibly) switches record splitting modes. That's not a big deal, though, and I don't mean to go off on that tangent.]

So, let me return to the question of whether 'FIELDWIDTHS = ""' should be a legal statement. Again, since this is an extension with no standards to follow and lacking any documentation that states otherwise, I would argue that this statement should not suddenly cause a fatal error because it never used to do so and the previous behavior did not seem to be problematic -- even if not apparently useful. Notice the following behavior in gawk 3.1.4 and earlier:

   $ echo 'ab cd' | gawk '{ print NF, $1, $2 }'
   2 ab cd
   $ echo 'ab cd' | gawk 'BEGIN { FS = "" } { print NF, $1, $2 }'
   5 a b
   $ echo 'ab cd' | gawk 'BEGIN { FIELDWIDTHS = "" } { print NF, $1, $2 }'
   0

I see where in the source code to make the fix to restore the previous behavior, but I wasn't going to bother creating a proposed patch if you weren't convinced that the changed behavior needs to be fixed.

Please let me know.

Glenn

