bug-gnu-utils

Re: ^ in FS


From: Stepan Kasal
Subject: Re: ^ in FS
Date: Tue, 25 Nov 2008 19:06:47 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

Hello,

this is an incomplete answer to your mail.

> I'm having trouble in understanding the behavior of ^ in FS, [...]

It's a simple consequence of the straightforward implementation.
One example is worth 1000 words:

$ echo 'XXf1 , f2, XXf3' | awk -v FS='^X+| *, *' \
        '{for(i=1;i<=NF;i++)print "-->"$i"<--"}'
--><--
-->f1<--
-->f2<--
--><--
-->f3<--

After the third field ("f2") has been found, awk moves past it and
its delimiter (", "), so the remaining string is "XXf3".
That string is passed to the regexp matcher.  Since the matcher is
not told that we are not at the beginning of the record, it finds "XX",
which delimits the empty "fourth field".
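
The loop described above can be sketched in awk itself.  This is only a
simplified model, not gawk's actual code: naive_split is a hypothetical
helper name, and the sketch assumes the regexp never matches the empty
string and contains no backslashes (because -v processes escapes).  The
point is that match() only ever sees the remaining text, so "^" can
match at the start of whatever is left.

```shell
# Hypothetical helper: a simplified model of the splitting loop (not gawk's code).
# Usage: naive_split STRING REGEXP
naive_split() {
    awk -v s="$1" -v re="$2" 'BEGIN {
        rest = s
        # match() is given only the remaining text, so "^" matches at its start.
        # Assumption: re never matches the empty string (if it does, we stop).
        while (match(rest, re) && RLENGTH > 0) {
            print "-->" substr(rest, 1, RSTART - 1) "<--"  # field before the delimiter
            rest = substr(rest, RSTART + RLENGTH)          # drop field and delimiter
        }
        print "-->" rest "<--"                             # whatever is left is the last field
    }'
}

naive_split 'XXf1 , f2, XXf3' '^X+| *, *'
```

Under these assumptions the model prints the same five fields as the
transcript above, including the empty one in front of "f3".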

Now to your example:
> $ echo '  f1 ,  f2,f3  ,  f  4  ,f5' | awk -v FS='^ *| *, *' \
>         '{for(i=1;i<=NF;i++)print "-->"$i"<--"}'
> --><--
> -->f1<--
> -->f2<--
> -->f3<--
> -->f<--
> -->4<--
> -->f5<--

The difference is that your regexp can match the empty string at the
beginning.  So after the fourth field and its delimiter have been
removed and we are left with "f  4  ,f5", the answer from the matcher
is "empty string at the beginning".  But that cannot be a valid
delimiter, so gawk skips one char and calls the matcher again on
"  4  ,f5".  Now the match is "  ", which is not empty, so it is taken
as the delimiter.
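
The skip-one-char behaviour can be modelled the same way.  Again, this
is a rough sketch and not gawk's implementation: naive_split_skip is a
hypothetical name, the regexp is assumed to contain no backslashes, and
the awk in use is assumed to return the leftmost-longest match from
match(), as POSIX requires.

```shell
# Hypothetical helper: splitting loop with the "skip one char" kludge.
# Usage: naive_split_skip STRING REGEXP
naive_split_skip() {
    awk -v s="$1" -v re="$2" 'BEGIN {
        rest = s
        field = ""
        while (rest != "" && match(rest, re)) {
            if (RLENGTH == 0) {
                # An empty match is not a valid delimiter: move the text up to
                # and including one more character into the field, then retry
                # the matcher on what remains.
                field = field substr(rest, 1, RSTART)
                rest = substr(rest, RSTART + 1)
                continue
            }
            print "-->" field substr(rest, 1, RSTART - 1) "<--"
            field = ""
            rest = substr(rest, RSTART + RLENGTH)   # drop the delimiter
        }
        print "-->" field rest "<--"                # last field
    }'
}

naive_split_skip '  f1 ,  f2,f3  ,  f  4  ,f5' '^ *| *, *'
```

With a POSIX-conforming match(), this should print the same seven
fields as the quoted output, with "f" and "4" split apart because "^ *"
matches the two spaces at the start of "  4  ,f5".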

> don't know whether it can be called "bug".

I'm afraid there are two bugs involved:
gawk does not tell the matcher
1) that empty matches should be ignored (skipping one char to get
past them is a kludge)
2) that it is not at the beginning of the string

But I'm afraid the regexp matcher(s) inside gawk cannot handle those
features.  Consequently, the bugs are probably hard to fix.
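
To illustrate what point 2) would change, here is a toy model in which
"^" can match only at the true beginning of the record.  It is not a
proposed gawk fix: split_notbol is a hypothetical name, and prefixing a
sentinel character (\001) is just a stand-in for telling the matcher
that it is not at the beginning.  The sketch assumes that neither the
data nor the regexp can match \001, that the regexp never matches the
empty string, and that it contains no backslashes.

```shell
# Hypothetical helper: like the naive loop, but "^" matches only at the
# real start of the record.  Usage: split_notbol STRING REGEXP
split_notbol() {
    awk -v s="$1" -v re="$2" 'BEGIN {
        rest = s
        bol = 1                                   # 1 only while at the record start
        while (1) {
            # Past the start, prefix a sentinel so "^" cannot match the data.
            text = bol ? rest : "\001" rest
            if (!match(text, re) || RLENGTH == 0)
                break
            start = bol ? RSTART : RSTART - 1     # undo the sentinel offset
            print "-->" substr(rest, 1, start - 1) "<--"
            rest = substr(rest, start + RLENGTH)
            bol = 0
        }
        print "-->" rest "<--"                    # last field
    }'
}

split_notbol 'XXf1 , f2, XXf3' '^X+| *, *'
```

In this model the "XX" in front of "f3" is no longer taken as a
delimiter, so the last field comes out as "XXf3".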

Stay tuned, better answers may come later... ;-)

Have a nice day,
        Stepan Kasal



