Re: ^ in FS
From: Stepan Kasal
Subject: Re: ^ in FS
Date: Wed, 26 Nov 2008 13:25:40 +0100
User-agent: Mutt/1.5.18 (2008-05-17)
Hello,
On Tue, Nov 25, 2008 at 08:29:04PM +0100, Dave B wrote:
> This is actually another good example. [...]
> GNU awk 3.1.6: [...] (imho wrong, [...])
> Bell labs' original awk: [...] (imho correct)
agreed.
> However, I can't find the precise circumstances [...]
I'm afraid I did not select the best examples; thank you for
sending yours. Asking the right question is the bigger part of the
explanation. ;-)
> $ echo 'XXf1XXf2XXf3' | awk -v FS='^X+' '{for(i=1;i<=NF;i++)print
> "-->"$i"<--"}'
> --><--
> -->f1XXf2XXf3<--
- FS regex is matched against "XXf1XXf2XXf3"; the result is the "XX"
at the beginning.
- The first field ("") and delimiter ("XX") are stripped.
- FS regex is matched against the remainder ("f1XXf2XXf3"); no match.
- Hence the whole remainder becomes $2.
> $ echo 'XXf1XXf2XXf3' | gawk -v FS='^X+|k*' '{for(i=1;i<=NF;i++)print
> "-->"$i"<--"}'
> --><--
> -->f1<--
> -->f2<--
> -->f3<--
- FS regex is matched against "XXf1XXf2XXf3"; the result is the "XX"
at the beginning.
- The first field ("") and delimiter ("XX") are stripped.
- FS regex is matched against the remainder ("f1XXf2XXf3"); the
leftmost longest match is the empty string at position 0.
- But an empty delimiter is not allowed, so it is dismissed and the FS
regex is matched against "1XXf2XXf3" (one char skipped);
again, the match is the empty string at position 0.
- But an empty delimiter is not allowed, so it is dismissed and the FS
regex is matched against "XXf2XXf3" (one char skipped);
the leftmost longest match is "XX".
- Consequently, "f1" becomes $2.
- $2 and the delimiter ("XX") get stripped.
- FS regex is matched against the remainder ("f2XXf3");
etc.
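
The steps above can be sketched as a small simulation. This is only an
illustration of the procedure as described in this mail, not gawk's actual
code; the function name split_like_described is made up for the example.
It is written in Python, whose re module picks the first matching
alternative rather than the POSIX leftmost longest match; for the two FS
values used here the two rules happen to give the same matches.

```python
import re

def split_like_described(record, fs):
    """Split record by matching fs against the not-yet-consumed remainder.

    Hypothetical helper that mimics the steps described in the mail:
    each match is attempted on the remainder, so ^ can match again
    wherever the remainder happens to start.
    """
    if record == "":
        return []          # awk reports NF == 0 for an empty record
    pattern = re.compile(fs)
    fields = []
    rest = record          # text not yet assigned to a field
    pending = ""           # characters skipped over after empty matches
    while rest:
        m = pattern.search(rest)
        if m is None:
            break          # no delimiter left; the remainder is the last field
        if m.end() == m.start():
            # An empty delimiter is not allowed: skip one char and retry.
            cut = m.start() + 1
            pending += rest[:cut]
            rest = rest[cut:]
        else:
            # Non-empty delimiter: everything before it is the next field.
            fields.append(pending + rest[:m.start()])
            pending = ""
            rest = rest[m.end():]
    fields.append(pending + rest)
    return fields

for fs in ("^X+", "^X+|k*"):
    print(fs, split_like_described("XXf1XXf2XXf3", fs))
# ^X+ ['', 'f1XXf2XXf3']
# ^X+|k* ['', 'f1', 'f2', 'f3']
```

Both results agree with the gawk output quoted above.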
I hope this explains _what_ gawk does.
A note: the "skip one char" step is correct if the regex does not
use ^, word boundary escapes and such. In that case, if the leftmost
longest match is empty, there cannot be a longer match from the
beginning, so we are searching for a match at the next position.
So if you are not using ^ or word boundary escapes in FS, gawk's
field splitting is correct.
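
To illustrate the note, here is a rough comparison of two strategies on
the record from the examples: matching against the remainder with the
"skip one char" step, and matching against the whole record so that ^ can
only anchor at the very beginning. The second strategy is an assumption
about what an anchor-aware splitter would do, not a description of any
particular awk; both function names are hypothetical. Python's re module
is used, which is not a POSIX leftmost longest matcher, but for these
patterns the matches are the same.

```python
import re

RECORD = "XXf1XXf2XXf3"

def incremental(record, fs):
    """Match fs against the remainder; skip one char after an empty match."""
    fields, pending, rest = [], "", record
    while rest:
        m = re.search(fs, rest)
        if m is None:
            break
        if m.end() == m.start():
            cut = m.start() + 1
            pending, rest = pending + rest[:cut], rest[cut:]
        else:
            fields.append(pending + rest[:m.start()])
            pending, rest = "", rest[m.end():]
    return fields + [pending + rest] if record else []

def whole_record(record, fs):
    """Match fs against the whole record; ^ anchors only at position 0."""
    if not record:
        return []
    fields, last = [], 0
    for m in re.finditer(fs, record):
        if m.end() > m.start():      # ignore empty delimiters
            fields.append(record[last:m.start()])
            last = m.end()
    fields.append(record[last:])
    return fields

for fs in ("X+|k*", "^X+|k*"):
    print(fs, incremental(RECORD, fs), whole_record(RECORD, fs))
# X+|k* ['', 'f1', 'f2', 'f3'] ['', 'f1', 'f2', 'f3']
# ^X+|k* ['', 'f1', 'f2', 'f3'] ['', 'f1XXf2XXf3']
```

Without ^ the two strategies agree on this record; with ^ they differ,
because the remainder-based matching lets ^ match again after each
stripped field. This checks a single record only, so it is an
illustration of the note rather than a proof.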
Have a nice day,
Stepan Kasal