Re: Bug in GNU gawk - matching initial space in RE
From: barry
Subject: Re: Bug in GNU gawk - matching initial space in RE
Date: Wed, 17 Sep 2003 08:53:53 -0400 (EDT)
My thanks to both Arnold and Stepan for enlightenment. I will put in a
P.O. for "Effective AWK Programming" as soon as I get into work this morning.
I will also remove this construct from my code - I naively thought that an
escape "\" before a character would at best keep it from being interpreted
as a special character and at worst be ignored. I thought it likely that
"<" could be misinterpreted as an I/O redirection unless escaped. My bad.
Best Regards,
Barry Zeeberg
Date: Wed, 17 Sep 2003 13:28:41 +0300
From: Aharon Robbins <address@hidden>
To: address@hidden
Subject: Re: Bug in GNU gawk - matching initial space in RE
Documentation is in gawk.texi in the gawk doc/ directory. You can find
it online off of gnu.org somewhere, start at the main page. You can
also buy "Effective AWK Programming", 3rd edition from O'Reilly, and
put a (very) few $$ in my pocket. :-)
Enjoy,
Arnold
********************************
On Wed, 17 Sep 2003, Stepan Kasal wrote:
> Hello,
>
> On Tue, Sep 16, 2003 at 01:28:56PM -0400, Barry Zeeberg wrote:
> > gawk '{if($0 ~ /^ \<biological_process ;/)print $0}' gawk.bug.textfile
> >
> > ==> gawk.bug.textfile <==
> > <biological_process ; GO:0008150
>
> [...]
>
> the problem is that you use \< instead of mere < .
>
> You meant to use /^ <bio.../ (or /^ [<]bio.../ if you prefer it).
>
> According to POSIX, awk regular expressions are derived from so-called
> "extended regular expressions" (EREs). In an ERE, \< is an ordinary
> character preceded by a backslash, which yields undefined behaviour,
> says POSIX.
>
> Thus the behaviour is undefined by POSIX; some implementations may
> consider \< to be the same as <, while GNU awk takes it as a word
> boundary.
>
> Hope this explains it,
> Stepan Kasal
>
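
For anyone who wants to check the two portable spellings Stepan suggests, here is a minimal shell sketch. The file name sample.txt is made up for illustration, and the sample line is assumed to begin with a single space, as the pattern in the original report expects.

```shell
# Recreate the reported input line, with the leading space the pattern expects.
# (sample.txt is a hypothetical file name.)
printf ' <biological_process ; GO:0008150\n' > sample.txt

# A bare "<" is an ordinary character in an ERE, so it needs no backslash.
# Inside the single-quoted awk program the shell never treats it as a redirection.
plain=$(awk '/^ <biological_process ;/ { print $0 }' sample.txt)

# A bracket expression makes the "literal <" intent explicit.
bracket=$(awk '/^ [<]biological_process ;/ { print $0 }' sample.txt)

printf '%s\n' "$plain" "$bracket"
```

Both commands should print the sample line unchanged. With gawk in its default mode, the original /^ \<biological_process ;/ prints nothing for this input, because \< is taken as a word boundary rather than a literal "<".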