\w+ in gawk 3.1.3 - fix
From: Aharon Robbins
Subject: \w+ in gawk 3.1.3 - fix
Date: Wed, 14 Jan 2004 11:37:07 +0200
Greetings. Re this:
> From: Ron Burk <address@hidden>
> To: <address@hidden>
> Date: Tue, 13 Jan 2004 20:13:01 -0800
>
> Gawk 3.1.3, compiled with gcc 2.96 on RedHat Linux 7.2
>
> Based on the documentation,
> found it confusing that "\w+" does not match identifiers
> containing digits. Specifically, this pattern:
>
> /^\w+$/
>
> will match a line containing only "func", but not
> one containing "func2". Since the documentation
> implies a similarity between "\w" and "[[:alnum:]_]",
> I also find it confusing that this pattern:
>
> /^[[:alnum:]_]+$/
>
> *does* match a line containing either "func" or
> "func2".
>
> If the documentation is correct in claiming that
> "\w" is *supposed* to also match digits, not
> just alphabetics and "_", then this seems like
> a bug. I went to the source code and in the
> "\w" code that pokes in a "_" to the bitset,
> I added similar pokes for '0' through '9'.
> That seemed to produce the documented behavior,
> though I'm sure it is unlikely to be the best
> way to fix it.
>
> Thanks.
Here is the promised fix for 3.1.3.
Thanks!
Arnold
-----------------------------------------------
--- ../gawk-3.1.3/regcomp.c 2003-03-11 11:42:51.000000000 +0200
+++ regcomp.c 2004-01-14 11:35:11.000000000 +0200
@@ -3343,7 +3343,7 @@
#ifdef RE_ENABLE_I18N
mbcset, &alloc,
#endif /* RE_ENABLE_I18N */
- (const unsigned char *) "alpha", 0);
+ (const unsigned char *) "alnum", 0);
if (BE (ret != REG_NOERROR, 0))
{