Re: [bug-gawk] bug in gawk
From: arnold
Subject: Re: [bug-gawk] bug in gawk
Date: Sun, 07 Apr 2019 00:02:18 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Hi.
Aleksey Cheusov <address@hidden> wrote:
> 05.04.2019, 10:51, "address@hidden" <address@hidden>:
> > Hi.
> >
> >> Aleksey Cheusov <address@hidden> wrote:
> >>
> >> > 0 0 dictd>echo a | env LC_ALL=C gawk '/^[\300-\337]/ {print 1}'
> >> > gawk: cmd. line:1: error: Invalid range end: /^[??-??]/
> >>
> >> I have reproduced this. It's strange. I will investigate further.
> >
> > I have found the cause of the problem. I have to think a little
> > bit about how to fix it.
> >
> > I should have a fix within a few days.
> >
> > Thank you for the report!
>
> Great! You are one of the best upstream I've ever seen :-)
Thanks. I try.
Here is the diff. I will get this into the git repo sometime in
the next few days, but this will let you move ahead.
Arnold
---------------------------------------------------------
diff --git a/eval.c b/eval.c
index 4650150..132c850 100644
--- a/eval.c
+++ b/eval.c
@@ -104,6 +104,12 @@ char casetable[] = {
'\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177',
/* Latin 1: */
+ /*
+ * 4/2019: This is now overridden; in single byte locales
+ * we call load_casetable from main and it fills in the values
+ * based on the current locale. In particular, we want LC_ALL=C
+ * to work correctly for values >= 0200.
+ */
C('\200'), C('\201'), C('\202'), C('\203'), C('\204'), C('\205'),
C('\206'), C('\207'),
C('\210'), C('\211'), C('\212'), C('\213'), C('\214'), C('\215'),
C('\216'), C('\217'),
C('\220'), C('\221'), C('\222'), C('\223'), C('\224'), C('\225'),
C('\226'), C('\227'),
@@ -201,18 +207,12 @@ load_casetable(void)
{
#if defined(LC_CTYPE)
int i;
- char *cp;
static bool loaded = false;
if (loaded || do_traditional)
return;
loaded = true;
- cp = setlocale(LC_CTYPE, NULL);
-
- /* this is not per standard, but it's pretty safe */
- if (cp == NULL || strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
- return;
#ifndef USE_EBCDIC
/* use of isalpha is ok here (see is_alpha in awkgram.y) */
@@ -710,7 +710,7 @@ set_IGNORECASE()
warned = true;
lintwarn(_("`IGNORECASE' is a gawk extension"));
}
- load_casetable();
+
if (do_traditional)
IGNORECASE = false;
else
diff --git a/main.c b/main.c
index e2bcd72..d6e3426 100644
--- a/main.c
+++ b/main.c
@@ -320,6 +320,10 @@ main(int argc, char **argv)
/* init the cache for checking bytes if they're characters */
init_btowc_cache();
+ /* set up the single byte case table */
+ if (gawk_mb_cur_max == 1)
+ load_casetable();
+
if (do_nostalgia)
nostalgia();