From: Paul Eggert
Subject: Re: win32 diff (GNU diffutils) 2.8.1 "--ignore-file-name-case" switch doesn't work
Date: 11 Jan 2004 22:01:37 -0800
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
"Eli Zaretskii" <address@hidden> writes:
> > From: Paul Eggert <address@hidden>
> > Date: 10 Jan 2004 23:18:44 -0800
> >
> > What is actually needed is a "strcasecoll" routine, which compares
> > file names in a case-insensitive way in arbitrary locales.
>
> Right, but as long as such a beast isn't available, isn't it more
> correct to return zero if strcasecmp finds the names equal?
(Hmm, "more correct"? :-)
I don't know what strcasecmp does in non-"C" locales, so it's hard for
me to say. Perhaps it handles simple unibyte locales correctly, but
perhaps not. (In Solaris 9, it doesn't: it assumes ASCII.) Either
way, I suspect strcasecmp invariably mishandles multibyte locales, so
in those cases it's not correct.
> What situation do you envision where two strings that are equal for
> strcasecmp are not equal in a non-Posix locale?
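That could happen in Shift-JIS locales, since the second byte of a
Shift-JIS character might look like a valid ASCII letter. strcasecmp
would then fold that byte as if it were a letter, and could report two
different file names as equal.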
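To make the Shift-JIS concern concrete, here is a minimal sketch. The two names are hypothetical and were picked only because 0x83 lies in the Shift-JIS lead-byte range while 0x41 ('A') and 0x61 ('a') are both valid trail bytes; the helper names are likewise made up for illustration. It assumes a strcasecmp that folds one byte at a time, which is what I would expect in the "C" locale.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Two hypothetical file names, each one two-byte sequence.
   The lead byte is the same; the trail bytes are 'A' and 'a',
   so under Shift-JIS these are two different characters. */
static const char sjis_name1[] = "\x83\x41";
static const char sjis_name2[] = "\x83\x61";

/* Nonzero if a plain byte-wise comparison sees a difference. */
static int names_differ_bytewise (void)
{
  return strcmp (sjis_name1, sjis_name2) != 0;
}

/* Nonzero if strcasecmp reports the two names as equal, which
   happens when it folds the trail byte as if it were a letter. */
static int names_equal_for_strcasecmp (void)
{
  return strcasecmp (sjis_name1, sjis_name2) == 0;
}
```

With a byte-at-a-time strcasecmp both helpers should return nonzero: the names differ, yet strcasecmp calls them equal.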
> > For a similar case I've been thinking of using the patch described in
> > <http://mail.gnu.org/archive/html/bug-gnu-utils/2002-12/msg00079.html>.
> > However, given your discussion it seems that this patch isn't correct
> > either....
>
> Hmm.. why not?
Because that patch falls back on strcasecmp/file_name_cmp if
strcasecoll/strcoll returns zero. That's not correct: if strcasecoll
returns zero, compare_names should return zero without falling back on
strcasecmp/file_name_cmp.
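A rough sketch of the difference, under loud assumptions: strcasecoll does not exist, so toy_strcasecoll below is a stand-in that folds ASCII case only, and strcmp stands in for whatever byte-level comparison the fallback would use. Neither function is the actual code from that patch or from diff; they only show the two shapes being discussed.

```c
#include <string.h>

/* ASCII-only case folding, independent of the current locale. */
static int toy_fold (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Toy stand-in for the missing strcasecoll (an assumption):
   a real routine would consult the locale's collation rules. */
static int toy_strcasecoll (char const *a, char const *b)
{
  for (;; a++, b++)
    {
      int ca = toy_fold ((unsigned char) *a);
      int cb = toy_fold ((unsigned char) *b);
      if (ca != cb || ca == '\0')
        return ca - cb;
    }
}

/* Shape objected to above: a zero from the collation routine is
   second-guessed by a byte-level comparison (strcmp stands in). */
static int compare_names_with_fallback (char const *a, char const *b)
{
  int r = toy_strcasecoll (a, b);
  return r ? r : strcmp (a, b);
}

/* Shape argued for above: a zero from the collation routine is final. */
static int compare_names_no_fallback (char const *a, char const *b)
{
  return toy_strcasecoll (a, b);
}
```

For "Makefile" versus "MAKEFILE" the no-fallback version returns zero, while the fallback version returns nonzero because the second comparison disagrees with the first.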
> As the last resort, how about supporting case-insensitive file-name
> comparisons only in the C locale?
Yes, that's pretty much the best we can do, and it was the intent of
that patch.
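One possible reading of "only in the C locale", sketched with hypothetical function names. Testing LC_COLLATE rather than LC_CTYPE, and falling back to case-sensitive strcoll elsewhere, are both assumptions of this sketch and not necessarily what the patch does.

```c
#include <locale.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Nonzero if the current LC_COLLATE locale is "C" or "POSIX".
   Passing NULL to setlocale only queries; it changes nothing. */
static int collating_in_c_locale (void)
{
  char const *name = setlocale (LC_COLLATE, NULL);
  return name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Ignore case only where strcasecmp's behavior is specified;
   elsewhere fall back to a case-sensitive, locale-aware comparison. */
static int compare_names_ignoring_case (char const *a, char const *b)
{
  if (collating_in_c_locale ())
    return strcasecmp (a, b);
  return strcoll (a, b);
}
```

A program that never calls setlocale starts in the "C" locale, so there "README" and "readme" compare equal.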
> For that matter, why not use strcasecmp alone when we run under
> the C locale?
That goes too far, I think. For example, traditional strcasecmp does
something useful in the en_US.utf8 locale: it ignores ASCII case even
if it doesn't ignore case on non-ASCII letters. (POSIX says the
result of strcasecmp is unspecified outside the POSIX locale, but
that's OK.)
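To illustrate what "ignores ASCII case even if it doesn't ignore case on non-ASCII letters" means for UTF-8 names, here is a hand-rolled comparison that folds ASCII letters only. ascii_casecmp is a hypothetical name and only approximates a traditional strcasecmp; a given libc may behave differently.

```c
/* Fold 'A'..'Z' to 'a'..'z'; leave every other byte alone. */
static int ascii_fold (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Byte-wise comparison that ignores ASCII case only. */
static int ascii_casecmp (char const *a, char const *b)
{
  for (;; a++, b++)
    {
      int ca = ascii_fold ((unsigned char) *a);
      int cb = ascii_fold ((unsigned char) *b);
      if (ca != cb || ca == '\0')
        return ca - cb;
    }
}
```

With this, "Makefile" and "MAKEFILE" compare equal, while the UTF-8 byte strings "\xC3\x89t\xC3\xA9" (Été) and "\xC3\xA9t\xC3\xA9" (été) do not, since the bytes that distinguish É from é are outside the ASCII range.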