bug-coreutils

Re: Coreutils 5.0.1: spurious error from uniq


From: Paul Eggert
Subject: Re: Coreutils 5.0.1: spurious error from uniq
Date: 17 Jul 2003 09:48:41 -0700
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Andreas Schwab <address@hidden> writes:

> $ echo '+:::::
> +:::::' | uniq -u
> uniq: string comparison failed: No such file or directory
> uniq: Set LC_ALL='C' to work around the problem.
> uniq: The strings compared were `+:::::' and `+:::::'.

Ouch!  It sounds like your strcoll is broken.  GNU 'ls' has a
workaround for broken strcoll (basically: re-sort the directory),
but this workaround isn't appropriate for programs like
'uniq' and 'sort', since they can't easily undo the work they've
already done.
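The 'ls' approach described above can be sketched roughly as follows. This is a minimal sketch, not the actual 'ls' source. The names `sort_names`, `cmp_coll` and `cmp_bytes` are hypothetical, and the sketch assumes (as POSIX describes) that strcoll reports failure by setting errno. On the first failed comparison it abandons the qsort call with longjmp and sorts again with strcmp, which always yields a total order. It also assumes that the qsort implementation leaves the array a permutation of its input whenever it calls the comparison function, and that jumping out of qsort is tolerable (some implementations may leak a temporary buffer). Common implementations behave this way, but the C standard does not guarantee it.

```c
#include <errno.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Jump target used to abandon a qsort whose comparisons have failed. */
static jmp_buf coll_failed;

/* Locale-aware comparison; bail out of qsort if strcoll reports an error. */
static int cmp_coll(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    errno = 0;
    int r = strcoll(s1, s2);
    if (errno != 0)
        longjmp(coll_failed, 1);
    return r;
}

/* Byte-wise comparison: always a total order. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort NAMES[0..N-1].
   Returns 0 if strcoll was used, 1 if it fell back to strcmp. */
int sort_names(const char **names, size_t n)
{
    if (setjmp(coll_failed) == 0) {
        qsort(names, n, sizeof *names, cmp_coll);
        return 0;
    }
    /* strcoll failed partway through; the array still holds the same
       elements, so sort it again with a reliable ordering. */
    qsort(names, n, sizeof *names, cmp_bytes);
    return 1;
}
```

This only works because 'ls' can throw away a partial sort and start over. That is exactly what 'uniq' and 'sort' cannot do once they have already written output.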

I guess we should fix coreutils so that it detects broken strcoll at
configure-time, and refuses to use strcoll at all if it is broken.
Likewise for other GNU programs.

We need a test program for this.  I cannot reproduce the problem on my
GNU/Linux host, under any of the locales it has installed.  What
locale were you using when you ran into the problem?
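As a starting point for such a test, here is a minimal sketch of the kind of sanity checks a configure-time probe might run. The function name `strcoll_looks_sane` and the particular checks are hypothetical. This is not an existing coreutils or Autoconf test. The first sample string comes from the report quoted above. A real probe would call setlocale (LC_ALL, "") first and exit with nonzero status if the function returns 0.

```c
#include <errno.h>
#include <string.h>

/* Sign of an int: -1, 0 or 1. */
static int sign(int x) { return (x > 0) - (x < 0); }

/* Return 1 if strcoll behaves consistently on A and B, 0 otherwise.
   Checks: no error is reported, each string compares equal to itself,
   and swapping the arguments flips the sign of the result. */
static int pair_ok(const char *a, const char *b)
{
    errno = 0;
    int aa = strcoll(a, a);
    int bb = strcoll(b, b);
    int ab = strcoll(a, b);
    int ba = strcoll(b, a);
    if (errno != 0)
        return 0;
    return aa == 0 && bb == 0 && sign(ab) == -sign(ba);
}

/* Hypothetical probe for the current locale; returns 1 if strcoll passes. */
int strcoll_looks_sane(void)
{
    static const char *const samples[] = {
        "+:::::",               /* the string from the bug report */
        "abc", "ABC", "a b", "",
        "\xff\xfe"              /* not valid UTF-8 */
    };
    size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (!pair_ok(samples[i], samples[j]))
                return 0;
    return 1;
}
```

These checks can only catch failures on the sample strings. A broken strcoll that happens to behave on these inputs would still pass.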


> The whole error checking in memcoll and xmemcoll is completely bogus.  The
> C standard says in 7.5#3:

It's not bogus, since it is relying on POSIX.  See
<http://www.opengroup.org/onlinepubs/007904975/functions/strcoll.html>,
which says:

    Since no return value is reserved to indicate an error, an
    application wishing to check for error situations should set errno
    to 0, then call strcoll(), then check errno.... The strcoll()
    function may fail if:

    [EINVAL] The s1 or s2 arguments contain characters outside the
      domain of the collating sequence.
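The errno protocol in the passage quoted above looks like this in practice. This is a minimal sketch, and the wrapper name `checked_strcoll` is hypothetical. errno must be cleared immediately before the call, because a successful call is not required to reset it. The wrapper also restores the caller's errno on success.

```c
#include <errno.h>
#include <string.h>

/* Compare S1 and S2 with strcoll, following the POSIX error protocol.
   Store the comparison result in *RESULT.  Return 0 on success, or the
   errno value (for example EINVAL) if strcoll reported an error. */
int checked_strcoll(const char *s1, const char *s2, int *result)
{
    int saved_errno = errno;   /* preserve the caller's errno on success */
    errno = 0;
    *result = strcoll(s1, s2);
    int err = errno;
    if (err == 0)
        errno = saved_errno;
    return err;
}
```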

The underlying problem here is: what should 'sort', 'uniq', 'ls', etc.
do when strcoll returns bogus results?  They can't just take the
bogus results and continue, since that can lead to real problems.
For example, if you use strcoll as the underlying comparison function
to qsort, and if strcoll fails, then the comparison no longer defines a total order
and qsort is allowed to dump core (and indeed does dump core, on some
platforms).
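One conservative way to avoid this when sorting is to have the comparison function refuse to continue once strcoll reports an error. The messages below mirror the 'uniq' diagnostics quoted at the top of this message. This is a hypothetical sketch (`compare_or_die` is not a coreutils function). It exits rather than return an untrustworthy result to qsort.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for an array of char *.  If strcoll reports
   an error, the result cannot be trusted to form a total order, so print
   a diagnostic and exit instead of handing qsort an inconsistent answer. */
int compare_or_die(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    errno = 0;
    int r = strcoll(s1, s2);
    if (errno != 0) {
        int err = errno;   /* save it: fprintf may clobber errno */
        fprintf(stderr, "string comparison failed: %s\n", strerror(err));
        fprintf(stderr, "Set LC_ALL='C' to work around the problem.\n");
        fprintf(stderr, "The strings compared were `%s' and `%s'.\n", s1, s2);
        exit(EXIT_FAILURE);
    }
    return r;
}
```

A caller would use it as qsort (lines, n, sizeof *lines, compare_or_die). Exiting is blunt, but it never lets qsort run with an inconsistent ordering.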

This problem also comes up with alphasort, which is a companion to
scandir in glibc and is being proposed as a POSIX extension.  The
problem is that scandir+alphasort can dump core when the directory contains
file names that don't compare (e.g., due to encoding errors).
This is unacceptable.



