bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: find (and friends) bug?


From: Tom Lord
Subject: Re: find (and friends) bug?
Date: Thu, 14 Mar 2002 06:23:56 -0800 (PST)

        From: Ulrich Drepper <address@hidden>
        We have a new [regexp] implementation now.


I've now had a chance to review this implementation in more detail and
I am surprised that you have chosen to merge it into glibc at its
current stage of development.  

Yes, it has some desirable features, such as support for
multi-character collating sequences and character equivalence sets.
On the other hand, it has some interesting shortcomings.

First, it does not appear to come close to implementing `regexec'
correctly.  I have previously offered you a test program that points
this out, but received no response.  The algorithm used by the matcher
is based (in the case of patterns with no backreferences) on a 2-pass
scan of the input string (it is O(N) where N is the length of the
string), yet it is widely excepted among past implementors of
`regexec' that no O(N) algorithm is possible -- even a casual (though
informed) reading of the code would have revealed that it has to
undergo significant revision.  I've communicated with the author of
the new matcher and confirmed that, indeed, it does not pass the test
suite I offered you.

Second, the algorithm has a space performance that is O(N) on the
length of the input string.  This is completely unprecedented among
implementations and is likely to make it ill-suited for many
reasonable uses.

Third, the algorithm lazilly constructs a DFA but has no provision for
discarding states during a match when the DFA grows too large.  This
too will give it unreasonable space performance for many reasonable
uses.

So I'm beginning to wonder about the process by which new code is
added to the library.  Offers for improved testing infrastructure are
ignored and appropriate review is apparently neither solicited or
performed.  While there are two alternative code bases from which a
matcher suitable for glibc might be derived, it appears that neither
of these was considered or evaluated.

I fear that the problems with the implementation will be hand-waved
away, that it will appear in distributions, and that people will
develop new code dependent on its buggy behavior.

Please restore my faith in the quality of GNU software.

-t



reply via email to

[Prev in Thread] Current Thread [Next in Thread]