Re: [lmi] First experience with std::regex from gcc 11 and CTRE in test_


From: Vadim Zeitlin
Subject: Re: [lmi] First experience with std::regex from gcc 11 and CTRE in test_coding_rules
Date: Thu, 3 Jun 2021 02:35:54 +0200

On Wed, 2 Jun 2021 22:38:09 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:

GC> On 6/2/21 8:56 PM, Vadim Zeitlin wrote:
GC> [...]
GC> >  So far I have made some changes to the Boost.Regex version in preparation
GC> > for the switch to std::regex, notably removing the statistics display
GC> > (which is now done by "make check_concinnity" only) and splitting the
GC> > regex in check_include_guards() that contained ".*" in the middle into
GC> > start and end parts, as using ".*" with std::regex results in a crash due
GC> > to a stack overflow (I usually try to be understanding and forgiving with
GC> > other people, especially in writing, but here I have no idea what the
GC> > author of this code could have been thinking when they decided to
GC> > implement the "*" quantifier using recursion, as this is a painfully
GC> > obviously horrible idea)
GC> 
GC> What library has this recursive Kleene-star implementation--the
GC> libstdc++ version corresponding to gcc version eleven?

 Yes, which is why it runs out of stack provided that the input is long
enough.
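
 To illustrate the kind of split described above, here is a minimal sketch
rather than the actual lmi code: the function name and both patterns are
hypothetical, as the real regex in check_include_guards() is not shown in
this message. Instead of one regex with ".*" spanning the whole file, the
start and the end of the guard are searched for separately, so no single
match has to consume the entire file contents.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, for illustration only: returns true if 'contents'
// starts with "#ifndef G\n#define G\n" and ends with "\n#endif // G\n".
// 'guard' is assumed to contain only identifier characters, so it needs
// no escaping when spliced into the patterns.
bool has_include_guard(std::string const& contents, std::string const& guard)
{
    // Start part: anchored at the very beginning of the input.
    std::regex const start("^#ifndef " + guard + "\n#define " + guard + "\n");
    // End part: anchored at the very end of the input.
    std::regex const end("\n#endif // " + guard + "\n$");
    // Two short matches replace one long match of "start.*end".
    return std::regex_search(contents, start)
        && std::regex_search(contents, end);
}
```

 Each of the two searches only ever matches a few dozen characters, so the
depth of any recursion inside the regex implementation should not grow with
the size of the file.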

GC> >  After removing the statistics display I could also easily test running
GC> > "parallel -m test_coding_rules ::: *" instead of running it directly. This
GC> > results in a big improvement (~3.5 times faster) for me, but not as big as
GC> > might be expected on an 8 CPU machine where I've been testing this so far,
GC> > so there is definitely some overhead due to using GNU parallel and I'd
GC> > expect an even bigger speed up if we implemented the parallelism
GC> > internally.
GC> 
GC> Okay, as expected--an incremental improvement, and an expectation of
GC> further incremental improvement--but a revolutionary change is wanted.

 OTOH you could say that with parallel, the std::regex version is almost
twice as fast as the Boost.Regex version without it. Of course, I already
find the current version too slow, so I'd still like to make it even faster,
but with parallel even std::regex is not unbearably slow any more.
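
 For reference, implementing the parallelism internally could look roughly
like the sketch below. This is only an assumption about one possible design,
not code from test_coding_rules: check_files_in_parallel() and its
parameters are hypothetical names. A fixed number of worker threads pull
file names from a shared atomic counter, which avoids starting one process
per batch of files as GNU parallel does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: calls 'check' once for every element of 'files',
// using 'n_threads' worker threads, and returns the number of files for
// which 'check' returned false. 'check' must be safe to call concurrently
// and must not throw.
std::size_t check_files_in_parallel
    (std::vector<std::string> const&                files
    ,std::function<bool(std::string const&)> const& check
    ,unsigned int n_threads = std::thread::hardware_concurrency()
    )
{
    if(0 == n_threads)
        n_threads = 1; // hardware_concurrency() may return zero.

    std::atomic<std::size_t> next{0};     // Index of the next unclaimed file.
    std::atomic<std::size_t> failures{0}; // Count of failed checks.

    auto const worker = [&]
        {
        // Each worker repeatedly claims the next index until none remain.
        for(std::size_t i; (i = next.fetch_add(1)) < files.size();)
            {
            if(!check(files[i]))
                ++failures;
            }
        };

    std::vector<std::thread> threads;
    for(unsigned int i = 0; i < n_threads; ++i)
        threads.emplace_back(worker);
    for(auto& t : threads)
        t.join();

    return failures.load();
}
```

 Any diagnostics would still need to be collected per file and printed in a
stable order, which this sketch does not attempt to do.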

GC> Assuming the gains persist, is half of boost's speed good enough?

 No, it takes almost 20s on my (slow) Linux test machine and this is
definitely too long. Of course, 10s for the Boost.Regex version is not
great either. And, yes, I know that I need to update this box, it's just
that it has been working so well for so many years that I can't bring
myself to do it...

GC> >  Next I tried using SRELL (http://www.akenotsuki.com/misc/srell/en/),
GC> > which is a library I've found only recently, and which looked appealing
GC> > because it's supposed to provide exactly the same interface as
GC> > std::regex, so testing with it should have been very simple.
GC> 
GC> That looks like a single person's project, and we'd have to
GC> consider whether its rejection:
GC>   http://www.akenotsuki.com/misc/srell/en/proposal/
GC> might cause him to lose interest in maintaining it.
GC> 
GC> I couldn't tell what regex engine he uses. If he had used
GC> PCRE, he'd achieve total world domination.

 I mostly hoped that it would be a drop-in replacement showing how much
speed we could gain without changing anything at all at the code level. But
it didn't quite work out like that.
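
 The drop-in idea amounts to something like the following sketch. It
assumes that SRELL's single header is called srell.hpp and that it mirrors
the std::regex names inside namespace srell; the USE_SRELL macro, the "rx"
alias and the mentions_word() function are hypothetical and exist only for
this example.

```cpp
#include <cassert>
#include <string>

#if defined USE_SRELL       // Hypothetical build-time switch.
#   include <srell.hpp>     // Assumed to be available on the include path.
namespace rx = srell;
#else
#   include <regex>
namespace rx = std;
#endif

// Hypothetical check written only in terms of the 'rx' alias, so the same
// source can be compiled against either library.
bool mentions_word(std::string const& text)
{
    // \b restricts the match to whole words.
    static rx::regex const word(R"(\bfoo_bar\b)");
    return rx::regex_search(text, word);
}
```

 With such an alias the calling code would stay unchanged and only the
build flags would differ between the two timing runs.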

GC> Yes, especially the run-time speed. If, say, it's (surprisingly)
GC> not faster than gcc-11's std::regex, then we'd drop CTRE.

 It's definitely faster, I just don't know by how much exactly yet. After
using it in enforce_taboos() only I'm roughly back to Boost.Regex speed.

GC> Perhaps focusing on those three functions only, or even on just
GC> one of them, would tell us how fast CTRE might be.

 Yes, I'll try to finish it during the next couple of days.

GC> >  And I'd also still like to try using PCRE. It's practically the gold
GC> > standard in this domain and everybody compares themselves with it, so it
GC> > looks like we should at least check how it works for us.
GC> 
GC> At first, I was thinking this would mean writing a C++ wrapper
GC> for PCRE,

 FWIW PCRE has its own C++ API, see e.g.

        https://man7.org/linux/man-pages/man3/pcrecpp.3.html

I've never used it in anger and it doesn't look that good at first glance
(doesn't seem to use exceptions, no traces of even C++11, ...), but I guess
it should still be usable.

 Regards,
VZ


