Re: [lmi] Horrible std::regex performance


From: Greg Chicares
Subject: Re: [lmi] Horrible std::regex performance
Date: Mon, 11 Jul 2016 22:18:23 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.8.0

On 2016-07-10 15:00, Vadim Zeitlin wrote:
> 
>  I'm embarrassed to say that after spending quite some time on replacing
> boost::regex with std::regex in lmi I realized that the performance of the
> latter is absolutely horrendous relatively to the former. The tests in
> regex_test.cpp don't make much sense from the point of view of matching
> multiline strings because '.' never/always matches the new line in the
> default ECMAScript/any of POSIX regex syntaxes, but their timings are still
> instructive: in my Windows 7 VM using g++ 4.9.1 I measured the following:
[...snip measurements...]
>  So there is a worse than 10 *times* slowdown and "optimize" flag doesn't
> help at all (not unexpectedly, with such extremely simple regexes). I
> really don't know what were libstdc++ developers thinking and why couldn't
> they adapt the existing Boost.Regex code, but in practice it's clear that
> std::regex must not be used for anything remotely performance-sensitive
> (notice that Boost.Regex was already known to be quite slow, e.g. PCRE is
> significantly faster).

Those timings come from lmi's 'regex_test.cpp', whose tests are artificial.
We already use std::regex in 'test_main.cpp', so boost::regex is used only
in these places:

$ grep boost/regex *.?pp
regex_test.cpp:#include <boost/regex.hpp>
test_coding_rules.cpp:#include <boost/regex.hpp>
wx_test_about_version.cpp:#include <boost/regex.hpp>

The only one where speed matters is 'test_coding_rules.cpp', and there it
does matter very much. Would it be easy for you to time that by measuring
the 'check_concinnity' target with both regex implementations?
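
For a finer-grained, in-process number, something like the following
might serve as a rough sketch. It is only an illustration:
'time_regex_search' and 'timing_result' are hypothetical names, not
anything in lmi, and it exercises std::regex alone, so a boost::regex
twin would be needed for an actual comparison.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper type (not part of lmi): result of one timing run.
struct timing_result
{
    std::size_t matches;               // total matching lines, over all passes
    std::chrono::microseconds elapsed; // wall-clock time for all passes
};

// Search every line for 'pattern', 'repetitions' times over, and
// report the match count and the elapsed wall-clock time.
timing_result time_regex_search
    (std::string const& pattern
    ,std::vector<std::string> const& lines
    ,int repetitions
    )
{
    assert(0 < repetitions);
    std::regex const re(pattern); // compiled once, outside the timed loop
    std::size_t matches = 0;
    auto const t0 = std::chrono::steady_clock::now();
    for(int i = 0; i < repetitions; ++i)
    {
        for(auto const& line : lines)
        {
            if(std::regex_search(line, re))
            {
                ++matches;
            }
        }
    }
    auto const t1 = std::chrono::steady_clock::now();
    return {matches, std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)};
}
```

Feeding it the same lines and patterns under each implementation would
give comparable figures, though the elapsed time of the whole
'check_concinnity' target remains the measurement that matters.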

Even if this makefile line:
        @-$(TEST_CODING_RULES) *
is a bottleneck, perhaps it could be parallelized. Alternatively, we
already write sentinel files like 'skeleton.hpp.physical_closure' to
avoid repeating costly tests on source files that haven't changed, and
perhaps we could do the same sort of thing for the 'check_concinnity'
recipe. But those ideas have the disadvantage that they'd require
restructuring 'test_coding_rules.cpp', notably because it summarizes
statistics for all files, so it might be more attractive to parallelize
that file itself, e.g. by threading.
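
To make the threading idea concrete, here is a minimal sketch, assuming
each file can be checked independently and its statistics merged
afterwards. All names here ('statistics', 'source_file', 'check_file',
'check_all') and the trailing-whitespace rule are hypothetical
stand-ins, not the actual structure of 'test_coding_rules.cpp'.

```cpp
#include <cstddef>
#include <future>
#include <regex>
#include <string>
#include <vector>

// Hypothetical summary statistics; the real program tracks other things.
struct statistics
{
    std::size_t files   = 0;
    std::size_t lines   = 0;
    std::size_t defects = 0;

    statistics& operator+=(statistics const& z)
    {
        files   += z.files;
        lines   += z.lines;
        defects += z.defects;
        return *this;
    }
};

// Hypothetical stand-in for a source file already read into memory.
struct source_file
{
    std::string name;
    std::vector<std::string> lines;
};

// Check one file in isolation. The only shared object is a const regex,
// which is initialized once and thereafter only read.
statistics check_file(source_file const& f)
{
    static std::regex const trailing_space(R"(\s+$)"); // example rule only
    statistics s;
    s.files = 1;
    for(auto const& line : f.lines)
    {
        ++s.lines;
        if(std::regex_search(line, trailing_space))
        {
            ++s.defects;
        }
    }
    return s;
}

// Launch one asynchronous task per file, then merge the per-file
// results on the calling thread, so no locking is needed.
statistics check_all(std::vector<source_file> const& files)
{
    std::vector<std::future<statistics>> pending;
    pending.reserve(files.size());
    for(auto const& f : files)
    {
        pending.push_back
            (std::async(std::launch::async, [&f] {return check_file(f);})
            );
    }
    statistics total;
    for(auto& p : pending)
    {
        total += p.get();
    }
    return total;
}
```

One task per file is the simplest arrangement, but with hundreds of
files it would probably be better to split them into a few batches,
roughly one per core. With gcc this may also need '-pthread'.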

>  I have to admit that I don't really know how to proceed from here. I can
> finish my patches and submit them, but do we really want to apply them
> considering the benchmark results above? Is the convenience of not having
> to build Boost.Regex worth making regex matching ~15 times slower? Or is it
> still worth finishing the patches even if they're not going to be applied
> just to keep them for the future when libstdc++ implementation hopefully
> becomes less awful? I could test std::regex performance with g++-6, should
> I do this?

I don't think we can pick the best path forward without measuring the
speed of 'make check_concinnity'. We might find that
        @-$(TEST_CODING_RULES) *
is not the slowest line in its recipe, so maybe a decimal order of
magnitude slowdown wouldn't mean very much. If (as we expect) it does
make a noticeable difference, then maybe 'test_coding_rules.cpp' should
be rewritten to use threading.

MinGW-w64 does seem to offer gcc-6.1.0, but their SourceForge site is so
painful to navigate that I can't easily tell whether they have it in the
32-bit sjlj flavor we want. However, we ardently desire to move all
development to GNU/Linux, and Debian's cross-compiler offerings
  https://packages.debian.org/search?keywords=mingw-w64
are limited to 4.9.1 stable and 5.4.0 unstable, and upgrading from the
4.9.1 we use today to 5.4 doesn't seem attractive--with the new gcc
version-numbering scheme, that's not a big jump.

I tried searching the web, and didn't immediately turn up anything that
would suggest the developers are aware of this speed problem. I find
nothing relevant at
  https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=regex
either. It looks like they have more urgent problems with std::regex
anyway, and it doesn't seem wise to hope this problem will go away
anytime soon.



