Re: [lmi] Horrible std::regex performance


From: Vadim Zeitlin
Subject: Re: [lmi] Horrible std::regex performance
Date: Sat, 16 Jul 2016 02:12:51 +0200

On Fri, 15 Jul 2016 23:41:03 +0000 Greg Chicares <address@hidden> wrote:

GC> Okay, that idea--running 'test_coding_rules' in parallel--doesn't seem
GC> good enough. Thirty seconds under WINE is better than ninety-three, but
GC> much worse than one point two:

 Can't disagree with this. I'm still amazed that the MSW version running
under WINE is (significantly!) faster than the native Linux one; it's the
first time I've ever seen this.

GC> > Platform   Compiler   boost::regex   std::regex   PCRE
GC> > ---------------------------------------------------------
GC> > Linux      gcc 4.9             5.9         27.7    1.9
GC> > WINE       gcc 4.9             1.2         93.5
GC> 
GC> I really have to stop using this Cygwin VM and move everything to debian.
GC> I'm not sure whether I'll want to run this program in WINE or in native
GC> GNU/Linux, but either way the performance with std::regex is too poor.

 Yes.

GC> Besides, I'm not sure GNU parallel will do the right thing. I think
GC> that would require...
GC> 
GC> http://lists.nongnu.org/archive/html/lmi/2016-07/msg00011.html
GC> | restructuring 'test_coding_rules.cpp', notably because it summarizes
GC> | statistics for all files, so it might be more attractive to parallelize
GC> | that file itself, e.g. by threading.
GC> 
GC> But I can't readily see that for myself, because I don't have GNU
GC> parallel installed in Cygwin here, and I really don't want to upgrade
GC> Cygwin at this moment. Please tell me if I'm missing something and
GC> GNU parallel would magically get the summary statistics right.

 It does output the results of each command together, i.e. it is smart
enough to avoid interleaving their output. But it doesn't take care of the
summaries: you'd still get N messages about "1 source files", for example,
instead of one message about "N source files" -- which is why I wrote that
test_coding_rules.cpp would still need to be modified.
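
 To illustrate the kind of modification meant here, below is a rough sketch
of checking files concurrently inside a single process and merging the
per-file statistics into one summary. It is not taken from
test_coding_rules.cpp: the 'stats' struct, check_one(), check_all() and
summary() are all hypothetical names, the "check" merely counts lines, and
the summary wording is only modelled on the messages quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-file statistics; the real program may track other counters.
struct stats
{
    std::size_t files = 0;
    std::size_t lines = 0;

    stats& operator+=(stats const& other)
    {
        files += other.files;
        lines += other.lines;
        return *this;
    }
};

// Stand-in for checking a single file: it only counts newline characters.
stats check_one(std::string const& contents)
{
    stats s;
    s.files = 1;
    s.lines = static_cast<std::size_t>
        (std::count(contents.begin(), contents.end(), '\n'));
    return s;
}

// Check all inputs concurrently, then merge the partial results.
stats check_all(std::vector<std::string> const& inputs)
{
    std::vector<std::future<stats>> pending;
    pending.reserve(inputs.size());
    for(auto const& text : inputs)
    {
        // One asynchronous task per input keeps the sketch short.
        pending.push_back
            (std::async(std::launch::async, [&text] { return check_one(text); }));
    }

    stats total;
    for(auto& f : pending)
    {
        total += f.get(); // Merging happens in the calling thread only.
    }

    // Every input must have contributed exactly one file to the total.
    assert(total.files == inputs.size());
    return total;
}

// Produce a single summary line instead of one line per file.
std::string summary(stats const& s)
{
    std::ostringstream oss;
    oss << s.files << " source files, " << s.lines << " source lines";
    return oss.str();
}
```

 Because each task returns its own 'stats' object and the totals are added
up only after f.get(), no locking is needed. With thousands of files a real
version would presumably limit the number of threads rather than start one
task per file.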

GC> What does the PCRE measurement above mean? Did you translate
GC> 'test_coding_rules.cpp' to use the PCRE C API instead of C++ regex?

 Yes, exactly. AFAICS I did it correctly, although maybe not in the most
efficient way, as I mostly tried to keep the changes as small as possible.
I'm almost sure PCRE performance could be improved further with some effort
(OTOH I'm half sure that xxx::regex performance might also be improved:
processing ~170000 lines in even 1 second is not really something to be
proud about on modern hardware).
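
 For anybody wanting to repeat this kind of measurement, here is a minimal
timing sketch using only std::regex and std::chrono. It is not the code used
for the numbers above: make_lines(), time_regex() and the trailing-whitespace
pattern in the usage note are assumptions chosen purely for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct timing
{
    std::size_t matches;
    double      seconds;
};

// Synthetic input: 'count' lines, every third one with a trailing space.
std::vector<std::string> make_lines(std::size_t count)
{
    std::vector<std::string> lines;
    lines.reserve(count);
    for(std::size_t i = 0; i < count; ++i)
    {
        std::string line = "int x = " + std::to_string(i) + ";";
        if(i % 3 == 0)
        {
            line += ' ';
        }
        lines.push_back(line);
    }
    return lines;
}

// Count the lines matching 'pattern' and measure the elapsed wall-clock time.
timing time_regex
    (std::vector<std::string> const& lines
    ,std::string const&              pattern
    )
{
    std::regex const re(pattern); // Compiled once, before the clock starts.

    auto const start = std::chrono::steady_clock::now();
    std::size_t matches = 0;
    for(auto const& line : lines)
    {
        if(std::regex_search(line, re))
        {
            ++matches;
        }
    }
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;

    // There can never be more matching lines than lines.
    assert(matches <= lines.size());
    return {matches, elapsed.count()};
}
```

 Calling time_regex(make_lines(170000), "\\s+$") gives a rough figure for a
line count similar to the one mentioned above, but synthetic lines and a
single pattern are of course not representative of the real rules.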

 Please let me know if you have any other questions and/or would like to
see any other benchmarks, e.g. PCRE under MSW.

 Thanks,
VZ

