[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Best way to integrate PCRE
From: |
Greg Chicares |
Subject: |
Re: [lmi] Best way to integrate PCRE |
Date: |
Fri, 27 Aug 2021 16:21:44 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 |
On 8/21/21 6:37 PM, Vadim Zeitlin wrote:
> On Wed, 28 Jul 2021 19:45:30 +0000 Greg Chicares <gchicares@sbcglobal.net>
> wrote:
>
> GC> On 2021-07-28 18:45, Vadim Zeitlin wrote:
> GC> > On Wed, 28 Jul 2021 16:10:34 +0000 Greg Chicares
> <gchicares@sbcglobal.net> wrote:
> GC>
> GC> [...simply withdraw support for msw builds of
> 'test_coding_rules{EXE_EXT}'...]
> GC>
> GC> > Note that this will also require making parts of makefiles conditional
> on
> GC> > the target host, which wasn't the case so far.
[...]
> GC> That doesn't change the decision, strategically. Tactically, I'd favor
> GC> a minimalist approach along the lines of this untested patch:
> GC>
> GC> --------8<--------8<--------8<--------8<--------8<--------
[...abbreviated]
> GC> + ifeq (x86_64-pc-linux-gnu,$(LMI_TRIPLET))
> GC> + default_targets += \
> GC> + test_coding_rules$(EXEEXT) \
> GC> +
> GC> + endif
> GC> -------->8-------->8-------->8-------->8-------->8--------
>
> Thanks for this idea! I've followed it initially but, after much
> toing-and-froing, finally decided to use a different approach: I still
> always build test_coding_rules${EXEEXT}, but it just outputs an error if
> it's built under MSW.
>
> I believe this is the best solution because
>
> - It results in more clear error under MSW if something attempts to run
> test_coding_rules there.
>
> - It avoids any possibility of breaking something else because
> test_coding_rules executable is not available when it's supposed to be.
>
> - It still allows to build the fully functional version under MSW by simply
> predefining LMI_HAS_PCRE if you do have PCRE library there.
>
> but please let me know if you disagree before I submit my patch.
I'm not as close to this as you are, so it's hard for me to say
whether one strategy or another is better. But I can speak in
terms of outcomes. For instance:
> GC> We'd also need to change 'hooks/pre-commit', e.g. (untested):
>
> With my solution above this is not necessary any more as it would run and
> just output an error under MSW.
Then I guess it might simply emit a message like:
"'test_coding_rules' not implemented for msw."
That's okay with me if I deliberately try to run it under msw.
But of course I wouldn't want that outcome otherwise. Let me
try to say that more precisely:
(1) In this scenario:
$ LMI_COMPILER=gcc ; LMI_TRIPLET=i686-w64-mingw32 ; .
/opt/lmi/src/lmi/set_toolchain.sh
$ $make test_coding_rules.exe
if I get a useless binary that only says "not implemented",
that's okay.
(2) In a scenario where I may have done this several months ago:
$ LMI_COMPILER=gcc ; LMI_TRIPLET=i686-w64-mingw32 ; .
/opt/lmi/src/lmi/set_toolchain.sh
and then, having forgotten about that, I do this:
$ make $coefficiency check_concinnity 2>&1 |less -S -N
then I want a concinnity check, not a "not implemented" message.
How could that be accomplished? IIRC, years ago we did that for
$(XMLLINT). Grepping for that, I find ('grep' output indented):
posix_fhs.make:XMLLINT := xmllint
pc-linux-gnu: Just use the system binary.
msw_cygwin.make:XMLLINT := $(localbindir)/xmllint
cygwin: Use a locally-built binary. The idea was that we always
build libxml anyway, obtaining this as a convenient by-product;
but we either don't have a system binary for it, or that system
binary might be out of date.
msw_generic.make:# XMLLINT := $(PERFORM) $(localbindir)/xmllint
msw_generic.make: XMLLINT := xmllint
"generic" (meaning 'wine', I guess): We have an architecture-specific
build already, but we choose not to use it--IIRC, because the debian
binary is faster.
Just now I looked for "architecture-independent" in FHS-3.0 (FWIW)
and it looks like only /usr/share and /usr/local/share are
designated that way--but they seem to be intended for data.
As for binaries, if I type
$ grep
I expect to get the operating system's 'grep', say, '/bin/grep';
that's the kind of behavior I want to achieve--i.e., if I type
$ make $coefficiency check_concinnity
then I just want a concinnity test: if it's identical for all
architectures, that's okay, because its outcome shouldn't vary by
architecture anyway. I guess the place to install that is
$(prefix)/bin
unless we want to add a new directory for such things, e.g.
$(prefix)/sbin
for binaries built with one and only one "dominant" architecture.
(If we like both of those, I'd actually prefer $(prefix)/sbin/
because it wouldn't be obliterated by 'make uninstall'. It could
be objected that 'sbindir' should be the architecture-dependent
$(exec_prefix)/sbin
to which I would reply that we could use an idiosyncratic
directory, say:
# dominant-architecture binaries to be used always
xbindir := $(prefix)/xbin
which we'd add to $PATH in all applicable scripts and makefiles.)
> Finally, the real reason for writing this email, is that I've realized I
> forgot to ask another question: what do you think should be done with
> regex_test.cpp?
Retain it, removing boost options with no standard parallels.
> In my current version, it was changed to use std::regex
> instead of boost::regex and while I could just leave it like this, I don't
> think it makes much sense to have unit tests for the functionality provided
> by the standard library.
It's more than that. For example, test_input_sequence_regex() is captioned:
/// Motivation: to validate data from external systems. To facilitate
/// maintenance of xml schemata, a regex is constructed and displayed
/// for every lmi sequence type.
Other tests demonstrate
/// the high cost of line-by-line searching in a vector of strings
for std::string::find() vs. regex searches for text that occurs early,
late, and never in a sizable string. This is useful information.
Other code references these tests:
/// Add a newline at the beginning of the string, and require
/// a newline at the end, so that "\n" can be used in regexen
/// instead of '^' and '$' anchors--see 'regex_test.cpp'.
and we don't want to leave those references dangling. Generally,
I prefer to leave well enough alone.
> And while it did make sense to compare Boost.Regex
> performance with different combinations of "m" and "s" modifiers, such
> modifiers are not supported by std::regex, so these tests don't make much
> sense neither.
Apparently boost implemented "m" and "s" but std::regex did not. AFAICT:
- "m" toggles whether '^' and '$' match on newline boundaries, and
- "s" toggles whether '.' matches a newline.
I've never thought about such issues except when writing regexes in
'test_coding_rules', a long time ago. I remember that these issues posed
challenges then, but I've never written things like this fluently, and
I've never studied std::regex in detail.
Is it planned to add "m"? See:
https://cplusplus.github.io/LWG/issue2503
| multiline
| Specifies that ^ shall match the beginning of a line and $ shall match
| the end of a line, if the ECMAScript engine is selected.
If so, then is contains_regex2()...
/// Match a regex as with Perl's '-s'.
bool contains_regex2(std::string const& regex)
{return boost::regex_search(text, boost::regex("(?-s)" + regex));}
...the only thing we should consider removing?
> The only way to approximate these modifiers that I found was
> to use basic regex syntax (as opposed to the ECMA script default one), but
> it doesn't seem really useful to compare them as we're almost certainly not
> going to use the basic syntax for anything.
I don't pretend to understand all of this, so let me ask: why wouldn't
we ever use BREs?
Is the "basic" syntax what's tested by contains_regex1(), which today says:
boost::regex const r(regex, boost::regex::sed);
? The comments preceding that function seem to explain why it's not
necessarily a good idea. If that's exactly your point, then it would
be helpful, for me personally, to retain that unit test, because I
probably never will memorize the rationale--the test's purpose then is
not to test std::regex's correctness, but rather to explore alternatives
and document why some are not used.
> OTOH it could be useful to
> compare std::regex performance with PCRE2, but this would require adding
> tests for LMI_HAS_PCRE to this test as well and I don't know if it's really
> worth it.
If that's not difficult, I think it would be most illuminating, and
needing to add LMI_HAS_PCRE doesn't seem like a significant disadvantage.
> My personal preference would actually be to just remove this test
> completely because I don't think it's very useful, but I guess you
> would disagree with this
I strongly prefer to keep as much of it as possible, discarding only
any boost-specific features that aren't in std::regex and aren't
proposed to be added to std::regex.
> But if we keep it, should we use std::regex in the "unit test" part of it,
> or PCRE?
Why not both?
> And what should we be comparing in the "benchmark" part of the
> test?
Why not compare everything we use, and every available thing that
we chose not to use?
> Please let me know if you have any preference or if you think we can just
> leave this test be with the minimum changes required to make it compile
> with std::regex instead of boost::regex and maybe improve (or remove) it
> later.
Minimal changes for now: okay.