[lmi] Best way to integrate PCRE


From: Vadim Zeitlin
Subject: [lmi] Best way to integrate PCRE
Date: Wed, 28 Jul 2021 16:44:32 +0200

 Hello,

 I've agonized over this question for so long that, even though I don't
like bothering you with it again, I need to ask what you think would be the
best way to integrate PCRE into the lmi build, because I just can't find any
solution that I really like.

 If you remember, my original idea was to use the PCRE version that would
be built during the build of wxWidgets. Unfortunately, as often happens, I
found a problem with this plan after actually starting to implement it. The
problem is that wxWidgets requires the version of the library compatible
with the string representation it uses, which is, by default, UTF-32 under
Unix and UTF-16 under MSW (which corresponds to the size of wchar_t under
these platforms), meaning that it uses either libpcre2-32 or libpcre2-16.
However, lmi uses UTF-8-encoded std::string and so requires a version of
the library using 8-bit characters, i.e. libpcre2-8. All three versions of
the library can be built from the same sources, but they are distinct,
mutually incompatible libraries, so we can't use the libpcre2-{16,32} used
by wx with the narrow char strings used by lmi.
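
 To make the mismatch concrete, here is a minimal sketch mapping a character
type to the PCRE2 variant that matches its code unit width. The helper names
code_unit_width() and pcre2_library_name() are hypothetical and exist only
for illustration; they are not part of lmi, wx or PCRE2:

```cpp
#include <climits>
#include <string>

// Hypothetical helper: width in bits of one code unit of CharT.
template<typename CharT>
constexpr int code_unit_width()
{
    return static_cast<int>(sizeof(CharT) * CHAR_BIT);
}

// Hypothetical helper: name of the PCRE2 library variant whose code unit
// width matches CharT, e.g. "libpcre2-8" for char.
template<typename CharT>
std::string pcre2_library_name()
{
    return "libpcre2-" + std::to_string(code_unit_width<CharT>());
}
```

 With this, pcre2_library_name<char>() names the 8-bit library needed for
lmi's std::string, while pcre2_library_name<wchar_t>() names the 16- or
32-bit one, depending on the platform's wchar_t.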


 There are a few ways to solve this but, as I said, each of them has its
own drawbacks:

1. We could build wxWidgets using UTF-8 for the strings, so that it builds
   libpcre2-8 during its build. wxWidgets does support this and it is,
   arguably, the best build variant under Unix, where all the other
   libraries, including GTK, use UTF-8, so using UTF-8 in wx avoids
   transcoding strings between UTF-32 and UTF-8 all the time. However,
   under MSW UTF-16 is used internally, so this could slow down lmi there.
   There could also be bugs in this wx build that are not present in the
   currently used default build, so changing this is risky and I'm not sure
   you want to take this risk.


2. We could use the PCRE library built during the default wx build, i.e.
   compatible with wchar_t, in lmi. This would, however, require either
   replacing all strings in test_coding_rules.cpp with wide strings or
   transcoding from UTF-8 to UTF-{16,32} in the wrappers in pcre_regex.hpp
   which would add extra overhead. I'm pretty sure we don't want to do the
   former and, as to the latter, while I didn't measure this overhead, it's
   definitely not going to be 0 and it seems a bit stupid to make this code
   slower than it could be without any real reason.


3. We could build our own 8-bit PCRE using the PCRE sources from the wx
   submodule, i.e. add some install_pcre.sh similar to the existing,
   similarly named scripts. This would work, and is what I'm doing right
   now, but it requires more custom shell scripts, i.e. exactly what I
   initially hoped to avoid.
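
 To give an idea of the extra work the transcoding variant of (2) would do
on every call, here is a rough sketch of a UTF-8 to UTF-32 conversion. The
function utf8_to_utf32() is a hypothetical helper, not something that exists
in lmi or pcre_regex.hpp, and it does only minimal validation (overlong
forms and surrogates are not rejected):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 string into UTF-32 code points.
// Every subject string would need a pass like this, plus an allocation,
// before it could be handed to a 32-bit regex library.
std::u32string utf8_to_utf32(std::string const& s)
{
    std::u32string out;
    out.reserve(s.size());
    for(std::size_t i = 0; i < s.size(); )
    {
        unsigned char const c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if(c < 0x80)                { cp = c;        len = 1; }
        else if((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if(s.size() - i < len)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes 6 more bits.
        for(std::size_t k = 1; k < len; ++k)
        {
            unsigned char const cc = static_cast<unsigned char>(s[i + k]);
            if((cc & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

 Any match offsets returned by the wide library would then also have to be
mapped back to byte offsets in the original std::string, which this sketch
doesn't even attempt.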


 Also, if we do (3), I wonder if we want to always build our own version of
PCRE in this case or only when targeting MSW, because in the native Linux
builds we could install libpcre2-8 from the system package and use this one
instead. This is appealing because it's simpler and faster (although
building PCRE doesn't take long, it's C code and so is fast to compile),
but OTOH it could result in different versions of PCRE being used in the
Linux and MSW builds. Normally it shouldn't matter much, because PCRE has a
pretty good compatibility record, but it would still arguably be better to
use the same version with all builds, so currently my install_pcre.sh
always builds it, even under Linux.

 Please let me know if you have any preferences here or other ideas and
I'll finally make the patch replacing Boost.Regex with PCRE once I can be
sure which solution you prefer.

 Thanks in advance!
VZ
