[lmi] Replacing the check for Latin-1 with check for UTF-8


From: Vadim Zeitlin
Subject: [lmi] Replacing the check for Latin-1 with check for UTF-8
Date: Mon, 31 May 2021 20:41:37 +0200

 Hello,

 Rewriting the test in assay_non_latin() in test_coding_rules.cpp to use
std::regex without any further changes doesn't work: char is signed by
default, which makes the range \x7f-\x9f in the regex there invalid,
because it is specified in the wrong direction (\x7f is 127, but \x9f is
-97). I'm not immediately sure whether we can use basic_regex<unsigned char>
or maybe basic_regex<char8_t>. While I'm sure some solution to this could
be found, I wonder if we could just remove this function entirely and,
perhaps, replace it with another one, not using regular expressions, that
verifies that a file is properly UTF-8 encoded (which includes all files
containing ASCII characters only, i.e. all the current sources).
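
 To make the proposal concrete, here is a minimal sketch of what such a
regex-free check might look like. The name is_valid_utf8() and its
signature are hypothetical, not existing lmi code, and the sketch assumes
the file contents have already been read into memory. Each byte is
converted to unsigned char before any comparison, which sidesteps the
signed-char problem described above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of lmi): return true if 'data' is
// well-formed UTF-8. Pure ASCII input is trivially accepted.
bool is_valid_utf8(std::string_view data)
{
    std::size_t const size = data.size();
    std::size_t i = 0;
    while(i < size)
        {
        // Compare bytes as unsigned values so that 0x80-0xff are not negative.
        unsigned char const lead = static_cast<unsigned char>(data[i]);
        std::size_t length = 0;
        char32_t code_point = 0;
        char32_t minimum = 0; // smallest value allowed for this length

        if(lead < 0x80)
            {length = 1; code_point = lead; minimum = 0;}
        else if(0xc2 <= lead && lead <= 0xdf)
            {length = 2; code_point = lead & 0x1f; minimum = 0x80;}
        else if(0xe0 <= lead && lead <= 0xef)
            {length = 3; code_point = lead & 0x0f; minimum = 0x800;}
        else if(0xf0 <= lead && lead <= 0xf4)
            {length = 4; code_point = lead & 0x07; minimum = 0x10000;}
        else
            {return false;} // stray continuation byte or invalid lead byte

        if(size - i < length)
            {return false;} // sequence truncated at end of input

        for(std::size_t j = 1; j < length; ++j)
            {
            unsigned char const c = static_cast<unsigned char>(data[i + j]);
            if((c & 0xc0) != 0x80)
                {return false;} // not a continuation byte (10xxxxxx)
            code_point = (code_point << 6) | (c & 0x3f);
            }

        if(code_point < minimum)
            {return false;} // overlong encoding
        if(0xd800 <= code_point && code_point <= 0xdfff)
            {return false;} // UTF-16 surrogates are not valid in UTF-8
        if(0x10ffff < code_point)
            {return false;} // beyond the Unicode range

        i += length;
        }
    return true;
}
```

 With something like this, a Latin-1 byte such as "\xe9" on its own would
be rejected, while "\xc3\xa9" (the UTF-8 encoding of the same character)
and any ASCII-only file would be accepted.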

 Could we do this before replacing Boost.Regex?
VZ
