[lmi] Replacing the check for Latin-1 with check for UTF-8
From: Vadim Zeitlin
Subject: [lmi] Replacing the check for Latin-1 with check for UTF-8
Date: Mon, 31 May 2021 20:41:37 +0200
Hello,
Rewriting the test in assay_non_latin() in test_coding_rules.cpp to use
std::regex without any further changes doesn't work: char is signed by
default, which makes the range \x7f-\x9f in the regex used there invalid,
because its endpoints are in the wrong order (\x7f is 127, but \x9f is -97).
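To make the arithmetic concrete, here is a minimal sketch of the two endpoint values. The helper names are hypothetical and not part of lmi, and it assumes an 8-bit signed char, which is what plain char is with the usual x86 compilers:

```cpp
// Value of a byte once stored in a signed 8-bit char, which is what plain
// char is with the usual x86 compilers (assumption: other platforms may
// make plain char unsigned, in which case the problem does not arise).
constexpr int as_signed_char(unsigned char byte)
{
    return static_cast<signed char>(byte);
}

// A range first-last is only usable if first <= last.
// Here the endpoints are compared as signed chars.
constexpr bool ascending_as_signed(unsigned char first, unsigned char last)
{
    return as_signed_char(first) <= as_signed_char(last);
}

// The same comparison done on unsigned values keeps the intended order.
constexpr bool ascending_as_unsigned(unsigned char first, unsigned char last)
{
    return first <= last;
}

static_assert(as_signed_char(0x7f) == 127, "");
static_assert(as_signed_char(0x9f) == -97, "");
static_assert(!ascending_as_signed(0x7f, 0x9f), "range is reversed");
static_assert(ascending_as_unsigned(0x7f, 0x9f), "range is fine");
```

Compared as unsigned values the range is fine, which is why a regex over an unsigned character type might avoid the problem.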
I'm not immediately sure whether we could use basic_regex<unsigned char> or
maybe basic_regex<char8_t>. While I'm sure some solution to this could be
found, I wonder if we could just remove this function entirely and,
perhaps, replace it with another one, not using regular expressions, that
verifies that a file is properly UTF-8 encoded (which includes all files
containing only ASCII characters, i.e. all the current sources).
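As a rough sketch of what such a regex-free check might look like (is_valid_utf8() is a hypothetical name, not an existing lmi function, and this assumes the file contents have already been read into a std::string):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8, i.e. it has
// no stray continuation bytes, no truncated or overlong sequences, no
// surrogates, and no code points above U+10FFFF.
bool is_valid_utf8(std::string const& s)
{
    std::size_t const n = s.size();
    std::size_t i = 0;
    while(i < n)
        {
        // Convert to unsigned char first, avoiding the signedness problem.
        unsigned char const lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;   // Total length of this sequence, in bytes.
        unsigned long cp = 0;  // Code point decoded so far.
        unsigned long min = 0; // Smallest code point valid for this length.
        if(lead < 0x80)      {len = 1; cp = lead;        min = 0x0;}
        else if(lead < 0xC0) {return false;} // Stray continuation byte.
        else if(lead < 0xE0) {len = 2; cp = lead & 0x1F; min = 0x80;}
        else if(lead < 0xF0) {len = 3; cp = lead & 0x0F; min = 0x800;}
        else if(lead < 0xF8) {len = 4; cp = lead & 0x07; min = 0x10000;}
        else                 {return false;} // 0xF8-0xFF never occur.
        if(n - i < len) return false;        // Truncated sequence.
        for(std::size_t k = 1; k < len; ++k)
            {
            unsigned char const c = static_cast<unsigned char>(s[i + k]);
            if((c & 0xC0) != 0x80) return false; // Not a continuation byte.
            cp = (cp << 6) | (c & 0x3F);
            }
        if(cp < min) return false;                     // Overlong encoding.
        if(0xD800 <= cp && cp <= 0xDFFF) return false; // Surrogate.
        if(0x10FFFF < cp) return false;                // Beyond Unicode.
        i += len;
        }
    return true;
}

// Example use: pure ASCII and real UTF-8 pass, Latin-1 text is rejected.
void check_examples()
{
    assert( is_valid_utf8("plain ASCII"));
    assert( is_valid_utf8("caf\xC3\xA9"));        // UTF-8 e-acute.
    assert(!is_valid_utf8("caf\xE9 au lait"));    // Latin-1 e-acute.
}
```

Any file containing only ASCII passes trivially, so all the current sources would be accepted, while a Latin-1 byte such as 0xE9 is rejected unless it happens to start a valid multi-byte sequence.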
Could we do this before replacing Boost.Regex?
VZ