[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lmi] Problems with the file path validation test using NULs
From: |
Vadim Zeitlin |
Subject: |
[lmi] Problems with the file path validation test using NULs |
Date: |
Wed, 12 May 2021 20:51:53 +0200 |
Hello,
The really last problem I see in the CI builds is this one, arising in
clang-11 build:
???? test failed: Caught exception
'Unit test file ''
when
'Unit test file '' not found.'
was expected.
[file ../path_utility_test.cpp, line 426]
The error message is a bit misleading, because it might look like the
"not found." part is missing when actually -- notice the subtle difference
in the beginning -- it's the "' not found." which is missing, i.e.
everything after the first embedded NUL, because the file name consists of
2 NULs here.
Initially I thought it was just another problem in clang 11 libc++,
because the test does pass with gcc (all versions) and clang 12. But after
looking at this, I now think there is both a problem in clang 11 and in the
test itself and maybe even lmi alert functions -- in addition to the
possible problem in fs::path itself.
If you have time for reading more about this and discussing it now, please
keep reading. Otherwise, please skip until "END DIVERSION" part below.
[BEGIN DIVERSION]
If you're still here, to see what I'm speaking about, consider this patch:
---------------------------------- >8 --------------------------------------
diff --git a/path_utility_test.cpp b/path_utility_test.cpp
index 4f4adc675..c73aa1eae 100644
--- a/path_utility_test.cpp
+++ b/path_utility_test.cpp
@@ -418,7 +418,7 @@ void test_path_validation()
// next two tests are senseless.
// Neither posix nor msw allows NUL in paths.
- std::string nulls = {'\0', '\0'};
+ std::string nulls = {'a', '\0', 'b', '\0', 'c'};
LMI_TEST_THROW
(validate_filepath(nulls, context)
,std::runtime_error
---------------------------------- >8 --------------------------------------
This will make the test fail, of course, because it still expects "''" in
the error message, but it shows that the actual error message with gcc or
clang-12 is "Unit test file 'a' not found." (disregarding the location
part), not "'abc' not found" as might be expected. I.e. we lose everything
after the first NUL with these standard libraries implementations at some
point.
Of course, if I had really thought about it, instead of just "expecting"
to see "abc", I could have realized immediately that because
std::runtime_error::what() returns a "const char*", and not "std::string",
there is no hope to recover anything after the first NUL, and so I should
never have expected this to work -- there is just no way to "see" anything
beyond it when dealing with "char*" strings that are, by definition,
terminated by it.
I mostly blame myself, but I think the test merits a small part of the
blame too because it could have a comment explaining that even though the
original file path is not empty, the string in the exception message must
be empty because it is constructed using std::runtime_error::what() in
test_tools.hpp. It could also use the "nulls" string consisting of not only
nulls above to make it more obvious that the check really relies on it
being truncated too.
Finally, perhaps there is a possible improvement here and alarum_alert()
(and other functions in the same file?) should replace non-printable
characters, including but not limited to NUL, with their printable
equivalents. This could be pretty useful for diagnostic purposes, I think,
as it's quite annoying to not see the correct data in the log output.
[END DIVERSION]
But all this, whether you've read it or not, doesn't explain the
difference between clang-11 and the other cases. What does explain it is
just this innocuous comment in path.hpp:
// Also note that the input string can't contain embedded NULs here, as
// they're not allowed in file paths, hence there is no need to use
// size.
which is, of course, correct on its own, but truncates the file path string
_before_ constructing std::runtime_error object when using a compiler with
char8_t support (because this code is inside a check for _cpp_char8_t),
ensuring that we get the rest of the string after the file name because
there is now no NUL in the file name to terminate it.
Of course, fixing this bug, by doing
---------------------------------- >8 --------------------------------------
diff --git a/path.hpp b/path.hpp
index 71f334d75..38177f22e 100644
--- a/path.hpp
+++ b/path.hpp
@@ -355,11 +355,7 @@ class path
// of char pointers that can be used to iterate over any buffer and
// that u8string contents can always be stored in string (unlike vice
// versa).
- //
- // Also note that the input string can't contain embedded NULs here, as
- // they're not allowed in file paths, hence there is no need to use
- // size.
- return reinterpret_cast<char const*>(s8.c_str());
+ return std::string{reinterpret_cast<char const*>(s8.c_str()),
s8.size()};
}
#endif // defined __cpp_char8_t
---------------------------------- >8 --------------------------------------
does not fix the problem for clang-11 but rather breaks the test for all
the other compilers too! And this is perfectly normal because we now
preserve NULs in the runtime_error string, which means that it will get
truncated on the first one of them when we examine this string using what()
and so the "observed" string will be always just "Unit test file'".
We could fix the test by also truncating the string at the first null by
modifying the other u8string_as_string() implementation, used for the
compilers without char8_t support, and by modifying the test to just check
for "Unit test file'". But if we're going to modify the test anyhow -- and
I don't see how could we avoid doing it, in at least some way -- what about
---------------------------------- >8 --------------------------------------
diff --git a/path_utility_test.cpp b/path_utility_test.cpp
index 4f4adc675..acfedcfa8 100644
--- a/path_utility_test.cpp
+++ b/path_utility_test.cpp
@@ -418,11 +418,17 @@ void test_path_validation()
// next two tests are senseless.
// Neither posix nor msw allows NUL in paths.
- std::string nulls = {'\0', '\0'};
+ std::string with_nulls = {'x', '\0', 'y', '\0', 'z'};
+
+ // Note that we can't test for the full error message here because it is
+ // truncated at the first NUL, due to using std::runtime_error::what(),
+ // which returns "char*" string terminated by the first NUL occurring in
+ // it, when constructing this message, so just check that it starts with
+ // the expected part.
LMI_TEST_THROW
- (validate_filepath(nulls, context)
+ (validate_filepath(with_nulls, context)
,std::runtime_error
- ,"Unit test file '' not found."
+ ,lmi_test::what_regex("^Unit test file 'x")
);
// Posix doesn't forbid these characters, though msw does.
---------------------------------- >8 --------------------------------------
This is enough to fix the test for clang-11 and the to path.hpp is not
really needed in this case, although we might still want to apply it for
consistency. And the addition of 'x', 'y' and 'z' to the string and its
subsequent renaming is not needed neither, but IMHO makes things more
clear.
All of these changes are included in my (not very appropriately named any
more, as it includes other changes too) "ci-cxx20" branch on GitHub, that
you can examine, but I'll make a separate post about it, just to avoid
making this one even longer than it already is. Here I'll just say that
with the changes there all the CI builds finally do pass, see
https://github.com/let-me-illustrate/lmi/actions/runs/836418052
(pleae don't look at the ".." URL containing dozens of the attempts that
had previously failed).
Sorry for so much ado about such a not very important question, I almost
wonder if it wouldn't be better to just remove this test using NULs (and
also the "BOOST !!" comment before it) instead. I'd still commit the change
to path.hpp even in this case however.
Thanks in advance for looking at this,
VZ
pgpJ_EnJYmlrJ.pgp
Description: PGP signature
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [lmi] Problems with the file path validation test using NULs,
Vadim Zeitlin <=