[lmi] Unit tests hygiene

lmi
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lmi] Unit tests hygiene

From:	Vadim Zeitlin
Subject:	[lmi] Unit tests hygiene
Date:	Sat, 21 May 2022 20:34:37 +0200
 Hello,

 Some time ago I proposed to run lmi unit tests under UBSAN, the undefined
behaviour sanitizer, and I've finally tried doing this. I used the
autotools build, as it was simple to do there because it's enough to just
append CXX='g++ -fsanitize=undefined' to configure command line and then
build as usual.

<!-- Please feel free to skip this paragraph in first reading -->
<details>

 This _almost_ worked out of the box, but not quite: loads and
mortality_rates tests refused to link with errors complaining about
undefined references to BasicValues typeinfo. This seems to be quite
logical, in fact, because these tests indeed don't link with or otherwise
include basicalues.cpp but they do need RTTI for BasicValues for some of
UBSAN tests. The file loads_test.cpp currently uses a quite horrible hack
and defines just the member functions of BasicValues that it needs, but I
couldn't generalize this hack for the typeinfo, as this would require
defining the destructor there and this, in turn, would require including
all the headers for the other classes that need to be fully defined at the
point of instantiation of the destructor, which in turn pull in more
dependencies, and so finally I just abandoned the idea and disabled these
two tests.

</details>

 And then running "make check" didn't find _any_ problems. As much as I
respect lmi code quality, I have to admit that I become somewhat suspicious
of this, so I double checked... and found that gcc implementation of UBSAN
doesn't set the error code even if UBSAN finds any problems (unlike clang
one, which exits with 127 error code in this case, resulting in the tests
failing, as would be expected, rather than passing). So, if you ever run
into this too, you either need to compile with -fno-sanitize-recover=all in
addition to -fsanitize=undefined itself, or set UBSAN_OPTIONS environment
variable to contain ""halt_on_error=1", which is what I did.

 After doing this I did find (only!) 2 errors. First one occurs in
input_test:

input_test.cpp:165:5: runtime error: load of value 25000, which is not a valid 
value for type 'oenum_alb_or_anb'
input_test.cpp:167:5: runtime error: load of value 25000, which is not a valid 
value for type 'oenum_alb_or_anb const'

This happens for the part of the code for which we already had had to
disable a clang warning and the fact that UBSAN doesn't like it neither
makes me wonder if the comment saying that "C++ allows that" is actually
correct. In any case, this is not a real problem, of course, but perhaps it
would be better to remove this part of the test to avoid to deal with the
compilation and UB-testing issues arising due to it? Or is it
representative of what the actual code does and so you consider it
important to keep it? FWIW I couldn't find any way to avoid UBSAN warning
about doing this.

 The second error is in irc7702a_test:

any_member.hpp:177:33: runtime error: load of value 2, which is not a valid 
value for type 'bool'

which happens because mec_state::C3_init_is_mc boolean field is
uninitialized when we "use" it value_cast() to which it is passed by
_value_ when we call it from that line (which is in holder::assign()) of
any_member.hpp. So this isn't a real problem neither, of course, because we
never actually use this value and an optimizing compiler wouldn't even
bother reading the value before calling value_cast(), but it does look like
an uninitialized variable access to UBSAN. I see a couple of potentially
worthwhile ways to fix it:

- Either we can actually initialize all the fields of mec_state by adding
  "{}" to their declarations.
- Or we could change "To" in value_cast() to "To const&" to ensure that the
  value never needs to be accessed.

I've checked that both solutions work, i.e. they both prevent the error
from happening.


 To summarize, this was all rather underwhelming as UBSAN didn't find any
real problems, which is pretty impressive, as I've never seen it fail to
find anything when used for the first time with any non-trivial project. Of
course, it's still possible that there are occurrences of UB in lmi not
covered by the unit tests, but knowing that all of the code covered by the
existing tests doesn't have any UB in it is already congratulations worthy
on its own -- so please accept them!


 However I didn't want to stop here and, while I was sanitizing lmi,
decided to also check the address sanitizer (ASAN). And here I was slightly
luckier as it did find a real memory-safety problem: the code in
fill_interval_gaps() in input_sequence.cpp can, in theory, dereference a
dangling "last" reference because it refers to the last element of the
"out" vector which may be reallocated after calling push_back() on it. This
can be fixed in several different ways, but here is my proposal:

---------------------------------- >8 --------------------------------------
diff --git a/input_sequence.cpp b/input_sequence.cpp
index 3f573baed..722c74fca 100644
--- a/input_sequence.cpp
+++ b/input_sequence.cpp
@@ -553,11 +553,12 @@ void fill_interval_gaps
             auto const& last = out.back(); // Safe: 'out' cannot be empty.
             if(last.end_duration != next.begin_duration)
                 {
-                out.push_back(default_interval);
-                out.back().begin_mode     = last.end_mode    ;
-                out.back().begin_duration = last.end_duration;
-                out.back().end_mode     = next.begin_mode    ;
-                out.back().end_duration = next.begin_duration;
+                ValueInterval extra_interval{default_interval};
+                extra_interval.begin_mode     = last.end_mode    ;
+                extra_interval.begin_duration = last.end_duration;
+                extra_interval.end_mode     = next.begin_mode    ;
+                extra_interval.end_duration = next.begin_duration;
+                out.push_back(extra_interval);
                 }
             out.push_back(next);
             }
---------------------------------- >8 --------------------------------------

Please let me know if you'd like to do something else, commit this yourself
or let me do it.

 Except for this, ASAN only found a memory leak in input_test, but it is
due to a problem in xmlwrapp that I've fixed there and will include in the
patch updating wxWidgets and wxPdfDoc to ensure that all the changes get
tested at once.


 To really summarize now, by doing my best (worst?) I was able to find just
a single problem in lmi using ASAN and no real problems at all using UBSAN
which is, again, great news and is, IMO, indicative of high code quality.
I think it would be worth enabling various sanitizers for the CI builds,
especially if the problems preventing them from being used out of the box
can be fixed. For ASAN this would involve just fixing the problematic code
above and updating xmlwrapp (or disabling memory leak detection for now).
For UBSAN it's a bit trickier, as I expect that properly using BasicValues
from the affected tests is not that simple, or otherwise it would have been
already done, but should hopefully still be possible -- and if not, I could
always disable building these 2 tests in the CI build using UBSAN.

 Please let me know if you have any comments on or questions about this,
VZ
pgp3KDcnqrJVM.pgp
Description: PGP signature
[Prev in Thread]
Current Thread
[Next in Thread]
[lmi] Unit tests hygiene, Vadim Zeitlin <=
Prev by Date: [lmi] PATCH: fix math_functions_test build with clang
Next by Date: [lmi] Using clang-tidy for detecting reserved identifiers and more (was: [lmi-commits] master 15b60953 7/8: Avoid reserved names and anything resembling them)
Previous by thread: [lmi] PATCH: fix math_functions_test build with clang
Next by thread: [lmi] xanadu/remove-ISOC99_SOURCE-def
Index(es):
- Date
- Thread