[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] numeric_io_traits problems, with a patch
From: |
Greg Chicares |
Subject: |
Re: [lmi] numeric_io_traits problems, with a patch |
Date: |
Mon, 26 Feb 2018 19:14:09 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 |
On 2018-02-24 23:26, Vadim Zeitlin wrote:
>
> While testing compilation with g++ 7 I got the following error:
[...]
> numeric_io_traits.hpp: In function �std::__cxx11::string
> simplify_floating_point(const string&)�:
> numeric_io_traits.hpp:98:19: error: this statement may fall through
> [-Werror=implicit-fallthrough=]
> case '0': if(++ri != s.rend()) goto loop;
> ^~
> numeric_io_traits.hpp:99:9: note: here
> case '.': ++ri;
> ^~~~
[...]
> The warning could be corrected by adding [[fallthrough]] for g++7 or
> __attribute__((fallthrough)) for the previous gcc versions (or just use the
> latter for all of them),
Of course we'll either rewrite it, because it's inscrutable, or find
a way to make it scrutable; but I'd like to start by ascertaining
whether it's actually incorrect.
/// Precondition: 's' is a floating-point number formatted as if by
/// std::snprintf() with format "%#.*f" or "%#.*Lf".
[Therefore, 's' must contain a decimal point with at least one digit
to its left--so it matches vim regexp /-\=\d\+\.\d*/ (except that
"nan", "inf", and "-inf" may occur instead).]
std::string::const_reverse_iterator ri = s.rbegin();
loop:
switch(*ri)
{
case '0': if(++ri != s.rend()) goto loop;
case '.': ++ri;
default : ;
}
return std::string(s.begin(), ri.base());
First of all, the '0' case is too complicated: its if() clause can
never fail (if the precondition holds). All the unit tests still
pass with that change:
loop:
switch(*ri)
{
case '0': ++ri; goto loop;
case '.': ++ri;
default : ;
}
So it's a simple state machine:
enter in state A; leave only in state C; possible transitions:
A --> B --> C
A --> C
state A (removing trailing zeros)
if there's a trailing zero, remove it; resume state A
if trailing character is '.', enter state B
otherwise, enter state C
state B (all trailing zeros removed, and '.' is rightmost)
remove trailing '.'; enter state C
state C (all done)
If the precondition holds, that seems to do the right thing
(for NaN and infinities, too), otherwise, AFAICT, it returns
a syntactically valid (but semantically silly) C++ string
no longer than the function's input and no shorter than the
empty string.
> but the code actually seems wrong to me because
> when 0 is the first character of the string and "ri" is already equal to
> "s.rend()" after the first increment, we still increment it again because
> of falling through, and the iterator becomes invalid.
Your test case is this AIUI:
// '0' is the first character encountered by the reverse iterator,
// and incrementing rbegin() once makes it equal to rend(), so
// this is the unique test case satisfying those conditions:
simplify_floating_point("0");
To test it, I experimentally added this assertion:
+ LMI_ASSERT(s.begin() <= ri.base());
return std::string(s.begin(), ri.base());
which does indeed fire for that test case. That's illuminating:
above, I had speculated that this change:
- case '0': if(++ri != s.rend()) goto loop;
+ case '0': ++ri; goto loop;
merely removed a harmless pleonasm, but now we see that it
removes a demonstrable defect (in case the [unasserted]
precondition fails).
> Somehow it still "works" with gcc, even when using -D_GLIBCXX_DEBUG or
> -fsanitize=address, but in clang build with -fsanitize=address it does
> throw an exception due to the use of invalid iterator from std::string ctor
> (which is still unexpected, i.e. I don't understand why doesn't this happen
> without address sanitizer, but I didn't spend time on this).
With my remove-the-if change above, does it pass with all those
diagnostic facilities enabled, even with the extra unit tests
below added?
+ BOOST_TEST_EQUAL( "0", simplify_floating_point( "0.0"));
+ BOOST_TEST_EQUAL( "-0", simplify_floating_point( "-0.0"));
+ // THE FOLLOWING TESTS SHOULDN'T *PASS*; THEY JUST SHOULDN'T SEGFAULT
+ // Tests such as these violate the unasserted precondition that
+ // at least one digit precede the decimal point.
+ BOOST_TEST_EQUAL( "0", simplify_floating_point( ".0"));
+ BOOST_TEST_EQUAL( "-0", simplify_floating_point( "-.0"));
+ // Try harder to provoke an error:
+ BOOST_TEST_EQUAL( "0", simplify_floating_point( "0"));
+ BOOST_TEST_EQUAL( "0", simplify_floating_point( ""));
> So the real fix would seem to be to add a "break" statement here, but I'm
Or remove the 'if' as above, and document how the FSM works.
> really tempted to rewrite this code to avoid the very confusing, IMO, use
> of goto with switch, so I've created https://github.com/vadz/lmi/pull/70
> instead. If you disagree with this PR, please let me know and I'll add a
> "break" instead (or a fallthrough attribute if I'm wrong in my analysis
> above).
I'm not enthusiastic about PR 70. It seems to use the same logic,
and it just has different barriers to comprehension:
- original: the switch-and-goto construct
- PR 70: '++ri' in loop-control-line and also in if-statement,
and a goto in the guise of 'break'
I tried to find a way to write this with std::string functions,
but it looked unnatural every way I could think of. (I wanted
to call something like
find_last_of(find('.'), end(), ".0")
but string::find() returns an offset, not an iterator; writing
it with std::find() instead, and treating the string as a normal
container, just seemed too weird.)
Then this idea occurred to me:
#include "miscellany.hpp" // rtrim()
inline std::string simplify_floating_point(std::string const& s)
{
std::string z(s);
rtrim(z, "0");
rtrim(z, ".");
return z;
}
If we want to rewrite it so that it's simple, transparent, and
obviously correct, then I think that's the best way (noting that
its correctness depends on rtrim() being correct). But, now
that I understand the FSM above, the thought of calling rtrim()
makes me feel unclean. We're just doing a
REPNZ SCASB
with DF pointing backwards, and then backing up over the period,
and we should be able to write that in C, within L1 cache,
without a bunch of object copies and memory reallocations.
Is there a canonically beautiful way to write FSMs in C or C++,
without unnatural-looking control constructs or gigantic
third-party libraries?
> Finally, to explain the use of plural in the title, there is another weird
> problem in this code: I've also tried using valgrind while testing the bug
> above and my fix for it and, somehow, using valgrind results in
>
> ???? test failed: '15' == '16'
> [file numeric_io_test.cpp, line 148]
>
> when running this test, while it passes without it. I guess this must be
> due to a bug in valgrind itself because I don't see why would this function
> call give a different result when running under it otherwise, but it's
> going to be difficult to fix it there, so I'd like to find some way of
> skipping this test, either only when using valgrind or completely under
> Linux. What do you think?
As soon as I saw that diagnostic, I remembered:
// The following test failed for como with mingw (although with a
// value of 0.45036 it unsurprisingly succeeded). It was observed
// to fail also with x86_64-linux-gnu, but only because of a
// mistake that was found before committing, i.e., using log10()
// instead of std::log10() in the implementation caused C function
// log10(double) to be called instead of log10l().
BOOST_TEST_EQUAL(15, floating_point_decimals(0.4503599627370497));
It failed for como; now we turn a blind eye to that because como's
compiler is no more.
Then it failed for x86_64-linux-gnu, and we tracked down and fixed
the actual problem, as documented above.
Now it's failing again. I wouldn't want to
if(some_compiler) goto turn_a_blind_eye_yet_again;
BOOST_TEST_EQUAL(15, floating_point_decimals(0.4503599627370497));
turn_a_blind_eye_yet_again:
without trying really hard to get to the root of the problem.
Might valgrind have its own less-accurate log10l()?