lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi] Stream cast problems with trailing spaces and more


From: Vadim Zeitlin
Subject: [lmi] Stream cast problems with trailing spaces and more
Date: Fri, 16 Nov 2018 02:00:48 +0100

 Hello,

 I've been looking at stream_cast() code because it started throwing
"Trailing whitespace remains" exception with the latest MSVS version
(15.9.0) when initializing Input::ListBillDate field from its JDN given in
the string form in this class ctor, which means that no illustrations can
be created when using lmi built this compiler, which is, as you can
imagine, somewhat problematic. Leaving aside the question of why does this
member need to be initialized from a string in the first place, the sudden
appearance of this problem is due to a bug fix in the latest version of the
MSVS standard library implementation which has started to (correctly, as it
turned out) set failbit when calling std::ws() if the stream was already at
EOF. Libstdc++ doesn't do it yet, however this was accepted as a bug (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88034) and so will hopefully
change in the future. And, FWIW, clang libc++ behaves correctly, i.e. as
latest MSVS.

 Beyond the problem with MSVS (and clang with libc++ if we ever need to use
it, which is not currently the case as I only use clang with libstdc++ so
far), I find the code in stream_cast.hpp very suspicious. If you look at
the part inside "#if !defined LMI_COMO_WITH_MINGW" guard, it checks for
"!(interpreter >> std::ws)", i.e. if failbit is set on the stream after
calling ws() on it. According to the standard[*], this should only happen
if the stream is already at EOF. But this should be the case only in the
"successful" case, i.e. when all the input was consumed by "interpreter >>
result" above, while the code handles this as an error for some reason.

 So, AFAICS, this test is simply reversed. It so happens that, due to
libstdc++ bug above, this doesn't trigger any errors when converting valid
inputs currently, as libstdc++ (and previous versions of MSVS, like the one
I still use) never sets failbit anyhow. However it definitely doesn't give
the expected error if there is any trailing whitespace in the input
neither. Instead, it only gives the "Unconverted data remains" error when
non-space (see below) trailing whitespace is present. This can be easily
seen by applying this patch:

---------------------------------- >8 --------------------------------------
diff --git a/stream_cast_test.cpp b/stream_cast_test.cpp
index 2e832c9a9..2a79bfacd 100644
--- a/stream_cast_test.cpp
+++ b/stream_cast_test.cpp
@@ -23,6 +23,8 @@

 #include "stream_cast.hpp"

+#include "calendar_date.hpp"
+
 #include "test_tools.hpp"

 int test_main(int, char*[])
@@ -98,6 +100,12 @@ int test_main(int, char*[])
         ,"Cannot convert (char const*)(0) to std::string."
         );

+    BOOST_TEST_THROW
+        (stream_cast<calendar_date>("2454687\n")
+        ,std::runtime_error
+        ,lmi_test::what_regex{"^Trailing whitespace remains .+"}
+        );
+
     return 0;
 }

---------------------------------- >8 --------------------------------------
(and also adding calendar_date.cpp and null_stream.cpp to this test list of
files in the makefile) and running the test, which will result in the
following output:

    ???? test failed: Expression 'stream_cast<calendar_date>("2454687\n")' 
failed to throw expected exception 'std::runtime_error'
    [file stream_cast_test.cpp, line 107]


 Note that the matters are further confused by the use of the special
blank_is_not_whitespace_locale() for the interpreter stream. So everything
I wrote above was correct for non-space white space, but real trailing
spaces, wouldn't be consumed by ws() because they're explicitly not
considered to be whitespace here. So if "2454687 " is used above instead of
"2454687\n", the test still fails, but it does it with a different message:

    ???? test failed: Caught exception
        'Unconverted data remains converting '2454687 ' from type 'char const*' 
to type 'calendar_date'.'
      when
        '^Trailing whitespace remains .+'
      was expected.
    [file stream_cast_test.cpp, line 107]

because std::ws doesn't consume anything in this case.


 I think the above convincingly shows that there is something wrong with
this code. But before suggesting how to fix it, I'd like to ascertain the
intention of this function. It seems to want to handle trailing spaces as
an error when they're not significant, while at the same time preserving
spaces, presumably including trailing ones, when converting between
strings. Is this correct?

 If so, I'd suggest splitting the string and non-string implementations, as
they clearly [try to] do incompatible things. In fact, it looks like the
primary template is already not used for strings due to the existence of a
specialization for std::string at the end of stream_cast.hpp header. So I
think that the imbue() call in the primary template should be simply
removed (at the very least, removing it doesn't break the existing tests,
which is a good sign). As an aside, it seems very suspicious to want to
preserve space but not, for example, tabs, to me anyhow. The rationale
explaining that "spaces are more common" given in the comment above this
function doesn't convince me at all. On the contrary, IMO if a bug (and
losing the part of the string after a tab would be a bug) happens more
rarely, it's worse, and not better, than if it happened more frequently.

 Also, is detecting trailing whitespace separately from arbitrary trailing
characters really necessary? Personally I don't think so, so I would just
remove the "interpreter >> std::ws" clause completely and content myself
with the "Unconverted data remains" error. But if we really need to do it,
then we should do it in a different, and actually working, way. This is
somewhat complicated by the fact that the simplest and standard-conforming
implementation, simply reversing the existing check direction, wouldn't
work with the current g++ versions. However we could use something like
this, which should work independently of the presence or absence of this
libstdc++ bug:
---------------------------------- >8 --------------------------------------
diff --git a/stream_cast.hpp b/stream_cast.hpp
index aaca916bb..5ee9174ed 100644
--- a/stream_cast.hpp
+++ b/stream_cast.hpp
@@ -118,14 +118,18 @@ To stream_cast(From from, To = To())
         err << "Output failed ";
         }
 #if !defined LMI_COMO_WITH_MINGW
-    // COMPILER !! This appears to be a defect in libcomo.
-    else if(!(interpreter >> std::ws))
-        {
-        err << "Trailing whitespace remains ";
-        }
     else if(!interpreter.eof())
         {
-        err << "Unconverted data remains ";
+        interpreter >> std::ws;
+        if(interpreter.eof())
+            {
+            err << "Trailing whitespace remains ";
+            }
+        else
+            {
+            err << "Unconverted data remains ";
+            }
+        interpreter.setstate(std::ios_base::failbit);
         }
 #endif // !defined LMI_COMO_WITH_MINGW
     else
---------------------------------- >8 --------------------------------------
In any case, we also need to add a unit test for this particular error,
which is currently missing, i.e. at least the first patch above and maybe
something more extensive.

 Finally, while I'm looking at this code, I can't help wondering why do we
have a dummy "To" parameter in this function template? Is it some
workaround for ancient and not supported any more compiler(s)? Because it
doesn't seem to be needed any longer.


 To summarize, I'm rather confused by this code, so I could well be missing
something here, but I'm almost sure that it is incorrect. As always, I'd
like to propose to fix it myself (i.e. at least make a proper patch with
all the changes above), but please let me know if you disagree with my
analysis above or if you just prefer to do it yourself.

 Thanks in advance,
VZ

[*] Please see Billy O'Neal's detailed explanation in the gcc bugzilla
    entry linked above for the full details, but, in brief, this is due to
    [istream.manip] saying that ws() must behave as an unformatted input
    function and [istream.unformatted] explicitly stating that this means
    constructing a sentry object and, finally, [istream.sentry]/2
    saying that it should set failbit if the stream state is not good,
    which includes being at EOF.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]