[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Stream cast problems with trailing spaces and more
From: |
Greg Chicares |
Subject: |
Re: [lmi] Stream cast problems with trailing spaces and more |
Date: |
Wed, 29 Jan 2020 02:15:53 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0 |
[Although I replied just hours ago to a more recent message on the same
topic, this older message contains some intriguing detail that deserves
to be addressed however belatedly.]
On 2018-11-16 01:00, Vadim Zeitlin wrote:
>
> I've been looking at stream_cast() code because it started throwing
> "Trailing whitespace remains" exception with the latest MSVS version
> (15.9.0) when initializing Input::ListBillDate field from its JDN given in
> the string form in this class ctor, which means that no illustrations can
> be created when using lmi built this compiler, which is, as you can
> imagine, somewhat problematic. Leaving aside the question of why does this>
> member need to be initialized from a string in the first place,
Initialization from string for uniformity:
Input::Input()
:IssueAge {"45"}
,RetirementAge {"65"}
,Gender {"Male"}
...
,ListBillDate {"2440588"} // Assume no inforce so old
> the sudden
> appearance of this problem is due to a bug fix in the latest version of the
> MSVS standard library implementation which has started to (correctly, as it
> turned out) set failbit when calling std::ws() if the stream was already at
> EOF. Libstdc++ doesn't do it yet, however this was accepted as a bug (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88034) and so will hopefully
> change in the future. And, FWIW, clang libc++ behaves correctly, i.e. as
> latest MSVS.
Fascinating. I thought ms used the dinkumware standard library, which
purports to be so rigorously tested that it shouldn't have had such an
error, even decades ago.
> Beyond the problem with MSVS (and clang with libc++ if we ever need to use
> it, which is not currently the case as I only use clang with libstdc++ so
> far),
Probably we should treat clang as favorably as any other free toolchain.
> I find the code in stream_cast.hpp very suspicious. If you look at
> the part inside "#if !defined LMI_COMO_WITH_MINGW" guard,
[old comeau-specific code has been removed]
> it checks for
> "!(interpreter >> std::ws)", i.e. if failbit is set on the stream after
> calling ws() on it. According to the standard[*], this should only happen
> if the stream is already at EOF. But this should be the case only in the
> "successful" case, i.e. when all the input was consumed by "interpreter >>
> result" above, while the code handles this as an error for some reason.
That's where I introduced the defect. The classic one-liner, given in
'stream_cast_test.cpp' as template function streamlined(), is:
if
( !(interpreter << from)
|| !(interpreter >> result)
|| !(interpreter >> std::ws).eof()
)
throw ...
There are at least three conditions there that might cause an
exception to be thrown. I successfully distinguished those on the
'<< from' and '>> result' lines. But I mishandled the last physical
line. I took it to mean
extract std::ws, which might or might not succeed, much as the
two lines preceding it might (so it seemed like a good idea
to report any failure); and then
invoke eof(), which similarly might or might not succeed
but the real meaning is
extract std::ws to ignore trailing whitespace (which is to be
treated as irrelevant), paying no attention to whether or not
any such whitespace was so ignored; and then
verify that we're at EOF
> So, AFAICS, this test is simply reversed. It so happens that, due to
> libstdc++ bug above, this doesn't trigger any errors when converting valid
> inputs currently, as libstdc++ (and previous versions of MSVS, like the one
> I still use) never sets failbit anyhow. However it definitely doesn't give
> the expected error if there is any trailing whitespace in the input
> neither.
That's the root cause of the defect: I coded various exceptions
but failed to test whether they could be thrown. Now I've gone
back and tried to construct tests for all of them, but I couldn't
find a way to elicit the "trailing whitespace" diagnostic,
apparently because, as you point out, it cannot be elicited:
> Instead, it only gives the "Unconverted data remains" error when
> non-space (see below) trailing whitespace is present. This can be easily
> seen by applying this patch:
Thanks.
> Note that the matters are further confused by the use of the special
> blank_is_not_whitespace_locale() for the interpreter stream.
BTW, that locale is used so that natural-language strings like:
#define solve_type_NAMES \
{"No solve" \
,"Specified amount" \
,"Employee premium" \
can be extracted as a unit, regardless of embedded '0x020' ASCII spaces.
(No such strings embed any more exotic types of whitespace.)
> So everything
> I wrote above was correct for non-space white space, but real trailing
> spaces, wouldn't be consumed by ws() because they're explicitly not
> considered to be whitespace here. So if "2454687 " is used above instead of
> "2454687\n", the test still fails, but it does it with a different message:
>
> ???? test failed: Caught exception
> 'Unconverted data remains converting '2454687 ' from type 'char
> const*' to type 'calendar_date'.'
> when
> '^Trailing whitespace remains .+'
> was expected.
> [file stream_cast_test.cpp, line 107]
>
> because std::ws doesn't consume anything in this case.
Indeed.
> I think the above convincingly shows that there is something wrong with
> this code. But before suggesting how to fix it, I'd like to ascertain the
> intention of this function. It seems to want to handle trailing spaces as
> an error when they're not significant, while at the same time preserving
> spaces, presumably including trailing ones, when converting between
> strings. Is this correct?
No. There was no such deep thought behind it. I just took a standard
idiom (streamlined() above) and broke its one-line implementation
into pieces so that I could distinguish errors arising for each piece;
but I did that incorrectly.
And specializations like
template<>
inline std::string stream_cast<std::string> (std::string from ,std::string)
{
return from;
}
are only for efficiency: they aren't intended to vary the behavior
based on argument types.
> If so, I'd suggest splitting the string and non-string implementations, as
> they clearly [try to] do incompatible things. In fact, it looks like the
> primary template is already not used for strings due to the existence of a
> specialization for std::string at the end of stream_cast.hpp header. So I
> think that the imbue() call in the primary template should be simply
> removed (at the very least, removing it doesn't break the existing tests,
> which is a good sign).
It didn't break 'stream_cast_test', but it does after some commit that
I'll push soon.
However, suppressing the imbue() line would always have broken 'input_test':
???? uncaught exception: std::runtime_error: Unconverted data remains
converting 'Allow MEC' from type 'mc_enum<mcenum_mec_avoid_method>' to type
'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>'.
> As an aside, it seems very suspicious to want to
> preserve space but not, for example, tabs, to me anyhow. The rationale
> explaining that "spaces are more common" given in the comment above this
> function doesn't convince me at all. On the contrary, IMO if a bug (and
> losing the part of the string after a tab would be a bug) happens more
> rarely, it's worse, and not better, than if it happened more frequently.
As-yet uncommitted correction:
/// Blank is the only whitespace character not treated as whitespace,
-/// because blanks are more common than other whitespace characters in
-/// std::strings.
+/// because blanks are deliberately used in strings like "Allow MEC"
+/// that are mapped to enumerators in 'mc_enum_types.xpp', where other
+/// whitespace characters would not be used.
> Also, is detecting trailing whitespace separately from arbitrary trailing
> characters really necessary?
I think so: "Allow MEC" is good, and I suppose "Allow MEC\n" is good if
we're reading a flat-text file line by line, but "Allow MEC " is wrong.
> Personally I don't think so, so I would just
> remove the "interpreter >> std::ws" clause completely and content myself
> with the "Unconverted data remains" error.
Please take a look at how I changed that (once I push the commit).
"interpreter >> std::ws" is a valid part of the customary idiom;
the error was testing stream state with operator!() after it.
There's no need to distinguish these erroneous cases:
"Allow MEC "
"Allow MEC sometimes"
> Finally, while I'm looking at this code, I can't help wondering why do we
> have a dummy "To" parameter in this function template? Is it some
> workaround for ancient and not supported any more compiler(s)? Because it
> doesn't seem to be needed any longer.
Isn't it needed because the template-parameter list and the argument list
have different orders?
template<typename To, typename From>
To stream_cast(From from, To = To())
With
- To stream_cast(From from, To = To())
+ To stream_cast(From from)
I get:
i686-w64-mingw32-g++ ... ... /opt/lmi/src/lmi/datum_string.cpp -odatum_string.o
In file included from /opt/lmi/src/lmi/value_cast.hpp:29,
from /opt/lmi/src/lmi/datum_string.hpp:29,
from /opt/lmi/src/lmi/datum_string.cpp:24:
/opt/lmi/src/lmi/stream_cast.hpp:153:20: error: template-id
‘stream_cast<std::__cxx11::string>’ for
‘std::__cxx11::string stream_cast(char*, std::__cxx11::string)’ does not match
any template declaration
inline std::string stream_cast<std::string>(char* from, std::string)
^~~~~~~~~~~~~~~~~~~~~~~~
/opt/lmi/src/lmi/stream_cast.hpp:99:4: note: candidate is: ‘template<class To,
class From> To stream_cast(From)’
To stream_cast(From from)
^~~~~~~~~~~
- Re: [lmi] Stream cast problems with trailing spaces and more,
Greg Chicares <=
- Re: [lmi] Stream cast problems with trailing spaces and more, Vadim Zeitlin, 2020/01/29
- Re: [lmi] Stream cast problems with trailing spaces and more, Greg Chicares, 2020/01/29
- Re: [lmi] Stream cast problems with trailing spaces and more, Vadim Zeitlin, 2020/01/29
- Re: [lmi] Stream cast problems with trailing spaces and more, Greg Chicares, 2020/01/29
- Re: [lmi] Stream cast problems with trailing spaces and more, Greg Chicares, 2020/01/29
- Re: [lmi] Stream cast problems with trailing spaces and more, Vadim Zeitlin, 2020/01/30
- Re: [lmi] Stream cast problems with trailing spaces and more, Greg Chicares, 2020/01/31
- Re: [lmi] Stream cast problems with trailing spaces and more, Vadim Zeitlin, 2020/01/31