[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Conversion from class html::text
From: |
Vadim Zeitlin |
Subject: |
Re: [lmi] Conversion from class html::text |
Date: |
Sun, 4 Feb 2018 14:47:00 +0100 |
On Sun, 4 Feb 2018 01:52:01 +0000 Greg Chicares <address@hidden> wrote:
GC> I'm still not fluent in this aspect of the modern language, so I need to
GC> keep studying it. I've read these papers:
GC> A Brief Introduction to Rvalue References
GC> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2027.html
GC> Extending move semantics to *this
GC> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2439.htm
GC> again, along with the relevant parts of the latest TC++PL and some of
GC> Meyers's writings. It's easy to find introductory materials like those,
GC> but I can't find much good non-introductory documentation.
Unfortunately I don't know where to find good explanations of these things
neither. I typically learn about new C++ features from some blog posts and
then use cppreference.com as the reference documentation to fill in the
details, e.g. http://en.cppreference.com/w/cpp/language/reference in this
case. As for the blog posts, I don't remember the URLs I read in ~2010 any
longer, of course, but looking right now I seem to remember reading
http://thbecker.net/articles/rvalue_references/section_01.html and finding
it useful and well-written.
Reading the working papers can, of course, be informative, but the problem
is that they change a lot and I'm pretty sure things must have changed
since 2006, although possibly only subtly (which is, of course, even worse).
Also, even terminology has changed, and what was used to be called
"universal references" is now called "forwarding references", for example.
GC> If I understand this correctly, it's not unsafe because
GC> std::string&& as_html() && // Must not be called more than once!
GC> cannot be called more than once: the ref-qualifier '&&' prevents that.
Yes, exactly, the comment should have explained it better.
To be precise, it doesn't really prevent this, unfortunately in C++ you
can perfectly well do something like this:
html::text h{...};
std::string s = std::move(h).as_html();
s += std::move(h).as_html();
i.e. move from "h" twice (this is, BTW, the fundamental difference between
C++ and Rust, where the semantics are the same, but the above is a
compile-time error). But this is something that has to be done explicitly
and should normally never happen because a variable that was moved from
must simply not be used ever again. And, at least somewhat reassuringly, if
it does happen, it's not catastrophic as the object that was moved from
remains in a valid state and, typically, this state would be empty, so the
second function call would return an empty string. Of course, you can't
rely on it (the standard guarantees that the state is valid, but leaves it
unspecified otherwise), so the first rule of never using the variable after
moving must still be respected. But it does provide some safety net.
GC> For example, here:
GC> bool test_variable(std::string const& name) const
GC> {
GC> auto const z = expand_html(name).as_html();
GC> it's not possible to call as_html() && more than once on the temporary:
GC> if it weren't a temporary, the ref-qualifier would make that illegal; and
GC> because it's a temporary, it doesn't have a name and we can't write a
GC> subsequent expression that would use it. Is that reasoning valid?
Yes, in this particular case it is. But you could reduce it to the case
above by doing
auto&& tmp = expand_html(name);
auto const z = std::move(tmp).as_html();
auto const z2 = std::move(tmp).as_html();
GC> Now I'm thinking that it's only the comment (and not the '&&'-qualified
GC> function itself) that caused me discomfort. If my reasoning above is
GC> correct, then no cautionary "don't do this" comment is needed:
GC>
GC> /// As it still needs to be converted to a string sooner or later to be
really
GC> -/// used, it does provide a conversion -- but it can be used only once.
GC> +/// used, it does provide a conversion, as_html().
I think this variant of the comment is, perhaps, more correct, but it's
definitely less complete because it doesn't indicate in any way that
as_html() can't be used more than once. So perhaps it should say
/// used, it does provide a conversion method as_html() which must
/// be the last method called on an instance of this class, and
/// requires an rvalue reference to the object.
I'm not sure if it's really clear, but I do feel like it needs to be said
that as_html() is not just the usual accessor.
GC> > Note that having r-value-ref-qualified overload is still useful because
it
GC> > avoids copying (potentially not very small) strings in the most common use
GC> > case. And, in fact, I wonder if it might make sense to change
GC> > pdf_writer_wx::output_html() to take html::text by r-value reference as we
GC> > only ever pass temporary objects to it anyhow and then remove the "const&"
GC> > overload, restoring the comment validity. I.e. apply this (not really
GC> > tested yet) patch:
GC>
GC> In that patch, removing the "const&" overload requires only changing the
GC> type of one argument to pdf_writer_wx::output_html():
GC>
GC> - ,html::text const& html
GC> + ,html::text&& html
GC>
GC> and consequently adding std::move() here:
GC>
GC> - auto const html_str = wxString::FromUTF8(html.as_html());
GC> + auto const html_str = wxString::FromUTF8(std::move(html).as_html());
GC>
GC> so it's a simplification, and I see nothing wrong with that, unless...
GC>
GC> > What do you think, is it worth using a slightly unusual signature for
GC> > output_html() to restore the "convertible to string only once" property of
GC> > html::text and, incidentally, to avoid a bunch of string copies when
GC> > calling it?
GC>
GC> I still balk at calling it "convertible only once"; I'd rather call
GC> this the "usable only with temporaries" property (once again crucially
GC> assuming that my reasoning about that above is correct).
It is correct, but not completely general. Temporaries are one important
class of rvalue references, but not the only one.
GC> I'd say the big advantage is eliminating the as_html() const& overload
GC> for simplicity. Avoiding a string copy in this one isolated instance
GC> might have no noticeable effect, but we're already avoiding them in
GC> many other places where the cumulative savings could be noticeable.
GC>
GC> And now we come to the "unusual signature" question, which I can't
GC> really answer because I don't know what's idiomatic--but does anyone?
GC> In 1986, the signature now in HEAD looked weird:
GC>
GC> int pdf_writer_wx::output_html
GC> (int x
GC> ,int y
GC> ,int width
GC> ,html::text const& html
GC> ,enum_output_mode output_mode
GC> )
Hmm, does it? Other than the relative location of const with respect to he
type, this is exactly how I would have written it in 1991 when I started
writing C++. Had it really changed that much since 1986 by then?
GC> Does the change suggested:
GC>
GC> - ,html::text const& html
GC> + ,html::text&& html
GC>
GC> look really odd today?
I'm not sure. It is definitely less usual than the other one because the
vast majority of the existing C++ code uses C++98 and so is not written
this way.
GC> Why?
Even from C++11 point of view, it doesn't necessarily look logical that an
output function consumes the thing it outputs. I do think it can be
justified on the grounds that html::text is a utility class which only
exists in order to be output into PDF and thus it's a good thing that we
enforce that nothing can (well, should) be done with objects of this class
after they were output because this would be useless. But it's still
something that you need to think about and, if we do it like this, I'd
definitely add a comment explaining this.
GC> Would it be less odd if this were the first parameter in that list of
GC> five? Or if the other arguments were combined and passed as a struct,
GC> e.g.?
No, my concerns are completely orthogonal to this.
GC> How about this signature for std::forward_list [23.3.4.5]:
GC> iterator insert_after(const_iterator position, T&& x);
This is more natural: you really move the object into the list here. You
don't necessarily think of outputting HTML into PDF as "moving" it there.
GC> > P.S. I'm cc'ing this to you directly, but the list seems to work fine for
GC> > me, i.e. I received the original message via it immediately when it
GC> > was sent (but wasn't there to reply to it) and also received your
GC> > reply in the PDF generation performance thread (both on and off
list).
GC>
GC> OTOH, I don't imagine you got either of the "This is a test"
GC> messages that I sent to this list from two different accounts.
No, I didn't.
GC> For the time being, if you don't mind, I'll CC you, and ask you
GC> to continue to CC me (I received only your CC of the message
GC> replied to here--but no copy with any gnu.org server in the
GC> headers). Meanwhile, I'll monitor the html archives and see what
GC> comes through at another address that I've subscribed to this list.
FWIW I do see all the other (except the test ones) messages in the
archives. Something really strange is going on here. But then this is more
often the case than not with email delivery nowadays, unfortunately.
BTW, I wonder if you received this one, about wxWidgets submodules:
http://lists.nongnu.org/archive/html/lmi/2018-02/msg00006.html
? It's not urgent at all, but I wouldn't want it to pass completely
unnoticed, if only because you're likely to run into problems when trying
to use a newer version of wxWidgets in the future.
Regards,
VZ