lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Working with and around strict aliasing


From: Greg Chicares
Subject: Re: [lmi] Working with and around strict aliasing
Date: Sat, 19 Sep 2015 15:46:05 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 2015-09-19 02:01, Vadim Zeitlin wrote:
> On Fri, 18 Sep 2015 21:34:34 +0000 Greg Chicares <address@hidden> wrote:
[...]
> GC> Yes, please. I'm in a quandary. The original reinterpret_cast lines seem
> GC> much clearer to me than the memcpy() replacements; but if their behavior
> GC> is undefined, then they have to change. I'm hoping that memcpy_cast<>()
> GC> might preserve code clarity while slaying the UB demon.
> 
>  Here is how the code would look like:

[...stripping context to show only the lines that change...]

> -        t = *reinterpret_cast<T*>(z);
> +        t = from_buffer_cast<T>(z);

> -            *reinterpret_cast<boost::int32_t*>(index_record)
> +            from_buffer_cast_as<boost::int32_t, int>(index_record)

> -            boost::int32_t z = *reinterpret_cast<boost::int32_t*>(p);
> +            int z = from_buffer_cast_as<boost::int32_t, int>(p);
> -            table_offset_ = std::streampos(static_cast<int>(z));
> +            table_offset_ = std::streampos(z);

> -        data_[j] = *reinterpret_cast<double*>(z);
> +        data_[j] = from_buffer_cast<double>(z);

Yes!

> I've called the function from_buffer_cast() instead of memcpy_cast()
> because I think the use of memcpy is just an implementation detail and the
> important thing is that this function is meant to be used to reconstruct
> the values from their raw representation in the buffer.

First of all, is there only One True Way to do this? Candidates:
 - Classic C cast or the not-much-better reinterpret_cast. That's
   the problem we're trying to solve, so it can't be the solution.
 - The union trick. But at best that relies on implementation-defined
   behavior. Its advantage over reinterpret_cast is that gcc does
   define that behavior; but gcc's just one compiler.
 - Use memcpy(). AIUI, this is guaranteed to work. See:

https://news.ycombinator.com/item?id=8684561
| Believe it or not, memcpy() is now the only portable and properly defined
| way to cast one type to another in C. You are expected to use it with the
| explicit intent that no bytes actually be copied, simply as a (completely
| voluntary) offering to the type gods. If they accept your offering, the
| call will just disappear from your code, and never be made.
[i.e., the compiler doesn't call into the std lib--it just writes a MOV]

I don't know that author, so I don't take that explanation as infallible,
but I recognize the authors in the thread it points to:
  http://compgroups.net/comp.arch/if-it-were-easy/2993157
and I'm pretty sure that they know what they're talking about, and that
they are saying memcpy() is the One True Way. If so, then the use of
memcpy() isn't an implementation detail because there is no alternative.

The difficulty I have with "buffer" is that, to me at least, it suggests
that a separate auxiliary region of storage is set up temporarily, e.g.,
as in HEAD where I do this:

        char z[sizeof(T)];             // allocate a buffer
        is.read(z, sizeof(T));         // move src --> buf
        t = *reinterpret_cast<T*>(z);  // move buf --> dst
        LMI_ASSERT(invalid != t);
        return t;                      // buffer is destroyed

whereas in other places no auxiliary storage is used, e.g.:

-           char* p = 54 + index_record;
-           boost::int32_t z = *reinterpret_cast<boost::int32_t*>(p);
+           boost::int32_t z;
+           std::memcpy(&z, 54 + index_record, sizeof(z));

where the source '54 + index_record' is moved directly into the
destination 'z' with no intervening buffer--which is also the case
in the above patch:
            int z = from_buffer_cast_as<boost::int32_t, int>(p);
For these reasons, isn't 'memcpy_cast' actually better?

I tried to come up with another name that describes *what* it does,
rather than *how* it works. "type_punning_cast" seems like an awful name
because it describes what it does *not* do. I like "punny_cast", playing
on the usual retort to an awful pun ("That's not punny!", a pun on "funny")
but perhaps that's too precious. What do you think of "convert_cast" or
"type_conversion_cast"?

> And this name also
> allows to have the symmetric to_buffer_cast() which could be used in
> md5.cpp code if you think it's worth it (rather than "raw" memcpy).

My objection to writing actual memcpy() calls in line is that it impairs
clarity. That objection doesn't apply to 'md5.cpp', which isn't designed
to be read, understood, or actively maintained. Inserting memcpy() calls
there is fine with me--it doesn't change that code's incomprehensibility.

I don't understand how to_buffer_cast and from_buffer_cast would differ.
Maybe that's because I'm think of memcpy(), whose two void* arguments are
symmetric. In your "from_buffer_cast_as" above, I understand the "_as",
but not the "from_".

>  I've also had to add a separate from_buffer_cast_as() because I stupidly
> forgot that default values for function template arguments were only valid
> in C++11 and so couldn't be used in lmi code.
> 
>  This being said, I think the code above is not too bad and it definitely
> looks better than my initial version, but I'll wait for your approval
> before finalising the patch.

Yes, please proceed. I like this very much.

> I'd also appreciate if you could tell me where
> would you prefer these new functions to go. It seems a bit of an overkill
> to add a separate header just for them, but I don't see where else to put
> them.

A separate header, please.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]