qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] include/qemu/bswap.h: using atomic memory load/store for wor


From: Peter Maydell
Subject: Re: [PATCH] include/qemu/bswap.h: using atomic memory load/store for word access
Date: Mon, 17 May 2021 10:23:46 +0100

On Mon, 17 May 2021 at 03:54, Bibo Mao <maobibo@loongson.cn> wrote:
>
> virtio ring buffer has lockless ring buffer scheme. When guest vcpu
> reads the memory, qemu io thread may is writing the same address.
> It requiires atomic operation in qemu side, __builtin_memcpy may
> read byte-by-byte.
>
> This patch uses fix this, however it may bring negative performance
> effect on system which does not support hw aligned memory access.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  /*
> - * Any compiler worth its salt will turn these memcpy into native unaligned
> - * operations.  Thus we don't need to play games with packed attributes, or
> - * inline byte-by-byte stores.
> - * Some compilation environments (eg some fortify-source implementations)
> - * may intercept memcpy() in a way that defeats the compiler optimization,
> - * though, so we use __builtin_memcpy() to give ourselves the best chance
> - * of good performance.
> + * Some driver using lockless ring buffer like virtio ring requires that
> + * it should be atomic, since guest vcpu thread is reading the memory.
> + * It may bring out negative performance effect for architectures which
> + * do not support hw memory aligned access like mips, if ptr is not word
> + * alignment.
>   */
>
>  static inline int lduw_he_p(const void *ptr)
>  {
> -    uint16_t r;
> -    __builtin_memcpy(&r, ptr, sizeof(r));
> -    return r;
> +    return *(uint16_t *)ptr;

We rely on these functions handling unaligned pointers, as
the comment you delete notes. This change will not merely
"bring negative performance effect", it will cause crashes
on hosts like SPARC.

(Your change makes these functions have undefined behaviour in C,
which is why it will cause crashes on some systems.)

We do need to add support in various bits of the memory subsystem
for "this really does have to be atomic" (notably so that we can
do atomic read-modify-write actions when doing page table walks),
but that has to be done by adding extra APIs as necessary. And in
some places we already *have* those functions.

Can you describe the problem you're trying to solve in greater
detail? Notably:
 * what are the stacktraces for the accesses that end up in these
   functions ? (ie what is the top level function call or set of
   function calls that need to have must-be-atomic variants?)
 * can we guarantee that the pointers really are aligned there?

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]