qemu-devel

Re: [PATCH v3 6/6] util/bufferiszero: improve scalar variant


From: Richard Henderson
Subject: Re: [PATCH v3 6/6] util/bufferiszero: improve scalar variant
Date: Wed, 7 Feb 2024 08:46:46 +1000
User-agent: Mozilla Thunderbird

On 2/7/24 08:34, Richard Henderson wrote:
> On 2/7/24 06:48, Alexander Monakov wrote:
>> -        /* Otherwise, use the unaligned memory access functions to
>> -           handle the beginning and end of the buffer, with a couple
>> +        /* Use unaligned memory access functions to handle
>> +           the beginning and end of the buffer, with a couple
>>             of loops handling the middle aligned section.  */
>> -        uint64_t t = ldq_he_p(buf);
>> -        const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8);
>> -        const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
>> +        uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
>> +        typedef uint64_t uint64_a __attribute__((may_alias));
>> +        const uint64_a *p = (void *)(((uintptr_t)buf + 8) & -8);
>> +        const uint64_a *e = (void *)(((uintptr_t)buf + len - 1) & -8);
>
> You appear to be optimizing this routine for x86, which is not the primary
> consumer.
>
> This is going to perform very poorly on hosts that do not support unaligned accesses (e.g. Sparc and some RISC-V).
I beg your pardon, I misread this.  You're only replacing the byte loops, which will be
more-or-less identical, modulo unrolling, when unaligned access is not supported, but
will be much improved when some unaligned access support is available (e.g. MIPS LWL+LWR).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
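The head/tail scheme in the quoted hunk can be illustrated with a minimal standalone C sketch. It is not the QEMU code: `buffer_is_zero_sketch()` and `load_u64_unaligned()` are hypothetical names, the latter is assumed to behave like `ldq_he_p()` (a host-endian 8-byte load from a possibly unaligned address, written here with `memcpy`), and buffers shorter than 8 bytes fall back to a byte loop. Two unaligned loads cover the first and last 8 bytes, and one aligned loop covers the words in between, using the same `p`/`e` bounds as the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension, as in the patch: a uint64_t that may alias any type. */
typedef uint64_t uint64_a __attribute__((may_alias));

/* Hypothetical stand-in for ldq_he_p(): host-endian 8-byte load from a
 * possibly unaligned address.  memcpy leaves the choice of load sequence
 * to the compiler for the target. */
static inline uint64_t load_u64_unaligned(const void *ptr)
{
    uint64_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

/* Hypothetical name: returns true if all len bytes at vbuf are zero. */
static bool buffer_is_zero_sketch(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    assert(len == 0 || buf != NULL);

    if (len < 8) {
        /* Too short for an 8-byte load: check byte by byte. */
        for (size_t i = 0; i < len; i++) {
            if (buf[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /* Head and tail: two unaligned loads covering buf[0..7] and
     * buf[len-8..len-1].  Overlap with each other or with the middle
     * is harmless, because the words are only ORed together. */
    uint64_t t = load_u64_unaligned(buf) | load_u64_unaligned(buf + len - 8);

    /* Middle: aligned words in [p, e).  p is the first 8-byte boundary
     * strictly after buf (so bytes before p are in the head load); e is
     * the last 8-byte boundary at or before buf + len - 1, so every word
     * read lies inside the buffer and bytes from e on are in the tail. */
    const uint64_a *p = (const void *)(((uintptr_t)buf + 8) & ~(uintptr_t)7);
    const uint64_a *e = (const void *)(((uintptr_t)buf + len - 1) & ~(uintptr_t)7);

    for (; p < e; p++) {
        t |= *p;
    }
    return t == 0;
}
```

The patch's "couple of loops" over the middle section, which the "modulo unrolling" remark refers to, are collapsed into a single loop here. On hosts with unaligned access support the head and tail loads typically compile to single loads; elsewhere the compiler emits byte or partial-word sequences, which is roughly the cost of the byte loops being replaced.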


r~