qemu-s390x




From: Richard Henderson
Subject: Re: [PATCH v5 11/54] accel/tcg: Add aarch64 specific support in ldst_atomicity
Date: Tue, 16 May 2023 07:04:39 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 5/16/23 06:56, Peter Maydell wrote:
On Tue, 16 May 2023 at 14:51, Richard Henderson
<richard.henderson@linaro.org> wrote:

On 5/16/23 06:29, Peter Maydell wrote:
On Mon, 15 May 2023 at 15:38, Richard Henderson
<richard.henderson@linaro.org> wrote:

We have code in atomic128.h noting that through GCC 8, there
was no support for atomic operations on __uint128.  This has
been fixed in GCC 10.  But we can still improve over any
basic compare-and-swap loop using the ldxp/stxp instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
   accel/tcg/ldst_atomicity.c.inc | 60 ++++++++++++++++++++++++++++++++--
   1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 69c1c61997..c3b2b35823 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -263,7 +263,22 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
        * In system mode all guest pages are writable, and for user-only
        * we have just checked writability.  Try cmpxchg.
        */
-#if defined(CONFIG_CMPXCHG128)
+#if defined(__aarch64__)
+    /* We can do better than cmpxchg for AArch64.  */
+    {
+        uint64_t l, h;
+        uint32_t fail;
+
+        /* The load must be paired with the store to guarantee not tearing. */
+        asm("0: ldxp %0, %1, %3\n\t"
+            "stxp %w2, %0, %1, %3\n\t"
+            "cbnz %w2, 0b"
+            : "=&r"(l), "=&r"(h), "=&r"(fail) : "Q"(*p));
+
+        qemu_build_assert(!HOST_BIG_ENDIAN);
+        return int128_make128(l, h);
+    }

The compiler (well, clang 11, anyway) seems able to generate equivalent
code to this inline asm:

See above, where GCC 8 can do nothing, and that is still a supported compiler.

Yeah, but it'll work fine even without the explicit inline
asm, right?

No, GCC < 10 does not support __sync_* or __atomic_* on __uint128_t at all.

Is the performance difference critical enough to justify
an inline asm implementation that's only needed for older
compilers?

Yes.  It's the difference between stop-the-world and not.


r~




