On Mon, 15 May 2023 at 15:37, Richard Henderson
<richard.henderson@linaro.org> wrote:
Hosts using Intel and AMD AVX cpus are quite common.
Add fast paths through ldst_atomicity using this.
Only enable with CONFIG_INT128; some older clang versions do not
support __int128_t, and the inline assembly won't work on structures.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/ldst_atomicity.c.inc | 76 +++++++++++++++++++++++++++-------
1 file changed, 60 insertions(+), 16 deletions(-)
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index dd387c9bdd..69c1c61997 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -35,6 +35,14 @@
#if defined(CONFIG_ATOMIC128)
# define HAVE_al16_fast true
+#elif defined(CONFIG_TCG_INTERPRETER)
+/*
+ * FIXME: host specific detection for this is in tcg/$host/,
+ * but we're using tcg/tci/ instead.
+ */
+# define HAVE_al16_fast false
+#elif defined(__x86_64__) && defined(CONFIG_INT128)
+# define HAVE_al16_fast likely(have_atomic16)
#else
# define HAVE_al16_fast false
#endif
@@ -178,6 +186,12 @@ load_atomic16(void *pv)
r.u = qatomic_read__nocheck(p);
return r.s;
+#elif defined(__x86_64__) && defined(CONFIG_INT128)
+ Int128Alias r;
+
+ /* Via HAVE_al16_fast, have_atomic16 is true. */
+ asm("vmovdqa %1, %0" : "=x" (r.u) : "m" (*(Int128 *)pv));
+ return r.s;
This is a compile-time check. If we can do 16-byte atomic loads
here, why would CONFIG_ATOMIC128 not already be set?