On Wed, 3 May 2023 at 08:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
Use the fpu to perform 64-bit loads and stores.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
@@ -2091,7 +2095,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
datalo = datahi;
datahi = t;
}
- if (h.base == datalo || h.index == datalo) {
+ if (h.atom == MO_64) {
+ /*
+ * Atomicity requires that we use a single 8-byte load.
+ * For simplicity and code size, always use the FPU for this.
+ * Similar insns using SSE/AVX are merely larger.
I'm surprised there's no performance penalty for throwing old-school
FPU insns into code that presumably otherwise uses only modern SSE.