[Qemu-commits] [qemu/qemu] 6b8b62: cputlb: Make store_helper less fragil


From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] 6b8b62: cputlb: Make store_helper less fragile to compiler...
Date: Sun, 06 Sep 2020 06:15:28 -0700

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: 6b8b622e87e2cb4b22113f2bdebf18c78f5905ee
      
https://github.com/qemu/qemu/commit/6b8b622e87e2cb4b22113f2bdebf18c78f5905ee
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2020-09-03 (Thu, 03 Sep 2020)

  Changed paths:
    M accel/tcg/cputlb.c

  Log Message:
  -----------
  cputlb: Make store_helper less fragile to compiler optimizations

This has no functional change.

The current function structure is:

    inline QEMU_ALWAYSINLINE
    store_memop() {
        switch () {
            ...
        default:
            qemu_build_not_reached();
        }
    }
    inline QEMU_ALWAYSINLINE
    store_helper() {
        ...
        if (span_two_pages_or_io) {
            ...
            helper_ret_stb_mmu();
        }
        store_memop();
    }
    helper_ret_stb_mmu() {
        store_helper();
    }

Whereas GCC will generate an error at compile-time when an always_inline
function is not inlined, Clang does not.  Nor does Clang prioritize the
inlining of always_inline functions.  Both of these are arguably bugs.

Both `store_memop` and `store_helper` need to be inlined so that
constant propagation can eliminate the `qemu_build_not_reached` call.

However, if the compiler instead chooses to inline helper_ret_stb_mmu
into store_helper, then store_helper is now self-recursive and the
compiler is no longer able to propagate the constant in the same way.

This does not reproduce at current QEMU head, but was reproducible
at v4.2.0 with `clang-10 -O2 -fexperimental-new-pass-manager`.

The inline recursion problem can be fixed solely by marking
helper_ret_stb_mmu as noinline, so the compiler does not make an
incorrect decision about which functions to inline.

In addition, extract store_helper_unaligned as a noinline subroutine
that can be shared by all of the helpers.  This saves about 6k code
size in an optimized x86_64 build.
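
The structure described above can be sketched roughly as follows. This is
a minimal illustration, not the code in accel/tcg/cputlb.c: every name
ending in _sketch, plus store_byte and store_long, is hypothetical, the
flat byte array stands in for guest memory, and __builtin_unreachable()
is only a stand-in for qemu_build_not_reached(), which fails the build
rather than invoking undefined behaviour. GCC or Clang attribute syntax
is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALWAYS_INLINE inline __attribute__((always_inline))
#define NOINLINE __attribute__((noinline))

/* Out-of-line byte store. It is noinline so the compiler cannot fold it
 * back into store_helper_sketch and make that function self-recursive. */
static NOINLINE void store_byte(uint8_t *mem, size_t addr, uint8_t val);

/* 'size' must be a compile-time constant at every call site so that the
 * switch collapses to a single case after inlining. */
static ALWAYS_INLINE void
store_memop_sketch(uint8_t *mem, size_t addr, uint64_t val, unsigned size)
{
    switch (size) {
    case 1: { uint8_t v = (uint8_t)val; memcpy(mem + addr, &v, 1); break; }
    case 2: { uint16_t v = (uint16_t)val; memcpy(mem + addr, &v, 2); break; }
    case 4: { uint32_t v = (uint32_t)val; memcpy(mem + addr, &v, 4); break; }
    case 8: { memcpy(mem + addr, &val, 8); break; }
    default:
        __builtin_unreachable(); /* stand-in for qemu_build_not_reached() */
    }
}

/* Shared slow path: one noinline copy instead of one copy per helper.
 * Bytes are written in little-endian order, one at a time. */
static NOINLINE void
store_unaligned_sketch(uint8_t *mem, size_t addr, uint64_t val, unsigned size)
{
    for (unsigned i = 0; i < size; i++) {
        store_byte(mem, addr + i, (uint8_t)(val >> (i * 8)));
    }
}

static ALWAYS_INLINE void
store_helper_sketch(uint8_t *mem, size_t addr, uint64_t val, unsigned size)
{
    if (size > 1 && (addr & (size - 1)) != 0) {
        store_unaligned_sketch(mem, addr, val, size);
        return;
    }
    store_memop_sketch(mem, addr, val, size);
}

static NOINLINE void store_byte(uint8_t *mem, size_t addr, uint8_t val)
{
    store_helper_sketch(mem, addr, val, 1);
}

static NOINLINE void store_long(uint8_t *mem, size_t addr, uint32_t val)
{
    store_helper_sketch(mem, addr, val, 4);
}
```

Because the only path from the slow path back into the helper goes
through a noinline function, the always_inline helper is never a
candidate for inlining into itself, and each entry point sees a constant
size.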

Reported-by: Shu-Chun Weng <scw@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


  Commit: e7e8f33fb603c3bfa0479d7d924f2ad676a84317
      
https://github.com/qemu/qemu/commit/e7e8f33fb603c3bfa0479d7d924f2ad676a84317
  Author: Stephen Long <steplong@quicinc.com>
  Date:   2020-09-03 (Thu, 03 Sep 2020)

  Changed paths:
    M tcg/tcg-op-gvec.c

  Log Message:
  -----------
  tcg: Fix tcg gen for vectorized absolute value

The fallback inline expansion for vectorized absolute value,
used when the host doesn't support such an insn, was flawed.

E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff.  Subtracting mask only adds 1
to the low element instead of all elements because -mask is 1
and not 0x0101_0101_0101_0101.
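
The arithmetic can be checked with a small standalone sketch on a plain
64-bit integer holding eight byte lanes. This is not the TCG code in
tcg/tcg-op-gvec.c; the function names are hypothetical and only the
8-bit element case is shown.

```c
#include <stdint.h>

#define BYTE_ONES 0x0101010101010101ull

/* 0xff in every byte lane whose element is negative, 0x00 elsewhere. */
static uint64_t neg_mask8(uint64_t b)
{
    uint64_t sign = (b >> 7) & BYTE_ONES; /* 1 in each negative lane */
    return sign * 0xff;                   /* widen 0x01 to 0xff per lane */
}

/* Flawed variant: a single 64-bit subtract of the mask. With every lane
 * negative, -mask is 1, so only the low lane is incremented. */
static uint64_t abs8_flawed(uint64_t b)
{
    uint64_t mask = neg_mask8(b);
    return (b ^ mask) - mask;
}

/* Per-lane variant: invert the negative lanes, then add 1 to each of
 * them. An inverted negative lane has its top bit clear, so the add
 * cannot carry into the neighbouring lane. */
static uint64_t abs8_per_lane(uint64_t b)
{
    uint64_t mask = neg_mask8(b);
    return (b ^ mask) + (mask & BYTE_ONES);
}
```

For an input of all-ones (eight lanes of -1), abs8_flawed returns 1
while abs8_per_lane returns 0x0101010101010101, matching the values in
the message above.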

Signed-off-by: Stephen Long <steplong@quicinc.com>
Message-Id: <20200813161818.190-1-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


  Commit: 4ca3d09cd9b2046984966ef430cca4572ae0a925
      
https://github.com/qemu/qemu/commit/4ca3d09cd9b2046984966ef430cca4572ae0a925
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2020-09-03 (Thu, 03 Sep 2020)

  Changed paths:
    M softmmu/cpus.c

  Log Message:
  -----------
  softmmu/cpus: Only set parallel_cpus for SMP

Do not set parallel_cpus if there is only one cpu instantiated.
This will allow tcg to use serial code to implement atomics.
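
As a rough illustration of why the flag matters, the sketch below picks
between an atomic read-modify-write and a plain load/store sequence
based on a CPU count. It is not the code in softmmu/cpus.c or in TCG:
parallel_sketch, configure_cpus_sketch and fetch_add_sketch are
hypothetical names, and C11 atomics stand in for whatever the code
generator would emit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulator-wide "CPUs run in parallel" flag. */
static bool parallel_sketch;

/* Only enable the parallel paths when more than one CPU exists. */
static void configure_cpus_sketch(unsigned max_cpus)
{
    parallel_sketch = max_cpus > 1;
}

/* Fetch-and-add that uses a real atomic operation only when needed. */
static uint32_t fetch_add_sketch(_Atomic uint32_t *p, uint32_t v)
{
    if (parallel_sketch) {
        return atomic_fetch_add(p, v);
    }
    /* Serial path: with a single CPU no other writer can interleave,
     * so a separate load and store is sufficient. */
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    atomic_store_explicit(p, old + v, memory_order_relaxed);
    return old;
}
```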

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


  Commit: 6a17646176e011ddc463a2870a64c7aaccfe9c50
      
https://github.com/qemu/qemu/commit/6a17646176e011ddc463a2870a64c7aaccfe9c50
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2020-09-03 (Thu, 03 Sep 2020)

  Changed paths:
    M tcg/tcg-op-gvec.c

  Log Message:
  -----------
  tcg: Eliminate one store for in-place 128-bit dup_mem

Do not store back to the exact memory from which we just loaded.
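
A minimal sketch of the idea on ordinary memory, assuming the operation
size is a multiple of 16 and that the destination either equals the
source or does not overlap it. dup128_sketch is a hypothetical name and
the returned store count exists only to make the saving visible; the
real change is in the TCG op generation, not in a runtime loop like this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate the 16-byte block at src across oprsz bytes at dst.
 * Returns the number of 16-byte stores issued. */
static size_t dup128_sketch(uint8_t *dst, const uint8_t *src, size_t oprsz)
{
    uint8_t block[16];
    size_t stores = 0;

    memcpy(block, src, 16); /* the single load */

    /* When operating in place, the first 16 bytes already hold the
     * block, so start at offset 16 and skip that redundant store. */
    for (size_t i = (dst == src) ? 16 : 0; i < oprsz; i += 16) {
        memcpy(dst + i, block, 16);
        stores++;
    }
    return stores;
}
```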

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


  Commit: fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
      
https://github.com/qemu/qemu/commit/fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2020-09-03 (Thu, 03 Sep 2020)

  Changed paths:
    M tcg/tcg-op-gvec.c

  Log Message:
  -----------
  tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem

We already support duplication of 128-bit blocks.  This extends
that support to 256-bit blocks.  This will be needed by SVE2.
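
For illustration only, a 256-bit block can be replicated by treating it
as two 128-bit halves that are loaded once and then stored alternately.
The sketch assumes the operation size is a multiple of 32 and that the
destination either equals the source or does not overlap it;
dup256_sketch is a hypothetical name, and whether the actual
implementation uses one 256-bit host vector or two 128-bit ones is not
stated in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate the 32-byte block at src across oprsz bytes at dst. */
static void dup256_sketch(uint8_t *dst, const uint8_t *src, size_t oprsz)
{
    uint8_t lo[16], hi[16];

    /* Load both halves up front so an in-place operation stays correct. */
    memcpy(lo, src, 16);
    memcpy(hi, src + 16, 16);

    /* In place, the first 32 bytes already contain the block. */
    for (size_t i = (dst == src) ? 32 : 0; i < oprsz; i += 32) {
        memcpy(dst + i, lo, 16);
        memcpy(dst + i + 16, hi, 16);
    }
}
```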

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


  Commit: 227de21ed0759e275a469394af72c999d0134bb5
      
https://github.com/qemu/qemu/commit/227de21ed0759e275a469394af72c999d0134bb5
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2020-09-05 (Sat, 05 Sep 2020)

  Changed paths:
    M accel/tcg/cputlb.c
    M softmmu/cpus.c
    M tcg/tcg-op-gvec.c

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200903' into staging

Improve inlining in cputlb.c.
Fix vector abs fallback.
Only set parallel_cpus for SMP.
Add vector dupm for 256-bit elements.

# gpg: Signature made Thu 03 Sep 2020 22:38:25 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20200903:
  tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
  tcg: Eliminate one store for in-place 128-bit dup_mem
  softmmu/cpus: Only set parallel_cpus for SMP
  tcg: Fix tcg gen for vectorized absolute value
  cputlb: Make store_helper less fragile to compiler optimizations

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Compare: https://github.com/qemu/qemu/compare/8ca019b9c9ff...227de21ed075


