Re: [Qemu-ppc] [PATCH 26/32] ppc: Speed up dcbz
From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH 26/32] ppc: Speed up dcbz
Date: Wed, 27 Jul 2016 12:36:54 +1000
User-agent: Mutt/1.6.2 (2016-07-01)
On Wed, Jul 27, 2016 at 08:21:20AM +1000, Benjamin Herrenschmidt wrote:
> Use tlb_vaddr_to_host to do a fast path single translate for
> the whole cache line. Also make the reservation check match
> the entire range.
>
> Signed-off-by: Benjamin Herrenschmidt <address@hidden>
> ---
> target-ppc/mem_helper.c | 46 +++++++++++++++++++++++++---------------------
> target-ppc/translate.c | 11 ++++-------
> 2 files changed, 29 insertions(+), 28 deletions(-)
>
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 92a594c..6548715 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -141,35 +141,39 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
> }
> }
>
> -static void do_dcbz(CPUPPCState *env, target_ulong addr, int dcache_line_size,
> - uintptr_t raddr)
> +void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> {
> - int i;
> -
> - addr &= ~(dcache_line_size - 1);
> - for (i = 0; i < dcache_line_size; i += 4) {
> - cpu_stl_data_ra(env, addr + i, 0, raddr);
> - }
> - if (env->reserve_addr == addr) {
> - env->reserve_addr = (target_ulong)-1ULL;
> - }
> -}
> -
> -void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t is_dcbzl)
> -{
> - int dcbz_size = env->dcache_line_size;
> + target_ulong mask, dcbz_size = env->dcache_line_size;
> + uint32_t i;
> + void *haddr;
>
> #if defined(TARGET_PPC64)
> - if (!is_dcbzl &&
> - (env->excp_model == POWERPC_EXCP_970) &&
> - ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> + /* Check for dcbz vs dcbzl on 970 */
> + if (env->excp_model == POWERPC_EXCP_970 &&
> + !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> dcbz_size = 32;
> }
> #endif
>
> - /* XXX add e500mc support */
> + /* Align address */
> + mask = ~(dcbz_size - 1);
> + addr &= mask;
> +
> + /* Check reservation */
> + if ((env->reserve_addr & mask) == (addr & mask)) {
> + env->reserve_addr = (target_ulong)-1ULL;
> + }
>
> - do_dcbz(env, addr, dcbz_size, GETPC());
> + /* Try fast path translate */
> + haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, env->dmmu_idx);
It worries me slightly that tlb_vaddr_to_host() doesn't take a length
to verify, so nothing checks that the whole cache line is covered by
the translation. I guess it's ok in practice, because memory blocks
will always be at least cache-line-size aligned.
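A minimal sketch of the arithmetic behind that guess, not QEMU code: it
assumes the translation granule is a power-of-two page and the dcbz
block is a power-of-two size no larger than that page. Under those
assumptions an aligned block cannot straddle a page boundary, so one
translation covers all of it. The helper names are hypothetical, and
the sketch says nothing about how QEMU actually lays out memory
regions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: align addr down to block_size (as the patch does
 * with "addr &= mask") and report whether the first and last byte of the
 * block fall in the same page.
 */
static int aligned_block_in_one_page(uint64_t addr, uint64_t block_size,
                                     uint64_t page_size)
{
    assert(is_pow2(block_size) && is_pow2(page_size));

    uint64_t start = addr & ~(block_size - 1);
    uint64_t last = start + (block_size - 1);

    return (start & ~(page_size - 1)) == (last & ~(page_size - 1));
}
```

With a 4096-byte page, any 32- or 128-byte block passes this check
wherever addr points. It only fails when the block is larger than the
page, which is the case a length argument would have to catch.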
> + if (haddr) {
> + memset(haddr, 0, dcbz_size);
> + } else {
> + /* Slow path */
> + for (i = 0; i < dcbz_size; i += 8) {
> + cpu_stq_data_ra(env, addr + i, 0, GETPC());
> + }
> + }
> }
>
> void helper_icbi(CPUPPCState *env, target_ulong addr)
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 57a891b..5288e02 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -3851,18 +3851,15 @@ static void gen_dcbtls(DisasContext *ctx)
> static void gen_dcbz(DisasContext *ctx)
> {
> TCGv tcgv_addr;
> - TCGv_i32 tcgv_is_dcbzl;
> - int is_dcbzl = ctx->opcode & 0x00200000 ? 1 : 0;
> + TCGv_i32 tcgv_op;
>
> gen_set_access_type(ctx, ACCESS_CACHE);
> tcgv_addr = tcg_temp_new();
> - tcgv_is_dcbzl = tcg_const_i32(is_dcbzl);
> -
> + tcgv_op = tcg_const_i32(ctx->opcode & 0x03FF000);
> gen_addr_reg_index(ctx, tcgv_addr);
> - gen_helper_dcbz(cpu_env, tcgv_addr, tcgv_is_dcbzl);
> -
> + gen_helper_dcbz(cpu_env, tcgv_addr, tcgv_op);
> tcg_temp_free(tcgv_addr);
> - tcg_temp_free_i32(tcgv_is_dcbzl);
> + tcg_temp_free_i32(tcgv_op);
> }
>
> /* dst / dstt */
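
For reference, the reservation change in the mem_helper.c hunk can be
looked at in isolation. This is a sketch under assumptions rather than
the patch itself: addresses are plain uint64_t instead of target_ulong,
and both function names are hypothetical. The old code cleared the
reservation only when reserve_addr equalled the line-aligned address;
the new code masks both sides, so a reservation anywhere inside the
zeroed line matches.

```c
#include <stdint.h>

/* Old behaviour: reservation must sit exactly at the line base. */
static int old_check_hits(uint64_t reserve_addr, uint64_t addr,
                          uint64_t line_size)
{
    uint64_t mask = ~(line_size - 1);

    return reserve_addr == (addr & mask);
}

/* New behaviour: reservation anywhere within the same line matches. */
static int new_check_hits(uint64_t reserve_addr, uint64_t addr,
                          uint64_t line_size)
{
    uint64_t mask = ~(line_size - 1);

    return (reserve_addr & mask) == (addr & mask);
}
```

For a 128-byte line at 0x1000 and a reservation at 0x1008, the old
check misses and the new one hits. Both agree when the reservation is
at 0x1000 itself.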
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson