Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb

qemu-devel

From:	Richard Henderson
Subject:	Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb
Date:	Thu, 15 Sep 2022 07:48:24 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 9/14/22 23:59, Paolo Bonzini wrote:

On Tue, Sep 13, 2022 at 10:17 AM Richard Henderson
<richard.henderson@linaro.org> wrote:


On 9/12/22 00:04, Paolo Bonzini wrote:

+    while (vec_len > 8) {
+        vec_len -= 8;
+        tcg_gen_shli_tl(s->T0, s->T0, 8);
+        tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State, xmm_t0.ZMM_B(vec_len - 1)));
+        tcg_gen_or_tl(s->T0, s->T0, t);
       }


The shl + or is deposit, for those hosts that have it,
and will be re-expanded to shl + or for those that don't:

      tcg_gen_ld8u_tl(t, ...);
      tcg_gen_deposit_tl(s->T0, t, s->T0, 8, TARGET_LONG_BITS - 8);
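To make the equivalence concrete, here is a minimal host-side sketch in plain C, not TCG code. It assumes a 64-bit target (TARGET_LONG_BITS == 64) and a byte value already zero-extended as ld8u would leave it. The names model_deposit64, step_shl_or and step_deposit are hypothetical helpers for illustration only; model_deposit64 is a simplified stand-in for the deposit operation, not QEMU's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 64-bit deposit: return `base` with the
 * `len`-bit field starting at bit `ofs` replaced by the low `len`
 * bits of `field`.  Requires 0 < len and ofs + len <= 64.
 */
static uint64_t model_deposit64(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    uint64_t mask = (len == 64 ? ~0ULL : (1ULL << len) - 1) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* One loop step as written in the patch: shift the accumulator, OR in the byte. */
static uint64_t step_shl_or(uint64_t acc, uint8_t byte)
{
    return (acc << 8) | byte;
}

/*
 * The same step as a single deposit: the byte supplies bits 0..7 and
 * the old accumulator is placed into bits 8..63 (its top 8 bits fall off).
 */
static uint64_t step_deposit(uint64_t acc, uint8_t byte)
{
    return model_deposit64(byte, acc, 8, 64 - 8);
}
```

Both helpers return the same value for any accumulator and byte, because the byte never has bits set above bit 7.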


What you get from that is an shl(t, 56) followed by extract2 (i.e.
SHRD). Yeah there are targets with a native deposit (x86 itself could
add PDEP/PEXT support I guess) but I find it hard to believe that it
outperforms a simple shl + or.
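For reference, the shl + extract2 sequence described here can be modelled in plain C as follows. This is only a sketch under the assumption of a 64-bit target; model_extract2 and step_shl_extract2 are hypothetical names, and model_extract2 mimics a double-word right shift (SHRD-like) for offsets strictly between 0 and 64.

```c
#include <stdint.h>

/*
 * Hypothetical model of extract2: the low 64 bits of the 128-bit
 * value hi:lo shifted right by `ofs`.  Valid only for 0 < ofs < 64.
 */
static uint64_t model_extract2(uint64_t lo, uint64_t hi, unsigned ofs)
{
    return (lo >> ofs) | (hi << (64 - ofs));
}

/*
 * Expansion of the deposit step: move the byte to the top of a
 * temporary, then take 64 bits starting at bit 56 of acc:tmp.
 * The result equals (acc << 8) | byte.
 */
static uint64_t step_shl_extract2(uint64_t acc, uint8_t byte)
{
    uint64_t tmp = (uint64_t)byte << 56;
    return model_extract2(tmp, acc, 56);
}
```

The result is the same as shl + or, but on a host without a cheaper deposit it still costs two shift-class operations, which is the point being made above.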

Perhaps the shl+shrd (or shrd+rol if the deposit is slightly different) is over-cleverness on my part in the expansion, and pdep requires a constant mask.


But for other hosts, deposit is the same cost as shift.


r~
