From: Richard Henderson
Subject: Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb
Date: Thu, 15 Sep 2022 07:48:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 9/14/22 23:59, Paolo Bonzini wrote:
> On Tue, Sep 13, 2022 at 10:17 AM Richard Henderson <richard.henderson@linaro.org> wrote:
>> On 9/12/22 00:04, Paolo Bonzini wrote:
>>> +    while (vec_len > 8) {
>>> +        vec_len -= 8;
>>> +        tcg_gen_shli_tl(s->T0, s->T0, 8);
>>> +        tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State, xmm_t0.ZMM_B(vec_len - 1)));
>>> +        tcg_gen_or_tl(s->T0, s->T0, t);
>>>      }
>>
>> The shl + or is deposit, for those hosts that have it, and will be re-expanded to shl + or for those that don't:
>>
>>     tcg_gen_ld8u_tl(t, ...);
>>     tcg_gen_deposit_tl(s->T0, t, s->T0, 8, TARGET_LONG_BITS - 8);
>
> What you get from that is an shl(t, 56) followed by extract2 (i.e. SHRD). Yeah there are targets with a native deposit (x86 itself could add PDEP/PEXT support I guess) but I find it hard to believe that it outperforms a simple shl + or.

Perhaps the shl+shrd (or shrd+rol if the deposit is slightly different) is over-cleverness on my part in the expansion, and pdep requires a constant mask.

But for other hosts, deposit is the same cost as shift.

r~
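
For reference, the three forms discussed in this thread can be compared on plain 64-bit integers. The sketch below is not QEMU code: `deposit64_model` and `extract2_model` are hypothetical stand-ins written for illustration, assuming a 64-bit target long, that deposit inserts the low `len` bits of its second operand into its first at offset `ofs`, and that extract2 returns the low 64 bits of the 128-bit pair `hi:lo` shifted right by `ofs`. Under those assumptions, shl + or, the deposit form, and shl(t, 56) followed by extract2 all give the same accumulator value for a zero-extended byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a deposit op (not the QEMU API): replace the
 * len-bit field of base at bit offset ofs with the low len bits of field. */
static uint64_t deposit64_model(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    uint64_t low = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
    uint64_t mask = low << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Hypothetical model of extract2 (SHRD-like): low 64 bits of the
 * 128-bit value hi:lo shifted right by ofs. Valid for 0 < ofs < 64. */
static uint64_t extract2_model(uint64_t lo, uint64_t hi, unsigned ofs)
{
    return (lo >> ofs) | (hi << (64 - ofs));
}

/* Form 1: the patch's shl + or. */
static uint64_t accumulate_shl_or(uint64_t acc, uint8_t byte)
{
    return (acc << 8) | byte;
}

/* Form 2: deposit the low 56 bits of acc into the byte at offset 8. */
static uint64_t accumulate_deposit(uint64_t acc, uint8_t byte)
{
    return deposit64_model(byte, acc, 8, 56);
}

/* Form 3: shl(t, 56) followed by extract2 at offset 56. */
static uint64_t accumulate_shl_extract2(uint64_t acc, uint8_t byte)
{
    return extract2_model((uint64_t)byte << 56, acc, 56);
}

/* Check that the three forms agree over a handful of inputs. */
static void self_check(void)
{
    const uint64_t accs[] = { 0, 1, 0x1234, 0x00ffffffffffffffULL, ~0ULL };
    for (unsigned i = 0; i < sizeof(accs) / sizeof(accs[0]); i++) {
        for (unsigned b = 0; b < 256; b++) {
            uint64_t want = accumulate_shl_or(accs[i], (uint8_t)b);
            assert(accumulate_deposit(accs[i], (uint8_t)b) == want);
            assert(accumulate_shl_extract2(accs[i], (uint8_t)b) == want);
        }
    }
}
```

This only shows that the results match; it says nothing about which sequence is cheaper on a given host, which is the open question in the thread.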