Re: [PATCH 2/2] mips: Optimize jit_ctor() / jit_ctzr()
From: Paul Cercueil
Subject: Re: [PATCH 2/2] mips: Optimize jit_ctor() / jit_ctzr()
Date: Wed, 08 Mar 2023 23:20:16 +0000
Hi Paulo,
On Wednesday, 08 March 2023 at 00:33 +0000, Paul Cercueil wrote:
> The jit_ctzr() can be performed with just 5 instructions and 2 temporary
> registers. The jit_ctor() can be performed by just inverting all bits
> then calculating the ctzr().
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
> lib/jit_mips-cpu.c | 61 +++++++++++++++++++---------------------------
> 1 file changed, 25 insertions(+), 36 deletions(-)
>
> diff --git a/lib/jit_mips-cpu.c b/lib/jit_mips-cpu.c
> index d71a5b5..7babc86 100644
> --- a/lib/jit_mips-cpu.c
> +++ b/lib/jit_mips-cpu.c
> @@ -1670,49 +1670,38 @@ _clzr(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1)
> static void
> _ctor(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1)
> {
> - if (jit_mips2_p()) {
> - if (jit_mips6_p()) {
> -#if __WORDSIZE == 32
> - BITSWAP(r0, r1);
> - bswapr_ui(r0, r0);
> - CLO_R6(r0, r0);
> -#else
> - DBITSWAP(r0, r1);
> - bswapr_ul(r0, r0);
> - DCLO_R6(r0, r0);
> -#endif
> - }
> - else {
> - fallback_bitswap(r0, r1);
> - clor(r0, r0);
> - }
> + if (jit_mips6_p()) {
> + rbitr(r0, r1);
> + clor(r0, r0);
> + }
> + else {
> + comr(r0, r1);
> + ctzr(r0, r0);
> }
> - else
> - fallback_cto(r0, r1);
> }
>
> static void
> _ctzr(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1)
> {
> - if (jit_mips2_p()) {
> - if (jit_mips6_p()) {
> -#if __WORDSIZE == 32
> - BITSWAP(r0, r1);
> - bswapr_ui(r0, r0);
> - CLZ_R6(r0, r0);
> -#else
> - DBITSWAP(r0, r1);
> - bswapr_ul(r0, r0);
> - DCLZ_R6(r0, r0);
> -#endif
> - }
> - else {
> - fallback_bitswap(r0, r1);
> - clzr(r0, r0);
> - }
> + if (jit_mips6_p()) {
> + rbitr(r0, r1);
> + clzr(r0, r0);
> + }
> + else {
> + jit_int32_t t0, t1;
> +
> + t0 = jit_get_reg(jit_class_gpr);
> + t1 = jit_get_reg(jit_class_gpr);
> +
> + negr(rn(t0), r1);
> + andr(rn(t0), rn(t0), r1);
> + clzr(r0, rn(t0));
> + xori(rn(t1), r0, __WORDSIZE - 1);
> + movnr(r0, rn(t1), rn(t0));
Thanks for merging.
I would just like to point out that the algorithm above is not
MIPS-specific: other archs with a CLZ instruction (e.g. PPC) would
benefit from using it instead of the bit-reverse one.
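For illustration, here is a minimal portable-C sketch of the sequence
quoted above. It assumes a 32-bit word and a CLZ that returns the word
size for a zero input (as the MIPS CLZ instruction does). The names
clz32, ctz32, cto32 and WORDSIZE are hypothetical and are not part of
Lightning; each line of ctz32 is annotated with the JIT call it mirrors.

```c
#include <stdint.h>

#define WORDSIZE 32 /* assumption: 32-bit word, stands in for __WORDSIZE */

/* Software stand-in for a hardware CLZ; returns WORDSIZE when x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return WORDSIZE;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros using only negate, AND, CLZ, XOR and a
 * conditional move. */
static unsigned ctz32(uint32_t x)
{
    uint32_t t0 = 0u - x;                /* negr  t0, r1 */
    unsigned r0, t1;

    t0 &= x;                             /* andr  t0, t0, r1: lowest set bit */
    r0 = clz32(t0);                      /* clzr  r0, t0 */
    t1 = r0 ^ (WORDSIZE - 1);            /* xori  t1, r0, WORDSIZE - 1 */
    if (t0 != 0)                         /* movnr r0, t1, t0 */
        r0 = t1;
    return r0;                           /* WORDSIZE when x == 0 */
}

/* Count trailing ones: complement the input, then count trailing zeros. */
static unsigned cto32(uint32_t x)
{
    return ctz32(~x);                    /* comr r0, r1; ctzr r0, r0 */
}
```

The XOR with WORDSIZE - 1 is equivalent to computing
(WORDSIZE - 1) - clz for any clz result in the range 0..WORDSIZE - 1,
and the conditional move keeps the raw CLZ result (WORDSIZE) when the
input is zero.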
Cheers,
-Paul
> +
> + jit_unget_reg(t0);
> + jit_unget_reg(t1);
> }
> - else
> - fallback_ctz(r0, r1);
> }
>
> static void