qemu-arm


From: Richard Henderson
Subject: Re: [Qemu-arm] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
Date: Fri, 12 May 2017 12:38:36 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 05/12/2017 12:22 PM, Aurelien Jarno wrote:
> On 2017-05-12 12:05, Richard Henderson wrote:
>> On 05/12/2017 11:21 AM, Aurelien Jarno wrote:
>>> +    uint64_t mask1 = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff;
>>> +    uint64_t mask2 = sf ? 0xff00ff00ff00ff00ull : 0xff00ff00;
>>> +
>>> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
>>> +    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, mask1);
>>> +    tcg_gen_shli_i64(tcg_rd, tcg_rn, 8);
>>> +    tcg_gen_andi_i64(tcg_rd, tcg_rd, mask2);
>>
>> It would probably be better to use a single mask, since they're not free to
>> instantiate in a register.  So e.g.
>>
>>    TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
>>    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
>>    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
>>    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
>>    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
>
> Indeed, that improves things a bit for sf=1. For sf=0, though, the
> constant is never loaded into a register; it is passed to the AND
> instructions as an immediate.

For x86 (and sometimes s390) it isn't loaded into a register, but it certainly would be for all other hosts.


r~


