qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 29/31] target/ppc: Implement cfuged instruction


From: Matheus K. Ferst
Subject: Re: [PATCH v4 29/31] target/ppc: Implement cfuged instruction
Date: Thu, 13 May 2021 09:24:32 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1

On 13/05/2021 08:31, Richard Henderson wrote:
On 5/12/21 1:54 PM, matheus.ferst@eldorado.org.br wrote:
+    while (i) {
+        n = ctz64(mask);
+        if (n > i) {
+            n = i;
+        }
+
+        m = (1ll << n) - 1;
+        if (bit) {
+            right = ror64(right | (src & m), n);
+        } else {
+            left = ror64(left | (src & m), n);
+        }
+
+        src >>= n;
+        mask >>= n;
+        i -= n;
+        bit = !bit;
+        mask = ~mask;
+    }
+
+    if (bit) {
+        n = ctpop64(mask);
+    } else {
+        n = 64 - ctpop64(mask);
+    }
+
+    return left | (right >> n);
+}

This doesn't correspond to the algorithm presented in the manual.  Thus this requires lots of extra commentary.

I guess I see how you're trying to process blocks at a time, instead of single bits at a time.  But I don't think the merging of data into "right" and "left" looks right.  I would have expected

     right = (right << n) | (src & m);

and similarly for left.

It doesn't look like that the ctpop at the end is correct, given how mask has been modified.  I would have thought that

     n = ctpop64(orig_mask);
     return (left << n) | right;

would be the correct answer.

I could be wrong about the above, but that's what the missing commentary should have helped me understand.


It sure worth more comments. Yes, the idea is to process in blocks, and we negate the mask to avoid deciding between ctz and cto inside the loop. We use rotate instead of shift so it don't change the number of zeros and ones, and then we don't need orig_mask.

You'll find my test cases for cfuged and vcfuged on https://github.com/PPC64/qemu/blob/ferst-tcg-cfuged/tests/tcg/ppc64le/ . I got the same results by running them with this implementation and with the Power10 Functional Simulator.

+static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+#if defined(TARGET_PPC64)
+    gen_helper_cfuged(cpu_gpr[a->ra], cpu_gpr[a->rt], cpu_gpr[a->rb]);
+#else
+    gen_invalid(ctx);
+#endif
+    return true;
+}

Given that this helper will also be used by vcfuged, there's no point in hiding it in a TARGET_PPC64 block, and thus you can drop the ifdefs.


r~


If I remove it, the build for ppc will fail, because cpu_gpr is declared as TCGv, and the helper uses i64 to match {get,set}_cpu_vsr{l,h}. REQUIRE_64BIT makes the helper call unreachable for ppc, but it's a runtime check. At build time, the compiler will check the types anyway, and give us an error.

--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software Júnior
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]