qemu-s390x
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x


From: Philippe Mathieu-Daudé
Subject: Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
Date: Wed, 3 Jan 2024 15:37:08 +0100
User-agent: Mozilla Thunderbird

On 3/1/24 15:01, Philippe Mathieu-Daudé wrote:
On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:
03.01.2024 03:22, Richard Henderson wrote:
On 12/22/23 01:51, Michael Tokarev wrote:
...
git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Wed Aug 23 23:04:24 2023 -0700

     tcg/optimize: Optimize env memory operations

So far this seems to work on an amd64 host but fails on an s390x host,
the only place it has been observed. It may fail in other combinations
too; I don't know yet. I have just finished bisecting it on s390x.

I haven't been able to build a reproducer for this.
Do you have an image or kernel you can share?

Sure.

Here's my actual testing "image": http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz

It contains vmlinuz and initrd, generated on a Debian s390x system using the
standard Debian tools.

Actual command line I used when doing bisection:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot
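
For anyone who wants something smaller than a full kernel boot, the rotate-heavy core of ChaCha20 can be checked on its own. The following is only a minimal sketch in scalar C, assuming the standard quarter round and test vector from RFC 7539 (sections 2.1 and 2.1.1). The function names are hypothetical and this is not the kernel's chacha20-s390 code. Whether it exercises the failing TCG path depends on the guest compiler vectorizing it, which is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; valid for 0 < n < 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* ChaCha quarter round (RFC 7539, section 2.1): rotations by 16, 12, 8, 7. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/*
 * Hypothetical self-check helper: returns 1 if the quarter round
 * reproduces the RFC 7539 section 2.1.1 test vector, 0 otherwise.
 */
static int chacha_quarter_round_selfcheck(void)
{
    uint32_t a = 0x11111111, b = 0x01020304;
    uint32_t c = 0x9b8d6f43, d = 0x01234567;

    chacha_quarter_round(&a, &b, &c, &d);

    return a == 0xea2a92f4 && b == 0xcb1cf8ce &&
           c == 0x4581472e && d == 0x5881c4bb;
}
```

Running the self-check natively and again under qemu-system-s390x (or qemu-s390x) and comparing the two results would show whether the rotations are being miscomputed.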


Reducing a bit further, it works when disabling rotli_vec opcode
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
      case INDEX_op_orc_vec:
+        return 1;
      case INDEX_op_rotli_vec:
+        return TCG_TARGET_HAS_roti_vec;
      case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
  #define TCG_TARGET_HAS_abs_vec        1
-#define TCG_TARGET_HAS_roti_vec       1
+#define TCG_TARGET_HAS_roti_vec       0
  #define TCG_TARGET_HAS_rots_vec       1
---
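
For reference, whichever path is taken when the backend's rotli_vec is disabled has to produce the same per-lane result as a rotate built from two shifts and an OR. The sketch below shows those semantics for 32-bit elements. It is illustrative C only, not QEMU code: the function name and the LANES constant are made up for this example, and the lane count assumes one 128-bit vector.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* assumption: one 128-bit vector of 32-bit elements */

/*
 * Hypothetical reference model: rotate every 32-bit lane left by the
 * same immediate, expressed as shift-left, shift-right and OR.
 */
static void rotli32_lanes(uint32_t dst[LANES], const uint32_t src[LANES],
                          unsigned imm)
{
    assert(imm < 32);
    for (int i = 0; i < LANES; i++) {
        uint32_t x = src[i];
        /* imm == 0 is special-cased: shifting by 32 is undefined in C. */
        dst[i] = imm ? (x << imm) | (x >> (32 - imm)) : x;
    }
}
```

Comparing the vector result of the generated code against a model like this, lane by lane, is one way to tell a wrong rotate count from a clobbered input register.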

Finally, changing the constraints on op_rotli_vec seems to fix it:

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..b3456fe857 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ld_vec:
     case INDEX_op_dupm_vec:
+    case INDEX_op_rotli_vec:
         return C_O1_I1(v, r);
     case INDEX_op_dup_vec:
         return C_O1_I1(v, vr);
     case INDEX_op_abs_vec:
     case INDEX_op_neg_vec:
     case INDEX_op_not_vec:
-    case INDEX_op_rotli_vec:
     case INDEX_op_sari_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_s390_vuph_vec:
     case INDEX_op_s390_vupl_vec:
         return C_O1_I1(v, v);
---

But I'm outside of my comfort zone so not really sure what I'm doing...
(I was inspired by the "the instruction verll only allows immediates up
to 32 bits." comment from
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)


