[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-arm] [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_a
From: |
Andrew Jones |
Subject: |
Re: [Qemu-arm] [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve |
Date: |
Wed, 17 Jul 2019 11:35:53 +0200 |
User-agent: |
NeoMutt/20180716 |
On Wed, Jun 26, 2019 at 05:22:34PM +0200, Richard Henderson wrote:
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > +/*
> > + * If ARM_MAX_VQ is increased to be greater than 16, then we can no
> > + * longer hard code slices to 1 in kvm_arch_put/get_sve().
> > + */
> > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
>
> This seems easy to fix, or simply drop the slices entirely for now, as
> otherwise they are a teeny bit confusing.
I can do that, but as I replied down thread, I sort of like it this way
for documentation purposes. Anyway, I don't have a strong opinion here,
so I'm happy to make reviewers happy :-)
>
> It's a shame that these slices exist at all. It seems like the kernel could
> use the negotiated max sve size to grab the data all at once.
>
> > + for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> > + uint64_t *q = aa64_vfp_qreg(env, n);
> > +#ifdef HOST_WORDS_BIGENDIAN
> > + uint64_t d[ARM_MAX_VQ * 2];
> > + int j;
> > + for (j = 0; j < cpu->sve_max_vq * 2; j++) {
> > + d[j] = bswap64(q[j]);
> > + }
> > + reg.addr = (uintptr_t)d;
> > +#else
> > + reg.addr = (uintptr_t)q;
> > +#endif
> > + reg.id = KVM_REG_ARM64_SVE_ZREG(n, i);
> > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
>
> It might be worth splitting this...
>
> > + for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) {
> > + uint64_t *q = &env->vfp.pregs[n].p[0];
> > +#ifdef HOST_WORDS_BIGENDIAN
> > + uint64_t d[ARM_MAX_VQ * 2 / 8];
> > + int j;
> > + for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) {
> > + d[j] = bswap64(q[j]);
> > + }
> > + reg.addr = (uintptr_t)d;
> > +#else
> > + reg.addr = (uintptr_t)q;
> > +#endif
> > + reg.id = KVM_REG_ARM64_SVE_PREG(n, i);
> > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
>
> ... and this (unified w/ reg + size parameters?) to a function because ...
>
> > + reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0];
> > + reg.id = KVM_REG_ARM64_SVE_FFR(i);
> > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
>
> ... you forgot to apply the bswap here.
Ah, thanks for catching this. I'll fix it for v3, possibly with the
factoring, as you suggest.
>
> Likewise for the other direction.
>
>
> r~
>
>
> PS: It's also tempting to drop the ifdefs and, since we know the host supports
> sve instructions, and that the host supports sve_max_vq, do the reformatting
> as
>
> uint64_t scratch[ARM_MAX_VQ * 2];
> asm("whilelo p0.d, xzr, %2\n\t"
> "ld1d z0.d, p0/z [%1]\n\t"
> "str z0, [%0]"
> : "=Q"(scratch)
> : "Q"(*aa64_vfp_qreg(env, n)),
> "r"(cpu->sve_max_vq)
> : "p0", "v0");
This is nice, but as we don't have any other asm's in this file, I'm
inclined to leave it with the loops/swaps until we can use a builtin,
as you suggest below.
>
> PPS: Ideally, this would be further cleaned up with acle builtins, but those
> are still under development for GCC.
>
Thanks,
drew