


From: Andrew Jones
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve
Date: Wed, 17 Jul 2019 11:35:53 +0200
User-agent: NeoMutt/20180716

On Wed, Jun 26, 2019 at 05:22:34PM +0200, Richard Henderson wrote:
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > +/*
> > + * If ARM_MAX_VQ is increased to be greater than 16, then we can no
> > + * longer hard code slices to 1 in kvm_arch_put/get_sve().
> > + */
> > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> 
> This seems easy to fix, or simply drop the slices entirely for now, as
> otherwise they are a teeny bit confusing.

I can do that, but as I replied downthread, I sort of like it this way
for documentation purposes. Anyway, I don't have a strong opinion here,
so I'm happy to make reviewers happy :-)
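Some context on the slice discussion above. As far as we understand the KVM SVE register ABI, each Z-register slice is 2048 bits, which is 16 quadwords of 128 bits. That is why ARM_MAX_VQ <= 16 allows a single slice. The sketch below computes the general slice count under that assumed slice size. `SVE_VQ_PER_SLICE` and `sve_num_slices()` are hypothetical names, not QEMU or kernel identifiers.

```c
#include <assert.h>

/* Assumption: one KVM SVE Z-register slice is 2048 bits = 16 quadwords. */
#define SVE_VQ_PER_SLICE 16

/*
 * Hypothetical helper: number of register slices needed to cover a
 * vector length of 'vq' quadwords. A partial slice still counts as one,
 * so the division rounds up.
 */
static int sve_num_slices(int vq)
{
    assert(vq > 0);
    return (vq + SVE_VQ_PER_SLICE - 1) / SVE_VQ_PER_SLICE;
}
```

With ARM_MAX_VQ at 16, this always yields 1, which is exactly what the QEMU_BUILD_BUG_ON protects. A vector length of 17 quadwords would need 2 slices, and the put/get loops would then have to iterate over them.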

> 
> It's a shame that these slices exist at all.  It seems like the kernel could
> use the negotiated max sve size to grab the data all at once.
> 
> > +        for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> > +            uint64_t *q = aa64_vfp_qreg(env, n);
> > +#ifdef HOST_WORDS_BIGENDIAN
> > +            uint64_t d[ARM_MAX_VQ * 2];
> > +            int j;
> > +            for (j = 0; j < cpu->sve_max_vq * 2; j++) {
> > +                d[j] = bswap64(q[j]);
> > +            }
> > +            reg.addr = (uintptr_t)d;
> > +#else
> > +            reg.addr = (uintptr_t)q;
> > +#endif
> > +            reg.id = KVM_REG_ARM64_SVE_ZREG(n, i);
> > +            ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
> 
> It might be worth splitting this...
> 
> > +        for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) {
> > +            uint64_t *q = &env->vfp.pregs[n].p[0];
> > +#ifdef HOST_WORDS_BIGENDIAN
> > +            uint64_t d[ARM_MAX_VQ * 2 / 8];
> > +            int j;
> > +            for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) {
> > +                d[j] = bswap64(q[j]);
> > +            }
> > +            reg.addr = (uintptr_t)d;
> > +#else
> > +            reg.addr = (uintptr_t)q;
> > +#endif
> > +            reg.id = KVM_REG_ARM64_SVE_PREG(n, i);
> > +            ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
> 
> ... and this (unified w/ reg + size parameters?) to a function because ...
> 
> > +        reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0];
> > +        reg.id = KVM_REG_ARM64_SVE_FFR(i);
> > +        ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
> 
> ... you forgot to apply the bswap here.

Ah, thanks for catching this. I'll fix it for v3, possibly with the
factoring, as you suggest.
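One possible shape for the factoring Richard suggests is a single helper that takes the buffer and a word count. The Z registers, the P registers and the FFR would then all go through the same swap path, so none of them can silently skip it. This is a sketch only. The names `zreg_words()`, `preg_words()`, `swap64()` and `sve_words_for_kvm()` are hypothetical. The `#ifdef HOST_WORDS_BIGENDIAN` is replaced by a runtime `host_big_endian` flag purely so the sketch can be exercised on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: uint64_t words in one Z register of 'vq' quadwords
 * (vq * 128 bits). */
static int zreg_words(int vq)
{
    return vq * 2;
}

/* Hypothetical: uint64_t words in one P register or the FFR
 * (vq * 16 bits), rounded up to a whole word. */
static int preg_words(int vq)
{
    return (vq * 2 + 7) / 8;
}

/* Portable 64-bit byte swap, standing in for QEMU's bswap64(). */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);   /* move lowest byte of v into r */
        v >>= 8;
    }
    return r;
}

/*
 * Hypothetical helper: return a pointer suitable for reg.addr.
 * On a big-endian host, swap 'nr' words of 'src' into 'dst' and return
 * 'dst'. On a little-endian host, the in-memory layout already matches,
 * so return 'src' untouched.
 */
static uint64_t *sve_words_for_kvm(uint64_t *dst, uint64_t *src, int nr,
                                   int host_big_endian)
{
    assert(nr >= 0);
    if (!host_big_endian) {
        return src;
    }
    for (int i = 0; i < nr; i++) {
        dst[i] = swap64(src[i]);
    }
    return dst;
}
```

A caller would then write something like `reg.addr = (uintptr_t)sve_words_for_kvm(tmp, q, preg_words(cpu->sve_max_vq), is_be);` for every register, including the FFR.

`preg_words()` rounds up on purpose. The patch's `cpu->sve_max_vq * 2 / 8` appears to evaluate to 0 for vector lengths below 4 quadwords. In that case the big-endian predicate loop would copy nothing into `d`.

A byte swap is its own inverse, so the same helper could also serve the get direction once KVM_GET_ONE_REG has filled a scratch buffer.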

> 
> Likewise for the other direction.
> 
> 
> r~
> 
> 
> PS: It's also tempting to drop the ifdefs and, since we know the host supports
> sve instructions, and that the host supports sve_max_vq, do the reformatting as
> 
>     uint64_t scratch[ARM_MAX_VQ * 2];
>     asm("whilelo  p0.d, xzr, %2\n\t"
>         "ld1d     z0.d, p0/z, [%1]\n\t"
>         "str      z0, [%0]"
>         : "=Q"(scratch)
>         : "Q"(*aa64_vfp_qreg(env, n)),
>           "r"(cpu->sve_max_vq * 2)
>         : "p0", "v0");

This is nice, but as we don't have any other asm statements in this file, I'm
inclined to leave it with the loops/swaps until we can use a builtin,
as you suggest below.
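The asm in the PS relies on two properties of the instructions. LD1D reads each 64-bit element in host byte order. STR (vector) writes the register out as a little-endian byte stream, which, as we understand it, is the layout KVM's SVE register ABI uses. The same conversion can be written in portable C, with neither the asm nor the `#ifdef`, by serializing each element byte by byte. `store_le64_words()` is a hypothetical helper, not existing QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical portable alternative to both the #ifdef loop and the
 * inline asm: write each 64-bit element of 'src' as 8 little-endian
 * bytes into 'dst'. The shifts act on values, not on memory, so the
 * output is the same on little- and big-endian hosts.
 */
static void store_le64_words(uint8_t *dst, const uint64_t *src, size_t nr)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nr; i++) {
        for (int b = 0; b < 8; b++) {
            /* byte b of element i: least significant byte first */
            dst[i * 8 + b] = (uint8_t)(src[i] >> (8 * b));
        }
    }
}
```

This needs no host-endianness check at all. The cost is a byte-at-a-time loop, which should be negligible for registers of at most 2048 bits.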

> 
> PPS: Ideally, this would be further cleaned up with ACLE builtins, but those
> are still under development for GCC.
> 

Thanks,
drew


