qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening form


From: Peter Maydell
Subject: Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
Date: Wed, 9 Jun 2021 11:01:22 +0100

On Tue, 8 Jun 2021 at 22:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +static uint16_t mve_element_mask(CPUARMState *env)
> > +{
> > +    /*
> > +     * Return the mask of which elements in the MVE vector should be
> > +     * updated. This is a combination of multiple things:
> > +     *  (1) by default, we update every lane in the vector
> > +     *  (2) VPT predication stores its state in the VPR register;
> > +     *  (3) low-overhead-branch tail predication will mask out part
> > +     *      the vector on the final iteration of the loop
> > +     *  (4) if EPSR.ECI is set then we must execute only some beats
> > +     *      of the insn
> > +     * We combine all these into a 16-bit result with the same semantics
> > +     * as VPR.P0: 0 to mask the lane, 1 if it is active.
> > +     * 8-bit vector ops will look at all bits of the result;
> > +     * 16-bit ops will look at bits 0, 2, 4, ...;
> > +     * 32-bit ops will look at bits 0, 4, 8 and 12.
> > +     * Compare pseudocode GetCurInstrBeat(), though that only returns
> > +     * the 4-bit slice of the mask corresponding to a single beat.
> > +     */
> > +    uint16_t mask = extract32(env->v7m.vpr, R_V7M_VPR_P0_SHIFT,
> > +                              R_V7M_VPR_P0_LENGTH);
>
> Any reason you're not using FIELD_EX32 and and FIELD_DP32 so far in this file?

Just habit, really, I think.

> > +#define DO_VLDR(OP, ESIZE, LDTYPE, TYPE, H)                             \
> > +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
> > +    {                                                                   \
> > +        TYPE *d = vd;                                                   \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned b, e;                                                  \
>
> esize is redundant with sizeof(type); perhaps just make it a local variable?
>
> > diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
> > index c54d5cb7305..e8bb2372ad9 100644
> > --- a/target/arm/translate-mve.c
> > +++ b/target/arm/translate-mve.c
> > @@ -1,6 +1,6 @@
> >   /*
> >    *  ARM translation: M-profile MVE instructions
> > -
> > + *
> >    *  Copyright (c) 2021 Linaro, Ltd.
>
> Is this just diff silliness?  I see that it has decided that helper-mve.h is a
> rename from translate-mve.c...

Not sure. I fixed at least one similar issue before sending, I guess
I missed this one.

> > +static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
> > +{
> > +    TCGv_i32 addr;
> > +    uint32_t offset;
> > +    TCGv_ptr qreg;
> > +
> > +    if (!dc_isar_feature(aa32_mve, s)) {
> > +        return false;
> > +    }
> > +
> > +    if (a->qd > 7 || !fn) {
> > +        return false;
> > +    }
>
> It's a funny old decode,
>
>    if D then UNDEFINED.
>    d = D:Qd,
>
> Is the spec forward looking to more than 7 Q registers?
> It's tempting to just drop the D:Qd from the decode...

I don't know, but looking at the decode it certainly seems
like the door is being left open to Q8..Q15. Other signs of
this include the existence of the VFPSmallRegisterBank()
function and the way that VLLDM and VLSTM have T2 encodings
whose only difference from the T1 encodings is that you can
specify registers up to D31. Decoding D:Qd and then doing the
range check seemed more in line with the spirit of this, though
of course leaving the D=1 UNDEF to decodetree works too.
(Some insns really do only use 3 bit register fields without
the extra D bit, so if we left all the fields 3 bit and later needed
to handle Q8..Q15 we'd have to go through everything to work out
which type of insn it was.)

> > +static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
> > +{
> > +    MVEGenLdStFn *ldfns[] = {
>
> static MVEGenLdStFn * const ldfns
>
> > +    MVEGenLdStFn *stfns[] = {
>
> Likewise, though...
>
> > +    return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);
>
> ... just put em together into a two-dimensional array, with a->l as the second
> index?

Yeah. (I was being a bit lazy because I can never remember which
way round the initializers go in a 2D array :-)

-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]