
From: Richard Henderson
Subject: Re: [Qemu-riscv] [Qemu-devel] [PATCH v2 09/17] RISC-V: add vector extension integer instructions part2, bit/shift
Date: Thu, 12 Sep 2019 12:41:55 -0400

On 9/11/19 2:25 AM, liuzhiwei wrote:
> +void VECTOR_HELPER(vand_vv)(CPURISCVState *env, uint32_t vm, uint32_t rs1,
> +    uint32_t rs2, uint32_t rd)
> +{
> +    int i, j, vl;
> +    uint32_t lmul, width, src1, src2, dest, vlmax;
> +
> +    vl = env->vfp.vl;
> +    lmul  = vector_get_lmul(env);
> +    width   = vector_get_width(env);
> +    vlmax = vector_get_vlmax(env);
> +
> +    if (vector_vtype_ill(env) || vector_overlap_vm_common(lmul, vm, rd)) {
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +        return;
> +    }
> +    vector_lmul_check_reg(env, lmul, rs1, false);
> +    vector_lmul_check_reg(env, lmul, rs2, false);
> +    vector_lmul_check_reg(env, lmul, rd, false);
> +
> +    for (i = 0; i < vlmax; i++) {
> +        src1 = rs1 + (i / (VLEN / width));
> +        src2 = rs2 + (i / (VLEN / width));
> +        dest = rd + (i / (VLEN / width));
> +        j = i % (VLEN / width);
> +        if (i < env->vfp.vstart) {
> +            continue;
> +        } else if (i < vl) {
> +            switch (width) {
> +            case 8:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    env->vfp.vreg[dest].u8[j] = env->vfp.vreg[src1].u8[j]
> +                        & env->vfp.vreg[src2].u8[j];
> +                }
> +                break;

Note that a non-predicated logical operation need not consider the width.  All
of the widths perform the same operation, and therefore having the host operate
on u64 is fastest.  This is another good reason to notice vm=1 within the
translator and use separate helper functions for masked vs non-masked.
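
As a rough sketch of the non-masked case: the helper name and signature below
are hypothetical (not from the patch), and it assumes the active region is a
whole number of 64-bit units, so tail handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical non-masked helper: AND two vector operands in 64-bit
 * units, ignoring the element width entirely.
 * Assumption: nbytes is a multiple of 8.
 */
static void vand_unmasked_u64(uint64_t *dest, const uint64_t *src1,
                              const uint64_t *src2, size_t nbytes)
{
    for (size_t i = 0; i < nbytes / 8; i++) {
        dest[i] = src1[i] & src2[i];
    }
}
```

The same loop serves SEW=8, 16, 32 and 64, since a bitwise AND never carries
across element boundaries.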

> +void VECTOR_HELPER(vand_vx)(CPURISCVState *env, uint32_t vm, uint32_t rs1,
> +    uint32_t rs2, uint32_t rd)
...
> +void VECTOR_HELPER(vand_vi)(CPURISCVState *env, uint32_t vm, uint32_t rs1,
> +    uint32_t rs2, uint32_t rd)

As with the previous set of arithmetic instructions, these should be a single
helper that is passed a 64-bit scalar.

Note that scalars narrower than 64 bits can be replicated with dup_const(), at
which point the logical operation is easily performed in 64-bit units instead
of any smaller unit.
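
To illustrate the replication idea in isolation, here is a standalone stand-in
for that kind of broadcast. replicate_scalar() and vand_scalar_u64() are
hypothetical names used only for this sketch; the width is given as log2 of the
element size in bytes, and nbytes is assumed to be a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a dup_const()-style broadcast:
 * replicate the low element of c across all 64 bits.
 * log2_bytes: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit elements.
 */
static uint64_t replicate_scalar(unsigned log2_bytes, uint64_t c)
{
    switch (log2_bytes) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)c;
    case 1:
        return 0x0001000100010001ull * (uint16_t)c;
    case 2:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}

/* Single vector-scalar AND helper taking an already replicated scalar. */
static void vand_scalar_u64(uint64_t *dest, const uint64_t *src,
                            uint64_t scalar64, size_t nbytes)
{
    for (size_t i = 0; i < nbytes / 8; i++) {
        dest[i] = src[i] & scalar64;
    }
}
```

With this shape, the .vx and .vi forms would differ only in where the
translator obtains the 64-bit value it passes in.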

Note that predication can be handled via logical masking.  For ARM SVE, we have
a set of functions that map the active bits of a predicate mask to byte masks.
See e.g.

static inline uint64_t expand_pred_b(uint8_t byte)
static inline uint64_t expand_pred_h(uint8_t byte)
static inline uint64_t expand_pred_s(uint8_t byte)

so that the predicated logical AND operation looks like

    mask = expand_pred_n(env->vfp.vreg[0].u8[i]);
    result = in1 & in2;
    dest = (result & mask) | (dest & ~mask);
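
A self-contained sketch of that pattern for byte elements follows.  The names
are hypothetical, and the expansion is written as a plain loop purely for
illustration.  It assumes one predicate bit per byte, with bit i governing
byte i of the 64-bit word; the mask layout in the RISC-V draft would need its
own mapping.

```c
#include <stdint.h>

/*
 * Hypothetical loop-based expansion: turn each of the 8 predicate
 * bits into a 0x00 or 0xff byte of the returned 64-bit mask.
 * Assumption: bit i of 'bits' controls byte i (least significant first).
 */
static uint64_t expand_bits_to_bytes(uint8_t bits)
{
    uint64_t mask = 0;

    for (int i = 0; i < 8; i++) {
        if (bits & (1u << i)) {
            mask |= 0xffull << (i * 8);
        }
    }
    return mask;
}

/* Predicated AND: active bytes take in1 & in2, inactive bytes keep dest. */
static uint64_t masked_and_u64(uint64_t dest, uint64_t in1, uint64_t in2,
                               uint8_t bits)
{
    uint64_t mask = expand_bits_to_bytes(bits);
    uint64_t result = in1 & in2;

    return (result & mask) | (dest & ~mask);
}
```

The merge step is branch-free, so one 64-bit operation covers eight byte
elements regardless of which of them are active.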


r~
