
Re: [Qemu-devel] [PATCH 4/4] target/arm: Add support for DC CVAP & DC CVADP instructions


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 4/4] target/arm: Add support for DC CVAP & DC CVADP instructions
Date: Tue, 24 Sep 2019 10:22:01 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 9/10/19 2:56 AM, Beata Michalska wrote:
> @@ -2229,7 +2229,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
>  #define ARM_CP_NZCV              (ARM_CP_SPECIAL | 0x0300)
>  #define ARM_CP_CURRENTEL         (ARM_CP_SPECIAL | 0x0400)
>  #define ARM_CP_DC_ZVA            (ARM_CP_SPECIAL | 0x0500)
> -#define ARM_LAST_SPECIAL         ARM_CP_DC_ZVA
> +#define ARM_CP_DC_CVAP           (ARM_CP_SPECIAL | 0x0600)
> +#define ARM_LAST_SPECIAL         ARM_CP_DC_CVAP

I don't see that this operation needs to be handled via "special".  It's a
function call upon write, as for many other system registers.
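For instance (just a sketch; "dccvap_writefn" is a placeholder name of my
choosing):

    static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *ri,
                               uint64_t value)
    {
        /*
         * 'value' is the virtual address operand of the DC instruction;
         * the cleaning logic from HELPER(dc_cvap) below can move here,
         * with no translator special-casing required.
         */
    }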

> +static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
> +{
> +    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
> +}
> +
> +static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
> +{
> +    return (FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >> 1) != 0;
> +}

The correct check is FIELD_EX64(...) >= 2.  This is a 4-bit field, as with all
of the others.
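That is:

    static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
    {
        return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
    }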

> +static CPAccessResult aa64_cacheop_persist_access(CPUARMState *env,
> +                                                  const ARMCPRegInfo *ri,
> +                                                  bool isread)
> +{
> +    ARMCPU *cpu = env_archcpu(env);
> +    /*
> +     * Access is UNDEF if lacking implementation for either DC CVAP or DC CVADP
> +     * DC CVAP ->  CRm: 0xc
> +     * DC CVADP -> CRm: 0xd
> +     */
> +    return (ri->crm == 0xc && !cpu_isar_feature(aa64_dcpop, cpu)) ||
> +           (ri->crm == 0xd && !cpu_isar_feature(aa64_dcpodp, cpu))
> +           ? CP_ACCESS_TRAP_UNCATEGORIZED
> +           : aa64_cacheop_access(env, ri, isread);
> +}
...
> +    { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
> +      .access = PL0_W, .type = ARM_CP_DC_CVAP,
> +      .accessfn = aa64_cacheop_persist_access },
> +    { .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
> +      .access = PL0_W, .type = ARM_CP_DC_CVAP,
> +      .accessfn = aa64_cacheop_persist_access },

While this access function works, it's better to simply not register these at
all when they're not supported.  Compare the registration of rndr_reginfo.
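E.g. in register_cp_regs_for_features(), something along these lines, with
dcpop_reg and dcpodp_reg as placeholder names for per-insn reginfo arrays
(DCPoDP implies DCPoP, so the nesting is valid):

    if (cpu_isar_feature(aa64_dcpop, cpu)) {
        define_arm_cp_regs(cpu, dcpop_reg);
        if (cpu_isar_feature(aa64_dcpodp, cpu)) {
            define_arm_cp_regs(cpu, dcpodp_reg);
        }
    }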

As I described above, I think this can use a normal write function.  In which
case this would use .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END.
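Concretely, something like this (untested; it reuses the existing
aa64_cacheop_access, since the conditional registration above makes the
persist variant unnecessary, and dccvap_writefn as sketched earlier):

    static const ARMCPRegInfo dcpop_reg[] = {
        { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
          .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
          .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
          .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
        REGINFO_SENTINEL
    };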

(I believe that ARM_CP_IO is not required, but I'm not 100% sure.  Certainly
there does not seem to be anything in dc_cvap() that affects state which can be
queried from another virtual cpu, so there does not appear to be any reason to
grab the global i/o lock.  The host kernel should be just fine with concurrent
msync syscalls, or whatever it is that libpmem uses.)


> +void HELPER(dc_cvap)(CPUARMState *env, uint64_t vaddr_in)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    ARMCPU *cpu = env_archcpu(env);
> +    /* CTR_EL0 System register -> DminLine, bits [19:16] */
> +    uint64_t dline_size = 4 << ((cpu->ctr >> 16) & 0xF);
> +    uint64_t vaddr = vaddr_in & ~(dline_size - 1);
> +    void *haddr;
> +    int mem_idx = cpu_mmu_index(env, false);
> +
> +    /* This won't be crossing page boundaries */
> +    haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
> +    if (haddr) {
> +
> +        ram_addr_t offset;
> +        MemoryRegion *mr;
> +
> +        /*
> +         * RCU critical section + ref counting,
> +         * so that MR won't disappear behind the scene
> +         */
> +        rcu_read_lock();
> +        mr = memory_region_from_host(haddr, &offset);
> +        if (mr) {
> +            memory_region_ref(mr);
> +        }
> +        rcu_read_unlock();
> +
> +        if (mr) {
> +            memory_region_do_writeback(mr, offset, dline_size);
> +            memory_region_unref(mr);
> +        }
> +    }
> +#endif


We hold the rcu lock whenever a TB is executing.  I don't believe there's any
point in increasing the lock count here.  Similarly with memory_region
refcounts -- they cannot vanish while we're executing a TB.

Thus I believe that about half of this function can fold away.
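I.e. the guts of it reduce to roughly:

    haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
    if (haddr) {
        ram_addr_t offset;
        MemoryRegion *mr;

        /* The RCU read lock is already held while the TB executes */
        mr = memory_region_from_host(haddr, &offset);
        if (mr) {
            memory_region_do_writeback(mr, offset, dline_size);
        }
    }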


r~


