Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext

qemu-arm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff

From:	Richard Henderson
Subject:	Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
Date:	Tue, 16 Jul 2024 07:42:51 +1000
User-agent:	Mozilla Thunderbird

On 7/15/24 17:06, Max Chou wrote:

+                /* Probe nonfault on subsequent elements. */
+                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
+                                           mmu_index, true, &host, 0);
+                if (flags) {
According to the section 7.7. Unit-stride Fault-Only-First Loads in the v spec 
(v1.0)
When the fault-only- data-watchpoint trap on an element after theimplementations should not reduce vl but instead should trigger the debug trap asotherwise the event might be lost.


Hmm, ok.  Interesting.

And I think that there is a potential issue in the original implementation that maybe wecan fix in this patch.

We need to assign the correct element load size to the probe_access_internal functioncalled by tlb_vaddr_to_host in original implementation or is called directly in this patch.The size parameter will be used by the pmp_hart_has_privs function to do the physicalmemory protection (PMP) checking.If we set the size parameter to the remain page size, we may get unexpected trap caused bythe PMP rules that covered the regions of masked-off elements.


Maybe we can replace the while loop liked below.


vext_ldff(void *vd, void *v0, target_ulong base,
           ...
{
     ...
     uint32_t size = nf << log2_esz;

     VSTART_CHECK_EARLY_EXIT(env);

     /* probe every access */
     for (i = env->vstart; i < env->vl; i++) {
         if (!vm && !vext_elem_mask(v0, i)) {
             continue;
         }
         addr = adjust_addr(env, base + i * size);
         if (i == 0) {
             probe_pages(env, addr, size, ra, MMU_DATA_LOAD);
         } else {
             /* if it triggers an exception, no need to check watchpoint */
             void *host;
             int flags;

             /* Probe nonfault on subsequent elements. */
             flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD,
                     mmu_index, true, &host, 0);
             if (flags & ~TLB_WATCHPOINT) {
                 /*
                  * Stop any flag bit set:
                  *   invalid (unmapped)
                  *   mmio (transaction failed)
                  * In all cases, handle as the first load next time.
                  */
                 vl = i;
                 break;
             }
         }
     }


No, I don't think repeated probing is a good idea.
You'll lose everything you attempted to gain with the other improvements.

It seems, to handle watchpoints, you need to start by probing the entire length non-fault.That will tell you if any portion of the length has any of the problem cases. The fastpath will not, of course.

After probing, you have flags for the 1 or two pages, and you can make a choice about theactual load length:


  - invalid on first page: either the first element faults,
    or you need to check PMP via some alternate mechanism.
    Do not be afraid to add something to CPUTLBEntryFull.extra.riscv
    during tlb_fill in order to accelerate this, if needed.

  - mmio on first page: just one element, as the second might fault
    during the transaction.

    It would be possible to enhance riscv_cpu_do_transaction_failed to
    suppress the fault and set a flag noting the fault.  This would allow
    multiple elements to be loaded, at the expense of another check after
    each element within the slow tlb-load path.  I don't know if this is
    desirable, really.  Using vector operations on mmio is usually a
    programming error.  :-)

  - invalid or mmio on second page, continue to the end of the first page.

Once we have the actual load length, handle watchpoints by hand.
See sve_cont_ldst_watchpoints.

Finally, the loop loading the elements, likely in ram via host pointer.


r~

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep}, (continued)
- [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only, Richard Henderson, 2024/07/09
  - Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only, BALATON Zoltan, 2024/07/10
    - Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only, Richard Henderson, 2024/07/10
- [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset, Richard Henderson, 2024/07/09
  - Re: [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset, Peter Maydell, 2024/07/12
- [PATCH v2 11/13] target/s390x: Use user_or_likely in access_memmove, Richard Henderson, 2024/07/09
- [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff, Richard Henderson, 2024/07/09
  - Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff, Alistair Francis, 2024/07/10
  - Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff, Max Chou, 2024/07/15
    - Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff, Richard Henderson <=
- [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c, Richard Henderson, 2024/07/09

Prev by Date: [PULL 6/6] docs: remove Sphinx 1.x compatibility code
Next by Date: [PATCH v2 0/5] target/arm/kvm: Report PMU unavailability
Previous by thread: Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
Next by thread: [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c
Index(es):
- Date
- Thread