[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
From: |
David Hildenbrand |
Subject: |
Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access() |
Date: |
Thu, 22 Aug 2019 09:01:26 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 |
On 22.08.19 00:31, Richard Henderson wrote:
> On 8/21/19 2:33 PM, David Hildenbrand wrote:
>>> NOTDIRTY cannot fault at all. The associated rcu critical section is ugly
>>> enough to make me not want to do anything except continue to go through the
>>> regular MMIO path.
>>>
>>> In any case, so long as we eliminate *access* faults by probing the page
>>> table,
>>> then falling back to the byte-by-byte loop is, AFAICS, sufficient to
>>> implement
>>> the instructions correctly.
>>
>> "In any case, so long as we eliminate *access* faults by probing the
>> page table" - that's what I'm doing in this patch (and even more correct
>> in the prototype patch I shared), no? (besides the watchpoint madness below)
>
> Correct.
>
> My main objection to your current patch is that you perform the access checks
> within MVC, and then do some more tlb lookups in fast_memmove.
Right, however I think splitting up both steps is nicer (especially if
we mix and match memmove and memset in MVCLE later). Maybe combining a
tuned-up probe_write() with something similar (but different to)
access_prepare().
General changes:
1. Check watchpoints in probe_write()
2. Make probe_write() bail out if it would cross pages() - caller has to
assure it falls into a single page (TARGET_PAGE_SIZE ?).
3. Return a pointer from probe_write()
-> If NULL is returned, reads/writes have to go via memops, e.g., single
byte access
4. Make probe_write() return addresses of TLB entries that are already
invalid again (-> LAP)
What I propose for s390x:
1. access_prepare(): Collect the pointers using probe_write() - maximum
is two pages. If we get NULL, there is no fault but we have to fallback
to read/write using memops.
2. access_memset() / access_set() / access_move() ... pass the access
information we collected. Prototype I shared can be heavily simplified -
don't fall back to paddrs but instead do single-byte access, don't allow
flexible array of pages, etc.
3. First convert MVC to the new access mechanism, later fix + convert
the others.
>
> I think that fast_memmove is where the access checks should live. That allows
> incremental improvement to combine access checks + host address lookup, which
> cannot currently be done in one step with existing interfaces.
>
> I guess you would still want access checks within MVC for the case in which
> you
> must fall back to byte-by-byte because of destructive overlap.
That's why I think introducing a new set of helpers makes sense. Once
all users of fast_memmove() etc are gone, we can then throw them away.
>
>> "falling back to the byte-by-byte loop is, AFAICS, sufficient"
>>
>> I don't think this is sufficient. E.g., LAP protected pages
>> (PAGE_WRITE_INV which immediately requires a new MMU walk on the next
>> access) will trigger a new MMU walk on every byte access (that's why I
>> chose to pre-translate in my prototype).
>
> LAP protected pages is exactly why probe_write should return the host address,
> so that we can do the access check + host address lookup in one step.
>
Yes, this should also be done in tlb_vaddr_to_host() - but as we'll be
converting to a probe_write() thingy, it might not be required to handle
the special "TLB_INVALID_MASK set after tlb_fill()" case.
> But in the meantime...
>
>> If another CPU modified the
>> page tables in between, we could suddenly get a fault - although we
>> checked early. What am I missing?
>
> You're concerned with a bare write to the page table by cpu B, while cpu A is
> executing, and before cpu B issues the cross-cpu tlb flush?
Not bare writes, these are invalid. Take IPTE for example. Set the
invalid bit, then do a sync CPU flush.
If we re-translate a vaddr after probe_write() - which would currently
happen in case of LAP - then we could suddenly run into the invalid bit,
triggering a fault. The sync CPU flush still wasn't processed, but we
see the "invalid" bit early because we decided to re-translate after
already modifying memory. This re-translation really has to be avoided.
>
> The tlb victim cache should prevent having to re-read a tlb entry from memory,
> at least for MVC. The unlimited size we currently support for MVCL and MVCLE
> could act weird, but would be fixed by limiting the execution as discussed.
Yes, MVCL/MVCLE need a major overhaul. But good to know that a second
tlb_fill() will not result in the other TLB entry to suddenly get
invalid and require a new MMU translation (-> victim cache). That's
another concern I had.
>
> Honestly, the os has to make sure that the page remains valid until after the
> flush completes, otherwise it's an os bug. The cross-cpu tlb flush happens
> via
> async_run_on_cpu, and of course never occurs while we are executing a TB. Yet
> another reason to limit the amount of work any one instruction does. ;-)
In my example, the page would still be valid, however we the invalid bit
has already been set. The next MMU translation would choke on that and
trigger a fault.
I can only see this (re-translation) happening for LAP right now. And
LAP pages (lowcore) should be always mapped, so I don't think this is a
very important case to handle.
>
>
>> I see that we use BP_STOP_BEFORE_ACCESS for PER (Program Event
>> Recording) on s390x. I don't think that's correct. We want to get
>> notified after the values were changed.
>>
>> "A storage-alteration event occurs whenever a CPU,
>> by using a logical or virtual address, makes a store
>> access without an access exception to the storage
>> area designated by control registers 10 and 11. ..."
>>
>> "For a PER instruction-fetching nullification event, the
>> unit of operation is nullified. For other PER events,
>> the unit of operation is completed"
>>
>> Oh man, why is everything I take a look at broken.
>
> Heh.
>
>>> In the latter case, if the instruction has had any side effects prior to the
>>> longjmp, they will be re-done when we re-start the current instruction.
>>>
>>> To me this seems like a rather large bug in our implementation of
>>> watchpoints,
>>> as it only really works properly for simple load/store/load-op-store type
>>> instructions. Anything that works on many addresses and doesn't delay side
>>> effects until all accesses are complete will Do The Wrong Thing.
>>>
>>> The fix, AFAICS, is for probe_write to call check_watchpoint(), so that we
>>> take the debug exit early.
>>
>> Indeed. I see what you mean now. (I was ignoring the "before access"
>> because I was assuming we don't need it on s390x)
>>
>> probe_write() would have to check for all BP_STOP_BEFORE_ACCESS watchpoints.
>
> !BP_STOP_BEFORE_ACCESS watchpoints exit to the main loop as well, so that it
> can restart and then single-step the current instruction.
>
> We need it the check in probe_write for all cases.
Thanks for the clarification!
>
>> Yes, that's what I mean, TARGET_PAGE_SIZE, but eventually crossing a
>> page boundary. The longer I stare at the MVCL code, the more broken it
>> is. There are more nice things buried in the PoP. MVCL does not detect
>> access exceptions beyond the next 2k. So we have to limit it there
>> differently.
>
> That language is indeed odd.
>
> The only reading of that paragraph that makes sense to me is that the hardware
> *must* interrupt MVCL after every 2k bytes processed. The idea that the user
> can magically write to a read-only page simply by providing length = 2MB and
> page that is initially writable is dumb. I cannot imagine that is a correct
> reading.
>
> Getting clarification from an IBM engineer on that would be good; otherwise I
> would just ignore that and proceed as if all access checks are performed.
>
>> So what I understand is that
>>
>> - we should handle watchpoints in probe_write()
>> - not bypass IO memory (especially NOTDIRTY). We cannot always relay on
>> getting access to a host page.
>
> Correct.
>
>
> r~
>
--
Thanks,
David / dhildenb
- [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), (continued)
- [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), Richard Henderson, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), Richard Henderson, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), Richard Henderson, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), Richard Henderson, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), Richard Henderson, 2019/08/21
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/22
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(),
David Hildenbrand <=
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/26
- Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), David Hildenbrand, 2019/08/21
[Qemu-devel] [PATCH v1 4/4] s390x/tcg: MOVE (MVC): Fault-safe handling, David Hildenbrand, 2019/08/21