Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
From: David Hildenbrand
Subject: Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Date: Wed, 21 Aug 2019 21:36:20 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
On 21.08.19 21:19, Richard Henderson wrote:
> On 8/21/19 10:37 AM, David Hildenbrand wrote:
>> Hah, guess what, I implemented a similar variant of "fetch all
>> of the host addresses" *but* it is not as easy as you might
>> think (sorry for the bad news).
>
> I think it is, because I didn't think it *that* easy. :-)
:) hehe
>
>> There are certain cases where we can't get access to the raw host
>> page. Namely, cpu watchpoints, LAP, NOTDIRTY. In summary: this won't
>> work as you describe. (my first approach did exactly this)
>
> NOTDIRTY and LAP are automatically handled via probe_write
> faulting instead of returning the address. I think there
> may be a bug in probe_write at present not checking
For LAP pages we immediately set TLB_INVALID_MASK again, to trigger a
new fault on the next write access (only). This could be handled in
tlb_vaddr_to_host() by simply returning the address of the page after
trying to fill the TLB and succeeding (I implemented that; that's the
easy part).
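
Roughly like this - an untested sketch only, following the cputlb.c
conventions, and of course not usable for the TLB_NOTDIRTY/TLB_MMIO
entries discussed below:

    static void *vaddr_to_host_refill(CPUArchState *env, target_ulong vaddr,
                                      MMUAccessType access_type, int mmu_idx,
                                      uintptr_t ra)
    {
        void *haddr = tlb_vaddr_to_host(env, vaddr, access_type, mmu_idx);

        if (!haddr) {
            /* raises the guest fault and does not return on failure */
            tlb_fill(env_cpu(env), vaddr, 0, access_type, mmu_idx, ra);
            /*
             * The entry is filled now, but for LAP pages TLB_INVALID_MASK
             * was immediately set again - so compute the host address from
             * the addend instead of repeating the lookup.
             */
            haddr = (void *)((uintptr_t)vaddr +
                             tlb_entry(env, mmu_idx, vaddr)->addend);
        }
        return haddr;
    }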
TLB_NOTDIRTY and TLB_MMIO are the real issue. We don't want to refault,
we want to treat that memory like IO memory and route accesses via
MemoryRegionOps - e.g., watch_mem_ops() in qemu/exec.c. So it's not a
fault but IO memory.
That's why we don't expose that memory via tlb_vaddr_to_host(). Faulting
in case of TLB_NOTDIRTY or TLB_MMIO would be bad.
>
> Watchpoints could be handled the same way, if we were to
> export check_watchpoint from exec.c. Indeed, I see no way
> to handle watchpoints correctly if we don't. I think that's
> an outstanding bug with probe_write.
probe_write() performs the MMU translation. If that succeeds, there is
no fault. If there are watchpoints, the memory is treated like IO and
memory access is rerouted. I think this works as designed.
>
> Any other objections? I certainly think that restricting the
> size of such operations to one page is a large simplification
> over the S390Access array thing that you create in this patch.
You mean two pages I assume. Yeah, I could certainly simplify the
prototype patch I have here quite a lot. 2 pages should be enough for
everybody ;)
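
Concretely: an MVC operand is at most 256 bytes and TARGET_PAGE_SIZE is
4096 on s390x, so a single operand can contain at most one page boundary
and two S390Access entries per operand always suffice. One could even
assert that at build time:

    QEMU_BUILD_BUG_ON(256 > TARGET_PAGE_SIZE);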
The basic question is: Should we try to somehow work around IO memory
access (including NOTDIRTY and watchpoints) in tlb_vaddr_to_host(), or
perform access in these cases via cpu_physical_memory_write()?
It feels somewhat wrong to me to tune tlb_vaddr_to_host() to always
return the address of a page although we are dealing with
MemoryRegionOps that want a more controlled access mechanism.
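
The latter is what the prototype below does: translate once to a guest
physical address via the new tcg_s390_cpu_mmu_translate() helper, then go
through the memory API, so dirty tracking and watchpoints keep working.
Schematically, for a write (sketch only, arguments as in the patch below):

    hwaddr paddr;

    /* translate once; raises the guest fault and does not return on error */
    tcg_s390_cpu_mmu_translate(env, vaddr, len, MMU_DATA_STORE, mmu_idx,
                               false, &paddr, NULL, ra);
    /* routed via MemoryRegionOps where necessary (NOTDIRTY, watchpoints) */
    cpu_physical_memory_write(paddr, buf, len);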
>
>
> r~
>
>>
>> The following patch requires another re-factoring
>> (tcg_s390_cpu_mmu_translate), but you should get the idea.
>>
>>
>>
>> From 0cacd2aea3dbc25e93492cca04f6c866b86d7f8a Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <address@hidden>
>> Date: Tue, 20 Aug 2019 09:37:11 +0200
>> Subject: [PATCH v1] s390x/tcg: Fault-safe MVC (MOVE) implementation
>>
>> MVC can cross page boundaries. In case we fault on the second page, we
>> have already partially copied data. If the areas overlap, we therefore
>> trigger a fault after having partially moved data, with our original
>> data possibly already overwritten. When continuing after the fault, we
>> would try to move the already modified data, not the original data -
>> very bad.
>>
>> glibc started to use MVC for forward memmove() and is able to trigger
>> exactly this corruption (via rpmbuild and rpm). Fedora 31 (rawhide)
>> currently fails to install as we trigger rpm database corruption due to
>> this bug.
>>
>> We need a way to translate a virtual address range to individual pages that
>> we can access later on without triggering faults. Probing all virtual
>> addresses once before the read/write is not sufficient - the guest could
>> have modified the page tables (e.g., write-protect, map out) in between,
>> so we could still fault on any new tlb_fill() - we have to avoid any new
>> MMU translations.
>>
>> Unfortunately, there are TLB entries for which we cannot get a host
>> address (esp., watchpoints, LAP, NOTDIRTY) - in these cases we cannot
>> avoid a new MMU translation using the ordinary ld/st helpers. Let's fall
>> back to guest physical addresses in these cases, which we access via
>> cpu_physical_memory_(read|write).
>>
>> This change reduced the boot time for s390x guests (to the prompt) from
>> ~1:29 min to ~1:19 min in my tests. For example, LAP-protected pages are
>> now only translated once when writing to them using MVC, and we don't
>> always fall back to byte-based copies.
>>
>> We will want to use the same mechanism for other accesses as well (e.g.,
>> MVCL), so prepare for that right away.
>>
>> Signed-off-by: David Hildenbrand <address@hidden>
>> ---
>> target/s390x/mem_helper.c | 213 +++++++++++++++++++++++++++++++++++---
>> 1 file changed, 200 insertions(+), 13 deletions(-)
>>
>> diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
>> index 91ba2e03d9..1ca293e00d 100644
>> --- a/target/s390x/mem_helper.c
>> +++ b/target/s390x/mem_helper.c
>> @@ -24,8 +24,10 @@
>>  #include "exec/helper-proto.h"
>>  #include "exec/exec-all.h"
>>  #include "exec/cpu_ldst.h"
>> +#include "exec/cpu-common.h"
>>  #include "qemu/int128.h"
>>  #include "qemu/atomic128.h"
>> +#include "tcg_s390x.h"
>>
>>  #if !defined(CONFIG_USER_ONLY)
>>  #include "hw/s390x/storage-keys.h"
>> @@ -104,6 +106,181 @@ static inline void cpu_stsize_data_ra(CPUS390XState *env, uint64_t addr,
>>      }
>>  }
>>
>> +/*
>> + * An access covers one page, except for the start/end of the translated
>> + * virtual address range.
>> + */
>> +typedef struct S390Access {
>> +    union {
>> +        char *haddr;
>> +        hwaddr paddr;
>> +    };
>> +    uint16_t size;
>> +    bool isHaddr;
>> +} S390Access;
>> +
>> +/*
>> + * Prepare access to a virtual address range, guaranteeing we won't trigger
>> + * faults during the actual access. Sometimes we can't get access to the
>> + * host address (e.g., LAP, cpu watchpoints/PER, clean pages, ...). Then, we
>> + * translate to guest physical addresses instead. We'll have to perform
>> + * slower, indirect, access to these physical addresses then.
>> + */
>> +static void access_prepare_idx(CPUS390XState *env, S390Access access[],
>> +                               int nb_access, vaddr vaddr, target_ulong size,
>> +                               MMUAccessType access_type, int mmu_idx,
>> +                               uintptr_t ra)
>> +{
>> +    int i = 0;
>> +    int cur_size;
>> +
>> +    /*
>> +     * After we obtained the host address of a TLB entry that entry might
>> +     * be invalidated again - e.g., via tlb_set_dirty(), via another
>> +     * tlb_fill(). We assume here that it is fine to temporarily store the
>> +     * host address to access it later - we didn't agree to any tlb flush and
>> +     * there seems to be no mechanism protecting the return value of
>> +     * tlb_vaddr_to_host().
>> +     */
>> +    while (size) {
>> +        g_assert(i < nb_access);
>> +        cur_size = adj_len_to_page(size, vaddr);
>> +
>> +        access[i].isHaddr = true;
>> +        access[i].haddr = tlb_vaddr_to_host(env, vaddr, access_type, mmu_idx);
>> +        if (!access[i].haddr) {
>> +            access[i].isHaddr = false;
>> +            tcg_s390_cpu_mmu_translate(env, vaddr, cur_size, access_type,
>> +                                       mmu_idx, false, &access[i].paddr,
>> +                                       NULL, ra);
>> +        }
>> +        access[i].size = cur_size;
>> +
>> +        vaddr += cur_size;
>> +        size -= cur_size;
>> +        i++;
>> +    }
>> +
>> +    /* let's zero-out the remaining entries, so we have a size of 0 */
>> +    if (i < nb_access) {
>> +        memset(&access[i], 0, sizeof(S390Access) * (nb_access - i));
>> +    }
>> +}
>> +
>> +static void access_prepare(CPUS390XState *env, S390Access access[],
>> +                           int nb_access, target_ulong vaddr, target_ulong size,
>> +                           MMUAccessType access_type, uintptr_t ra)
>> +{
>> +    int mmu_idx = cpu_mmu_index(env, false);
>> +
>> +    access_prepare_idx(env, access, nb_access, vaddr, size, access_type,
>> +                       mmu_idx, ra);
>> +}
>> +
>> +static void access_set(CPUS390XState *env, S390Access write[], int nb_write,
>> +                       uint8_t byte, target_ulong size)
>> +{
>> +    target_ulong cur_size;
>> +    void *buf = NULL;
>> +    int w = 0;
>> +
>> +    while (size) {
>> +        g_assert(w < nb_write);
>> +        if (!write[w].size) {
>> +            w++;
>> +            continue;
>> +        }
>> +
>> +        cur_size = MIN(size, write[w].size);
>> +        if (write[w].isHaddr) {
>> +            memset(write[w].haddr, byte, cur_size);
>> +            write[w].haddr += cur_size;
>> +        } else {
>> +#ifndef CONFIG_USER_ONLY
>> +            if (!buf) {
>> +                buf = g_malloc(TARGET_PAGE_SIZE);
>> +                memset(buf, byte, cur_size);
>> +            }
>> +            cpu_physical_memory_write(write[w].paddr, buf, cur_size);
>> +            write[w].paddr += cur_size;
>> +#else
>> +            g_assert_not_reached();
>> +#endif
>> +        }
>> +        write[w].size -= cur_size;
>> +        size -= cur_size;
>> +    }
>> +    g_free(buf);
>> +}
>> +
>> +/*
>> + * Copy memory in chunks up to chunk_size. If the ranges don't overlap or
>> + * if it's a forward move, this function behaves like memmove().
>> + *
>> + * To achieve a backwards byte-by-byte copy (e.g., MVC), the chunk_size
>> + * must not be bigger than the address difference (in the worst case, 1 byte).
>> + */
>> +static void access_copy(CPUS390XState *env, S390Access write[], int nb_write,
>> +                        S390Access read[], int nb_read, target_ulong size,
>> +                        target_ulong chunk_size)
>> +{
>> +    target_ulong cur_size;
>> +    void *buf = NULL;
>> +    int r = 0;
>> +    int w = 0;
>> +
>> +    g_assert(chunk_size > 0);
>> +    chunk_size = MIN(chunk_size, TARGET_PAGE_SIZE);
>> +
>> +    while (size) {
>> +        g_assert(w < nb_write);
>> +        g_assert(r < nb_read);
>> +        if (!write[w].size) {
>> +            w++;
>> +            continue;
>> +        }
>> +        if (!read[r].size) {
>> +            r++;
>> +            continue;
>> +        }
>> +        cur_size = MIN(MIN(MIN(size, write[w].size), read[r].size), chunk_size);
>> +
>> +        if (write[w].isHaddr && read[r].isHaddr) {
>> +            memmove(write[w].haddr, read[r].haddr, cur_size);
>> +            write[w].haddr += cur_size;
>> +            read[r].haddr += cur_size;
>> +#ifndef CONFIG_USER_ONLY
>> +        } else if (!write[w].isHaddr && read[r].isHaddr) {
>> +            cpu_physical_memory_write(write[w].paddr, read[r].haddr, cur_size);
>> +            write[w].paddr += cur_size;
>> +            read[r].haddr += cur_size;
>> +        } else if (write[w].isHaddr && !read[r].isHaddr) {
>> +            cpu_physical_memory_read(read[r].paddr, write[w].haddr, cur_size);
>> +            write[w].haddr += cur_size;
>> +            read[r].paddr += cur_size;
>> +        } else {
>> +            if (!buf) {
>> +                buf = g_malloc(chunk_size);
>> +            }
>> +            cpu_physical_memory_read(read[r].paddr, buf, cur_size);
>> +            cpu_physical_memory_write(write[w].paddr, buf, cur_size);
>> +            write[w].paddr += cur_size;
>> +            read[r].paddr += cur_size;
>> +#else
>> +        } else {
>> +            g_assert_not_reached();
>> +#endif
>> +        }
>> +        write[w].size -= cur_size;
>> +        read[r].size -= cur_size;
>> +        size -= cur_size;
>> +    }
>> +    g_free(buf);
>> +}
>> +
>>  static void fast_memset(CPUS390XState *env, uint64_t dest, uint8_t byte,
>>                          uint32_t l, uintptr_t ra)
>>  {
>> @@ -302,24 +479,34 @@ uint32_t HELPER(oc)(CPUS390XState *env, uint32_t l, uint64_t dest,
>>  static uint32_t do_helper_mvc(CPUS390XState *env, uint32_t l, uint64_t dest,
>>                                uint64_t src, uintptr_t ra)
>>  {
>> -    uint32_t i;
>> +    /* 256 bytes cannot cross more than two pages */
>> +    S390Access read[2];
>> +    S390Access write[2];
>>
>>      HELPER_LOG("%s l %d dest %" PRIx64 " src %" PRIx64 "\n",
>>                 __func__, l, dest, src);
>> +    l++;
>>
>> -    /* mvc and memmove do not behave the same when areas overlap! */
>> -    /* mvc with source pointing to the byte after the destination is the
>> -       same as memset with the first source byte */
>> +    g_assert(l <= 256);
>> +    access_prepare(env, write, ARRAY_SIZE(write), dest, l, MMU_DATA_STORE, ra);
>> +
>> +    /*
>> +     * The result of MVC is as if moving single bytes from left to right
>> +     * (in contrast to memmove()). It can be used like memset().
>> +     */
>>      if (dest == src + 1) {
>> -        fast_memset(env, dest, cpu_ldub_data_ra(env, src, ra), l + 1, ra);
>> -    } else if (dest < src || src + l < dest) {
>> -        fast_memmove(env, dest, src, l + 1, ra);
>> -    } else {
>> -        /* slow version with byte accesses which always work */
>> -        for (i = 0; i <= l; i++) {
>> -            uint8_t x = cpu_ldub_data_ra(env, src + i, ra);
>> -            cpu_stb_data_ra(env, dest + i, x, ra);
>> -        }
>> +        access_set(env, write, ARRAY_SIZE(write),
>> +                   cpu_ldub_data_ra(env, src, ra), l);
>> +        return env->cc_op;
>> +    }
>> +
>> +    access_prepare(env, read, ARRAY_SIZE(read), src, l, MMU_DATA_LOAD, ra);
>> +    if (dest < src || src + l <= dest) {
>> +        access_copy(env, write, ARRAY_SIZE(write), read, ARRAY_SIZE(read), l,
>> +                    TARGET_PAGE_SIZE);
>> +    } else if (src < dest) {
>> +        access_copy(env, write, ARRAY_SIZE(write), read, ARRAY_SIZE(read), l,
>> +                    dest - src);
>>      }
>>
>>      return env->cc_op;
>>
>
--
Thanks,
David / dhildenb