Re: [qemu-s390x] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear

From: Christian Borntraeger
Subject: Re: [qemu-s390x] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear
Date: Thu, 1 Mar 2018 13:35:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 03/01/2018 01:28 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (address@hidden) wrote:
>>
>>
>> On 03/01/2018 12:45 PM, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (address@hidden) wrote:
>>>>
>>>>
>>>> On 03/01/2018 10:24 AM, Dr. David Alan Gilbert wrote:
>>>>> * Thomas Huth (address@hidden) wrote:
>>>>>> On 28.02.2018 20:53, Christian Borntraeger wrote:
>>>>>>> When a guest reboots with diagnose 308 subcode 3 it requests the memory
>>>>>>> to be cleared. We did not do it so far. This does not only violate the
>>>>>>> architecture, it also misses the chance to free up that memory on
>>>>>>> reboot, which would help with host memory overcommitment. By using
>>>>>>> ram_block_discard_range we can cover both cases.
>>>>>>
>>>>>> Sounds like a good idea. I wonder whether that release_all_ram()
>>>>>> function should maybe rather reside in exec.c, so that other machines
>>>>>> that want to clear all RAM at reset time can use it, too?
>>>>>>
>>>>>>> Signed-off-by: Christian Borntraeger <address@hidden>
>>>>>>> ---
>>>>>>> target/s390x/kvm.c | 19 +++++++++++++++++++
>>>>>>> 1 file changed, 19 insertions(+)
>>>>>>>
>>>>>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>>>>>>> index 8f3a422288..2e145ad5c3 100644
>>>>>>> --- a/target/s390x/kvm.c
>>>>>>> +++ b/target/s390x/kvm.c
>>>>>>> @@ -34,6 +34,8 @@
>>>>>>> #include "qapi/error.h"
>>>>>>> #include "qemu/error-report.h"
>>>>>>> #include "qemu/timer.h"
>>>>>>> +#include "qemu/rcu_queue.h"
>>>>>>> +#include "sysemu/cpus.h"
>>>>>>> #include "sysemu/sysemu.h"
>>>>>>> #include "sysemu/hw_accel.h"
>>>>>>> #include "hw/boards.h"
>>>>>>> @@ -41,6 +43,7 @@
>>>>>>> #include "sysemu/device_tree.h"
>>>>>>> #include "exec/gdbstub.h"
>>>>>>> #include "exec/address-spaces.h"
>>>>>>> +#include "exec/ram_addr.h"
>>>>>>> #include "trace.h"
>>>>>>> #include "qapi-event.h"
>>>>>>> #include "hw/s390x/s390-pci-inst.h"
>>>>>>> @@ -1841,6 +1844,14 @@ static int kvm_arch_handle_debug_exit(S390CPU *cpu)
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>> +static void release_all_rams(void)
>>>>>>
>>>>>> s/rams/ram/ maybe?
>>>>>>
>>>>>>> +{
>>>>>>> + struct RAMBlock *rb;
>>>>>>> +
>>>>>>> + QLIST_FOREACH_RCU(rb, &ram_list.blocks, next)
>>>>>>> + ram_block_discard_range(rb, 0, rb->used_length);
>>>>>>
>>>>>> From a coding style point of view, I think there should be curly braces
>>>>>> around ram_block_discard_range() ?
>>>>>
>>>>> I think this might break if it happens during a postcopy migrate.
>>>>> The destination CPU is running, so it can do a reboot at just the wrong
>>>>> time; and then the pages (that are protected by userfaultfd) would get
>>>>> deallocated and trigger userfaultfd requests if accessed.
>>>>
>>>> Yes, userfaultfd/postcopy is really fragile and relies on things that are not
>>>> necessarily true (e.g. virtio-balloon can also invalidate pages).
>>>
>>> That's why we use qemu_balloon_inhibit around postcopy to stop
>>> ballooning; I'm not aware of anything else that does the same.
>>
>> We also have at least the pte_unused thing in mm/rmap.c that clearly
>> predates userfaultfd. We might need to look into this as well.
>
> I've not come across that; what does that do?
It can drop a page on page-out if the page is no longer of value. It is used by
the CMMA (guest page hinting) code of s390x.
See mm/rmap.c in the kernel:
static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                             unsigned long address, void *arg)
{
[...]
                } else if (pte_unused(pteval)) {
                        /*
                         * The guest indicated that the page content is of no
                         * interest anymore. Simply discard the pte, vmscan
                         * will take care of the rest.
                         */
                        dec_mm_counter(mm, mm_counter(page));
                        /* We have to invalidate as we cleared the pte */
                        mmu_notifier_invalidate_range(mm, address,
                                                      address + PAGE_SIZE);
                } else if (IS_ENABLED(CONFIG_MIGRATION) &&
                           (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
[...]
>
>>>
>>>> The right thing here would be to actually terminate the postcopy migrate but
>>>> return it as "successful" (since we are going to clear that RAM anyway). Do
>>>> you see a good way to achieve that?
>>>
>>> There's no current mechanism to do it; I think it would have to involve
>>> some interaction with the source as well though to tell it that you
>>> didn't need that area of RAM anyway.
>>>
>>> However, there are more problems:
>>> a) Even forgetting the userfault problem, this is racy since during
>>> postcopy you're still receiving blocks from the source at the same time;
>>> so some of the area that you've discarded might get overwritten by data
>>> from the source.
>>
>> So how do you handle the case when the target system writes to memory
>> that is still in flight? Can we build on that mechanism?
>
> Once we've entered postcopy, a page is basically in one of two states:
> a) Not yet received - i.e. marked absent with MADV_DONTNEED; if the
> guest tries to write to it then it'll block with userfault and ask the
> source for the page; so the write won't happen until the page arrives.
> b) Received - we've already got the page from the source; the source
> never resends a page (once in postcopy) so now the destination can just
> write to the page.
>
> Once in postcopy, a page is received at most once (i.e. if it's not
> been received during precopy).
>
> I can imagine two ways of curing it:
> a) Simple but slow; just read all the pages before doing the
> discard, this forces it to wait for the pages to be received.
> b) More complex but fast; Add a message on the return path to the
> source telling it that you're going to discard a range; the source then
> marks its notes as cleared for those pages and then sends some form of
> ack, and at that point you drop it.
This looks like the most promising approach, but it will take some work.
>
> A third, incomplete way would be just to drop the userfaultfd on the
> destination for the RAMBlocks that are being cleared; but this does
> leave the source state in a bit of a mess.
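
To make option b) above more concrete, here is a toy model of the handshake in plain C. It is only a sketch under loud assumptions: no such return-path message exists in QEMU (as noted above, there is no current mechanism), every name (toy_src, toy_dst, toy_dst_discard, ...) is made up for illustration, and the "network" is a synchronous function call, so it does not model pages that are already in flight when the request is sent. What it does show is the ordering: the source clears its notes first, acks, and only then does the destination drop the range, so a discarded page is never overwritten later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy model only; none of these names exist in QEMU. */
enum dst_page { DST_MISSING, DST_PRESENT, DST_DISCARDED };

struct toy_src { bool to_send[NPAGES]; };        /* the source's "notes" */
struct toy_dst { enum dst_page page[NPAGES]; };  /* destination page states */

/* Start of postcopy in this toy setup: nothing has been received yet. */
static void toy_init(struct toy_src *s, struct toy_dst *d)
{
    for (size_t i = 0; i < NPAGES; i++) {
        s->to_send[i] = true;
        d->page[i] = DST_MISSING;
    }
}

/* The source sends its next pending page; each page is sent at most once.
 * Returns false when nothing is left to send. */
static bool toy_src_send_one(struct toy_src *s, struct toy_dst *d)
{
    for (size_t i = 0; i < NPAGES; i++) {
        if (s->to_send[i]) {
            s->to_send[i] = false;
            /* Thanks to the handshake this never lands on a discarded page. */
            assert(d->page[i] == DST_MISSING);
            d->page[i] = DST_PRESENT;
            return true;
        }
    }
    return false;
}

/* Source side of the hypothetical discard request: clear notes, then ack. */
static bool toy_src_handle_discard(struct toy_src *s, size_t first, size_t count)
{
    if (first > NPAGES || count > NPAGES - first) {
        return false;               /* bad range: no ack */
    }
    for (size_t i = first; i < first + count; i++) {
        s->to_send[i] = false;
    }
    return true;                    /* ack */
}

/* Destination side: drop the range only after the source has acked. */
static bool toy_dst_discard(struct toy_dst *d, struct toy_src *s,
                            size_t first, size_t count)
{
    if (!toy_src_handle_discard(s, first, count)) {
        return false;
    }
    for (size_t i = first; i < first + count; i++) {
        d->page[i] = DST_DISCARDED;
    }
    return true;
}
```

Calling toy_dst_discard(&d, &s, 0, 4) after toy_init() and then draining the source with toy_src_send_one() delivers only pages 4 to 7; pages 0 to 3 stay discarded.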
>
>
>>> b) Your release_all_rams seems to do all RAM Blocks - won't that nuke
>>> any ROMs as well? Or maybe even flash?
>>
>> ROMs loaded with load_elf (like our s390-ccw.img) are reloaded on every reset.
>> See rom_reset in hw/core/loader.c
>
> Ah, so this is happening after your reset code you've added?
Yes. I stop all CPUs, clear the memory, and then call system_reset.
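
For readers less familiar with the "clear the memory" step: the sketch below shows why a discard can serve both purposes from the patch description (zeroed guest memory and pages handed back to the host). It assumes Linux and anonymous private memory, where, to my understanding, ram_block_discard_range ends up in madvise(MADV_DONTNEED); file-backed or hugepage-backed RAM takes a different path that is not shown. The demo_block type and the demo_* helpers are hypothetical stand-ins, not QEMU code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a RAMBlock: one anonymous private mapping. */
struct demo_block {
    unsigned char *host;
    size_t used_length;
};

/* Map a block and dirty every byte so that all pages are populated.
 * Returns 0 on success, -1 if the mapping fails. */
static int demo_block_init(struct demo_block *b, size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    b->host = p;
    b->used_length = len;
    memset(b->host, 0xaa, len);
    return 0;
}

/* Discard the whole block. For anonymous private memory the kernel may free
 * the pages, and the next access sees zero-filled pages. */
static int demo_block_discard(struct demo_block *b)
{
    return madvise(b->host, b->used_length, MADV_DONTNEED);
}

/* Return 1 if every byte of the block reads as zero, 0 otherwise. */
static int demo_block_is_zero(const struct demo_block *b)
{
    for (size_t i = 0; i < b->used_length; i++) {
        if (b->host[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

After demo_block_init() the block reads back as 0xaa; after demo_block_discard() it reads back as zeros, and the host gets the backing pages back until the guest touches them again.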