[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check |
Date: |
Thu, 28 Mar 2019 20:56:26 +1100 |
User-agent: |
Mutt/1.11.3 (2019-02-01) |
On Thu, Mar 28, 2019 at 12:09:05PM +0530, Aravinda Prasad wrote:
>
>
> On Thursday 28 March 2019 06:18 AM, David Gibson wrote:
> > On Tue, Mar 26, 2019 at 12:22:50PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Tuesday 26 March 2019 09:49 AM, David Gibson wrote:
> >>> On Mon, Mar 25, 2019 at 02:31:10PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Monday 25 March 2019 12:03 PM, David Gibson wrote:
> >>>>> On Fri, Mar 22, 2019 at 12:04:25PM +0530, Aravinda Prasad wrote:
> >>>>>> Block VM migration requests until the machine check
> >>>>>> error handling is complete as (i) these errors are
> >>>>>> specific to the source hardware and is irrelevant on
> >>>>>> the target hardware, (ii) these errors cause data
> >>>>>> corruption and should be handled before migration.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>>>> ---
> >>>>>> hw/ppc/spapr_events.c | 17 +++++++++++++++++
> >>>>>> hw/ppc/spapr_rtas.c | 4 ++++
> >>>>>> include/hw/ppc/spapr.h | 3 +++
> >>>>>> 3 files changed, 24 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >>>>>> index d7cc0a4..6356a38 100644
> >>>>>> --- a/hw/ppc/spapr_events.c
> >>>>>> +++ b/hw/ppc/spapr_events.c
> >>>>>> @@ -41,6 +41,7 @@
> >>>>>> #include "qemu/bcd.h"
> >>>>>> #include "hw/ppc/spapr_ovec.h"
> >>>>>> #include <libfdt.h>
> >>>>>> +#include "migration/blocker.h"
> >>>>>>
> >>>>>> #define RTAS_LOG_VERSION_MASK 0xff000000
> >>>>>> #define RTAS_LOG_VERSION_6 0x06000000
> >>>>>> @@ -866,6 +867,22 @@ static void spapr_mce_build_elog(PowerPCCPU *cpu,
> >>>>>> bool recovered)
> >>>>>> void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>>>>> {
> >>>>>> SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>>>>> + int ret;
> >>>>>> + Error *local_err = NULL;
> >>>>>> +
> >>>>>> + error_setg(&spapr->migration_blocker,
> >>>>>> + "Live migration not supported during machine check
> >>>>>> handling");
> >>>>>> + ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> >>>>>> + if (ret < 0) {
> >>>>>> + /*
> >>>>>> + * We don't want to abort and let the migration to continue.
> >>>>>> In a
> >>>>>> + * rare case, the machine check handler will run on the target
> >>>>>> + * hardware. Though this is not preferable, it is better than
> >>>>>> aborting
> >>>>>> + * the migration or killing the VM.
> >>>>>
> >>>>> Can't you just discard the error in that case?
> >>>>
> >>>> I am actually discarding the error. Do you mean I don't have to print
> >>>> the below warning or am I missing something?
> >>>
> >>> Um.. I guess we should keep the warning. The tricky thing here is
> >>> that the information on the machine check is no longer relevant after
> >>> the migration, however presumably the event which triggered it might
> >>> have already corrupted guest memory, which *would* be relevant to the
> >>> guest after migration. Not sure what to do about that.
> >>
> >> I think the information is still relevant after the migration, because
> >> it includes the effective address in error and the context. If the
> >> corruption is in the guest kernel context then the machine check handler
> >> inside the guest kernel will call panic(). If the corruption is in user
> >> space, then the application is sent a SIGBUS.
> >
> > Good point.
> >
> >> But the problem is when the machine check handler hardware poisons the
> >> page frame backing the corrupt memory to avoid reuse of that page frame.
> >> After migration, the page frame backing the corrupt memory is clean, but
> >> we poison the clean page frame.
> >
> > This is the guest side handler? It sounds like we we need two
> > variants of the notification that say whether or not the area is still
> > bad.
>
> Yes this is guest side handler.
>
> Can we just let the guest machine check handler poison a clean page as a
> migration during a machine check is a rare occurrence. We may at most
> lose a clean page which is a trade-off for additional complexity in the
> machine check handler.
Yeah, that seems reasonable.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature
- Re: [Qemu-ppc] [PATCH v7 4/6] target/ppc: Build rtas error log, (continued)
[Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, Aravinda Prasad, 2019/03/22
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, David Gibson, 2019/03/25
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, Aravinda Prasad, 2019/03/25
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, David Gibson, 2019/03/26
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, Aravinda Prasad, 2019/03/26
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, David Gibson, 2019/03/27
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check, Aravinda Prasad, 2019/03/28
- Re: [Qemu-ppc] [PATCH v7 6/6] migration: Block migration while handling machine check,
David Gibson <=
[Qemu-ppc] [PATCH v7 5/6] ppc: spapr: Enable FWNMI capability, Aravinda Prasad, 2019/03/22