[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [qemu-s390x] [PATCH v1] s390x/kvm: Configure page size after memory
From: |
David Gibson |
Subject: |
Re: [qemu-s390x] [PATCH v1] s390x/kvm: Configure page size after memory has actually been initialized |
Date: |
Fri, 29 Mar 2019 10:46:14 +1100 |
User-agent: |
Mutt/1.11.3 (2019-02-01) |
On Thu, Mar 28, 2019 at 12:39:24PM +0100, Igor Mammedov wrote:
> On Thu, 28 Mar 2019 11:18:02 +0100
> David Hildenbrand <address@hidden> wrote:
>
> > On 28.03.19 01:24, David Gibson wrote:
> > > On Wed, Mar 27, 2019 at 09:06:53PM +0100, David Hildenbrand wrote:
> > >> On 27.03.19 17:45, Igor Mammedov wrote:
> > >>> On Wed, 27 Mar 2019 14:59:44 +0100
> > >>> David Hildenbrand <address@hidden> wrote:
> > >>>
> > >>>> Right now we configure the pagesize quite early, when initializing KVM.
> > >>>> This is long before system memory is actually allocated via
> > >>>> memory_region_allocate_system_memory(), and therefore memory backends
> > >>>> marked as mapped.
> > >>>>
> > >>>> Instead, let's configure the maximum page size after initializing
> > >>>> memory in s390_memory_init(). cap_hpage_1m is still properly
> > >>>> configured before creating any CPUs, and therefore before configuring
> > >>>> the CPU model and eventually enabling CMMA.
> > >>>>
> > >>>> We might later want to replace qemu_getrampagesize() by another
> > >>>> detection mechanism, e.g. only looking at mapped, initial memory.
> > >>>> We don't support any memory devices yet, and if so, we can always
> > >>>> reject
> > >>>> devices with a page size bigger than the initial page size when
> > >>>> hotplugging. qemu_getrampagesize() should work for now, especially when
> > >>>> converting it to only look at mapped backends.
> > >>>>
> > >>>> Signed-off-by: David Hildenbrand <address@hidden>
> > >>>
> > >>> Acked-by: Igor Mammedov <address@hidden>
> > >>
> > >> BTW, do we want
> > >>
> > >> qemu_getmaxrampagesize()
> > >> qemu_getminrampagesize()
> > >
> > > That could work.
> > >
> > >> or similar. qemu_getrampagesize() in its current form is really far from
> > >> beautiful.
> > >
> > > Yeah, and AFAICT the way it's used here remains incorrect.
> > >
> >
> > Soooooo,
> >
> > this is all a big mess :)
> >
> >
> > 1. We have to decide on the page size before initializing the CPU model
> > (due to CMMA) and before creating any CPUs -> We have to do it
> > early/before machine init.
> >
> > 2. Memory devices (if ever supported) are created + realized (+ backends
> > mapped) after machine init.
> >
> > 3. Memory backends are created delayed, so even after creating devices.
> >
> > All we can do for s390x is
> >
> > a) Use the page size of initial memory when configuring the page size in
> > KVM. (AFAICS what would be done right now)
> >
> > b) When cold/hotplugging memory devices, reject devices with a page size
> > bigger than the page size of initial memory. Not an issue now. In the
> > future it might be "unfortunate" but at least we can print a nice error
> > from the machine hotplug handler "page size not supported in this
> > configuration".
> >
> > I double checked, "-numa" is not supported in s390x. So "-numa
> > node,mempath=..." does not apply. However this might change and we can
> > easily forget about this. Also, getting rid of -mem-path might be one
> > issue. So the right think to do would be
> >
> > a) this patch
> > b) introduce and use qemu_getmaxrampagesize()
> >
> > Then, we would properly detect the maximum page size *for initial
> > memory*. Memory devices, we will have to worry about in the future, but
> > should be easy to handle (remember and check against maximum configured
> > page size).
>
> Problem David tries to solve is that
Uh, which David?
> 1: user hotplugged backend with wrong page size
> 2: hotplug of associated device 'pc-dimm' cleanly fails with error
> 3: guest does reboot and QEMU crashes with fatal error
>
> Issue is that qemu_getmaxrampagesize() iterates over backends which
> might be used for guest RAM or theoretically might be used for
> something else.
>
> using 'mapped' attribute is a hack that currently works around issue [3]
> but it's prone to being incorrect depending on call order.
I'm not sure what you mean by that. For the purposes we need on ppc
checking the RAM that's actually present in the guest really is what
we need.
> maybe we should prevent [1] in the first place (but making backends
> depend on machine/accel doesn't look good either)?
Yeah, I wondered about this. There's no way to do it right now, and a
hook would be kinda ugly. Not sure if it's more or less ugly than the
alternatives.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature