[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [qemu-s390x] [PATCH for-4.0?] exec: Only count mapped memory backend
From: |
David Gibson |
Subject: |
Re: [qemu-s390x] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize() |
Date: |
Wed, 27 Mar 2019 23:03:58 +1100 |
User-agent: |
Mutt/1.11.3 (2019-02-01) |
On Wed, Mar 27, 2019 at 10:38:38AM +0100, Igor Mammedov wrote:
> On Wed, 27 Mar 2019 20:07:57 +1100
> David Gibson <address@hidden> wrote:
>
> > On Wed, Mar 27, 2019 at 09:57:45AM +0100, Igor Mammedov wrote:
> > > On Wed, 27 Mar 2019 11:11:46 +1100
> > > David Gibson <address@hidden> wrote:
> > >
> > > > On Tue, Mar 26, 2019 at 03:08:20PM +0100, Igor Mammedov wrote:
> > > > > On Tue, 26 Mar 2019 14:50:58 +1100
> > > > > David Gibson <address@hidden> wrote:
> > > > >
> > > > > > qemu_getrampagesize() works out the minimum host page size backing
> > > > > > any of
> > > > > > guest RAM. This is required in a few places, such as for POWER8
> > > > > > PAPR KVM
> > > > > > guests, because limitations of the hardware virtualization mean the
> > > > > > guest
> > > > > > can't use pagesizes larger than the host pages backing its memory.
> > > > > >
> > > > > > However, it currently checks against *every* memory backend,
> > > > > > whether or not
> > > > > > it is actually mapped into guest memory at the moment. This is
> > > > > > incorrect.
> > > > > >
> > > > > > This can cause a problem attempting to add memory to a POWER8
> > > > > > pseries KVM
> > > > > > guest which is configured to allow hugepages in the guest (e.g.
> > > > > > -machine cap-hpt-max-page-size=16m). If you attempt to add
> > > > > > non-hugepage,
> > > > > > you can (correctly) create a memory backend, however it (correctly)
> > > > > > will
> > > > > > throw an error when you attempt to map that memory into the guest by
> > > > > > 'device_add'ing a pc-dimm.
> > > > > >
> > > > > > What's not correct is that if you then reset the guest a startup
> > > > > > check
> > > > > > against qemu_getrampagesize() will cause a fatal error because of
> > > > > > the new
> > > > > > memory object, even though it's not mapped into the guest.
> > > > > I'd say that backend should be remove by mgmt app since device_add
> > > > > failed
> > > > > instead of leaving it to hang around. (but fatal error either not a
> > > > > nice
> > > > > behavior on QEMU part)
> > > >
> > > > I agree, but reset could be guest initiated, so even if management is
> > > > doing the right thing there's a window where it could break.
> > >
> > > We could (log term) get rid of qemu_getrampagesize() (it's not really good
> > > to push machine/target specific condition into generic function) and move
> > > pagesize calculation closer to machine. In this case machine (spapr) knows
> > > exactly when and what is mapped and can enumerate NOT backends but mapped
> > > memory regions and/or pc-dimms (lets say we have memory_region_page_size()
> > > and apply whatever policy to the results.
> >
> > So.. it used to be in the machine specific code, but was made generic
> > because it's used in the vfio code. Where it is ppc specific, but not
> > keyed into the machine specific code in a way that we can really
> > re-use it there.
> It probably was the easiest way to hack things together, but then API gets
> generalized and misused and then tweaked to specific machine usecase
> and it gets only worse over time.
I don't really know what you mean by that. In both the (main) ppc and
VFIO cases the semantics we want from getrampagesize() really are the
same: what's the smallest chunk of guest-contiguous memory we can rely
on to also be host-contiguous.
> > > > > > This patch corrects the problem by adjusting
> > > > > > find_max_supported_pagesize()
> > > > > > (called from qemu_getrampagesize() via object_child_foreach) to
> > > > > > exclude
> > > > > > non-mapped memory backends.
> > > > > I'm not sure about if it's ok do so. It depends on where from
> > > > > qemu_getrampagesize() is called. For example s390 calls it rather
> > > > > early
> > > > > when initializing KVM, so there isn't anything mapped yet.
> > > >
> > > > Oh. Drat.
> > > >
> > > > > And once I replace -mem-path with hostmem backend and drop
> > > > > qemu_mempath_getpagesize(mem_path) /which btw aren't guarantied to be
> > > > > mapped either/
> > > > > this patch might lead to incorrect results for initial memory as
> > > > > well.
> > > >
> > > > Uh.. I don't immediately see why.
> > > It depends on when qemu_getrampagesize() is called with this patch.
> > >
> > > Maybe qemu_getrampagesize() is not a good interface anymore?
> > > I also see it being called directly from device side hw/vfio/spapr.c,
> > > which
> > > in my opinion should be propagated from machine,
> >
> > Um.. I'm not sure what you mean by that.
> It would be better if machine would set a page size property on vfio device
> when it's created.
No, that doesn't work. The limitation doesn't originate with the
machine, it originates with the *host*. I mean, in practice it'll
nearly always be KVM and so a machine matching the host, but it
doesn't actually have to be that way.
It is possible to run an x86 (or ARM) guest with TCG using a VFIO
device on a POWER host, and that would need to use the POWER VFIO
driver. You'd need to line up a bunch of details to make it work, and
it'd be kind of pointless, but it's possible.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), (continued)
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), David Hildenbrand, 2019/03/29
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), Igor Mammedov, 2019/03/27
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), David Hildenbrand, 2019/03/27
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), David Hildenbrand, 2019/03/27
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), Igor Mammedov, 2019/03/27
- Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), David Hildenbrand, 2019/03/27
Re: [qemu-s390x] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize(), David Gibson, 2019/03/27