Re: [Qemu-ppc] [RFC for-2.13 0/7] spapr: Clean up pagesize handling


From: David Gibson
Subject: Re: [Qemu-ppc] [RFC for-2.13 0/7] spapr: Clean up pagesize handling
Date: Fri, 27 Apr 2018 12:14:22 +1000
User-agent: Mutt/1.9.2 (2017-12-15)

On Thu, Apr 26, 2018 at 10:45:40AM +0200, Andrea Bolognani wrote:
> On Thu, 2018-04-26 at 10:55 +1000, David Gibson wrote:
> > On Wed, Apr 25, 2018 at 06:09:26PM +0200, Andrea Bolognani wrote:
> > > The new parameter would make it possible to make sure you will
> > > actually be able to use the page size you're interested in inside
> > > the guest, by preventing it from starting at all if the host didn't
> > > provide big enough backing pages;
> > 
> > That's right
> > 
> > > it would also ensure the guest
> > > gets access to different page sizes when running using TCG as an
> > > accelerator instead of KVM.
> > 
> > Uh.. it would ensure the guest *doesn't* get access to different page
> > sizes in TCG vs. KVM.  Is that what you meant to say?
> 
> Oops, looks like I accidentally dropped a word there. Of course you got it
> right and I meant exactly the opposite of what I actually wrote :/

:)

> > > For a KVM guest running on a POWER8 host, the matrix would look
> > > like
> > > 
> > >     b \ m | 64 KiB |  2 MiB | 16 MiB |  1 GiB | 16 GiB |
> > >   -------- -------- -------- -------- -------- --------
> > >    64 KiB | 64 KiB | 64 KiB |        |        |        |
> > >   -------- -------- -------- -------- -------- --------
> > >    16 MiB | 64 KiB | 64 KiB | 16 MiB | 16 MiB |        |
> > >   -------- -------- -------- -------- -------- --------
> > >    16 GiB | 64 KiB | 64 KiB | 16 MiB | 16 MiB | 16 GiB |
> > >   -------- -------- -------- -------- -------- --------
> > > 
> > > with backing page sizes from top to bottom, requested max page
> > > sizes from left to right, actual max page sizes in the cells and
> > > empty cells meaning the guest won't be able to start; on a POWER9
> > > machine, the matrix would look like
> > > 
> > >     b \ m | 64 KiB |  2 MiB | 16 MiB |  1 GiB | 16 GiB |
> > >   -------- -------- -------- -------- -------- --------
> > >    64 KiB | 64 KiB | 64 KiB |        |        |        |
> > >   -------- -------- -------- -------- -------- --------
> > >     2 MiB | 64 KiB | 64 KiB |        |        |        |
> > >   -------- -------- -------- -------- -------- --------
> > >     1 GiB | 64 KiB | 64 KiB | 16 MiB | 16 MiB |        |
> > >   -------- -------- -------- -------- -------- --------
> > > 
> > > instead, and finally on TCG the backing page size wouldn't matter
> > > and you would simply have
> > > 
> > >     b \ m | 64 KiB |  2 MiB | 16 MiB |  1 GiB | 16 GiB |
> > >   -------- -------- -------- -------- -------- --------
> > >           | 64 KiB | 64 KiB | 16 MiB | 16 MiB | 16 GiB |
> > >   -------- -------- -------- -------- -------- --------
> > > 
> > > Does everything up until here make sense?
> > 
> > Yes, that all looks right.
> 
> Cool.
> 
> Unfortunately, that pretty much seals the deal on libvirt *not* being
> able to infer the value from other guest settings :(
> 
> The only reasonable candidate would be the size of host pages used for
> backing guest memory; however

Right.

>   * TCG, RPT and KVM PR guests can't infer anything from it, as they
>     are not tied to it. Having different behaviors for TCG and KVM
>     would be easy, but differentiating between HPT KVM HV guests and
>     all other kinds is something we can't do at the moment, and that
>     we have actively resisted doing in the past;

Yeah, I certainly wouldn't recommend that.  It's basically what we're
doing in qemu now, and I want to change that, because it's a bad idea.

It would still be possible to key off the host-side hugepage size, but
apply the limit to all VMs - in a sense crippling TCG guests to give
them behaviour matching that of KVM guests.
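
To make that concrete, here's a rough sketch (illustrative Python
only, invented names, not actual qemu or libvirt code) of the rule the
matrices above encode, plus the "derive the cap from the backing size"
policy:

    # Illustrative sketch only -- not qemu or libvirt code.
    # Guest-visible HPT page sizes on POWER: 64 KiB, 16 MiB, 16 GiB.
    HPT_SIZES = [64 * 1024, 16 * 1024**2, 16 * 1024**3]

    def hpt_max_page_size(backing, cap, kvm_hv=True):
        """Largest page size an HPT guest can actually use, or None if
        the guest can't start because the cap guarantees a page size
        the host backing can't provide.  Sizes are in bytes."""
        guaranteed = max((s for s in HPT_SIZES if s <= cap), default=None)
        if guaranteed is None:
            return None              # cap below 64 KiB: nothing usable
        if kvm_hv and guaranteed > backing:
            return None              # KVM HV: refuse to start
        return guaranteed            # TCG ignores the backing entirely

    def cap_from_backing(backing):
        """The policy above: derive the cap from the host backing page
        size and apply it to every guest, KVM HV and TCG alike."""
        return max((s for s in HPT_SIZES if s <= backing), default=None)

E.g. hpt_max_page_size(16 * 1024**2, 1024**3) returns 16 MiB, which
matches the "16 MiB backing, 1 GiB requested" cell in the POWER8
matrix above.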

>   * the user might want to limit things further, e.g. preventing an
>     HPT KVM HV guest backed by 16 MiB pages or an HPT TCG guest from
>     using hugepages.

Right.. note that with the draft qemu patches a TCG guest will be
prevented from using hugepages *by default* (the default value of the
capability is 16, i.e. a 64 KiB maximum page size).  You have to
explicitly change it to allow hugepages to be used in a TCG guest (but
you don't have to supply hugepage backing).
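
Spelled out as a sketch (again illustrative only; the capability value
appears to be a page shift, since 16 corresponds to a 64 KiB maximum):

    # Sketch of the default-cap behaviour for TCG described above;
    # not qemu code.
    DEFAULT_CAP_SHIFT = 16                     # 2**16 = 64 KiB

    def tcg_usable_page_sizes(cap_shift=DEFAULT_CAP_SHIFT):
        """Page sizes a TCG guest may use: limited by the capability,
        never by whatever backs guest RAM (TCG doesn't care)."""
        hpt_sizes = [64 * 1024, 16 * 1024**2, 16 * 1024**3]
        return [s for s in hpt_sizes if s <= 1 << cap_shift]

    print(tcg_usable_page_sizes())     # default: 64 KiB only
    print(tcg_usable_page_sizes(24))   # raised: 64 KiB and 16 MiB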

> With the second use case in mind: would it make sense, or even be
> possible, to make it so the capability works for RPT guests too?

Possible, maybe.. I think there's another property where RPT pagesizes
are advertised.  But I think it's a bad idea.  In order to have the
normal HPT case work consistently we need to set the default cap value
to 16 (64 KiB page max).  If that applied to RPT guests as well, we'd
be unnecessarily crippling nearly all RPT guests.
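
In sketch form (illustrative only, invented names; radix page sizes
taken as 64 KiB / 2 MiB / 1 GiB for the sake of the example), the
point is that the check has to be gated on the guest MMU mode,
otherwise the 64 KiB default would bite every radix guest:

    # Sketch: apply the cap to hash (HPT) guests only; not qemu code.
    HPT_SIZES = [64 * 1024, 16 * 1024**2, 16 * 1024**3]
    RPT_SIZES = [64 * 1024, 2 * 1024**2, 1024**3]   # for illustration

    def max_guest_page_size(mmu_mode, cap=64 * 1024):
        if mmu_mode == "hash":
            return max(s for s in HPT_SIZES if s <= cap)
        return max(RPT_SIZES)      # radix guests: the cap doesn't apply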

> Thinking even further, what about other architectures? Is this
> something they might want to do too? The scenario I have in mind is
> guests backed by regular pages being prevented from using hugepages,
> the rationale being that they wouldn't have the same performance
> characteristics as if they were backed by hugepages. On the opposite
> end of the spectrum, you might want to ensure the pages used to
> back guest memory are at least as big as the biggest page you plan
> to use in the guest, in order to guarantee the performance
> characteristics fully match expectations.

Hm, well, you'd have to ask other arch people if they see a use for
that.  It doesn't look very useful to me.  I don't think libvirt can
or should ensure identical performance characteristics for a guest
across all possible migrations.  But for HPT guests, it's not a matter
of performance characteristics: if the guest tries to use a large page
size and KVM doesn't have large enough backing pages, the guest will
quickly just freeze on a page fault that can never be satisfied.
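
Put differently (sketch only, nothing like the real code paths), the
check trades a hang at runtime for a clean error at startup:

    # Sketch of the failure mode being avoided; not real qemu/KVM logic.
    def start_hpt_guest(guaranteed_page_size, backing_page_size,
                        strict=True):
        if strict and guaranteed_page_size > backing_page_size:
            raise RuntimeError("backing pages too small for the "
                               "requested HPT page size")
        # Without the check the guest starts anyway, and as soon as it
        # tries to use a page size larger than the backing it hits a
        # page fault that can never be satisfied -- it just freezes.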

> > > While trying to figure this out, one of the things I attempted to
> > > do was run a guest in POWER8 compatibility mode on a POWER9 host
> > > and use hugepages for backing, but that didn't seem to work at
> > > all, possibly hinting at the fact that not all of the above is
> > > actually accurate and I need you to correct me :)
> > > [...]
> > 
> > Ok, so note that the scheme I'm talking about here is *not* merged as
> > yet.  The above command line will run the guest with 2 MiB backing.
> > 
> > With the existing code that should work, but the guest will only be
> > able to use 64 KiB pages.
> 
> Understood: even without the ability to limit it further, the max
> guest page size is obviously still capped by the backing page size.
> 
> > If it didn't work at all.. there was a bug
> > fixed relatively recently that broke all hugepage backing, so you
> > could try updating to a more recent host kernel.
> 
> That was probably it then!
> 
> I'll see whether I can get a newer kernel running on the host, but
> my primary concern was not having gotten the command line (or the
> concepts above) completely wrong :)

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
