
Re: [qemu-s390x] [PATCH RFCv2 0/4] virtio-mem: paravirtualized memory


From: David Hildenbrand
Subject: Re: [qemu-s390x] [PATCH RFCv2 0/4] virtio-mem: paravirtualized memory
Date: Wed, 23 May 2018 20:27:48 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 23.05.2018 20:24, David Hildenbrand wrote:
> This is the Linux driver side of virtio-mem. Compared to the QEMU side,
> it is in a pretty complete and clean state.
> 
> virtio-mem is a paravirtualized mechanism for adding/removing memory to/from
> a VM. We can do this at a 4MB granularity right now. In Linux, all
> memory is added to ZONE_NORMAL, so unplugging cannot be guaranteed -
> but it is more likely to succeed than unplugging 128MB+ chunks.
> We might implement some optimizations in that area in the future that will
> make memory unplug more reliable.
> 
> For now, this is an easy way to give a VM access to more memory and
> eventually to remove some memory again. I am testing it on x86 and
> s390x (under QEMU TCG so far only).
> 
> This is the follow-up to [1], but the concept, user interface and
> virtio protocol have changed heavily. I am only including the important
> parts in this cover letter (because otherwise nobody will read it).  Please
> feel free to ask in case there are any questions.
> 
> This series is based on [4] and shows how it is being used. It contains
> further information. Also have a look at the description of patch nr 4 in
> this series.
> 
> This work is the result of Andrea Arcangeli's initial idea to have the host
> enforce guest access restrictions on memory inflated in virtio-balloon using
> userfaultfd, which turned out to be problematic to implement. That's how
> I came up with virtio-mem.
> 
> --------------------------------------------------------------------------
> 1. High level concept
> --------------------------------------------------------------------------
> 
> Each virtio-mem device owns a memory region in the physical address space.
> The guest is allowed to plug and online up to 'requested_size' of memory.
> It will not be allowed to plug more than that size. Unplugged memory will
> be protected by configurable mechanisms (e.g. random discard, userfaultfd
> protection, etc.). virtio-mem is designed in a way that a guest must never
> assume it can even read unplugged memory. This is a big difference
> from classical balloon drivers.
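
To make the rules above concrete, here is a minimal toy model in Python. It is only a sketch: the class and method names are hypothetical and do not correspond to the driver or the QEMU device, and raising an error on reads merely stands in for whatever protection mechanism the hypervisor is configured with.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4MB granularity from the cover letter


class ToyVirtioMemDevice:
    """Hypothetical model of one device; not the real driver or QEMU code."""

    def __init__(self, region_size, requested_size):
        self.region_size = region_size        # size of the owned memory region
        self.requested_size = requested_size  # upper limit set by the hypervisor
        self.plugged = set()                  # indices of currently plugged blocks

    @property
    def size(self):
        """Amount of memory currently plugged, in bytes."""
        return len(self.plugged) * BLOCK_SIZE

    def plug(self, block):
        """Plug one block; refuse if it would exceed requested_size."""
        if block < 0 or (block + 1) * BLOCK_SIZE > self.region_size:
            return False  # outside the device memory region
        if block in self.plugged:
            return False  # already plugged
        if self.size + BLOCK_SIZE > self.requested_size:
            return False  # guest may not plug more than requested_size
        self.plugged.add(block)
        return True

    def unplug(self, block):
        """Unplug one block; returns False if it was not plugged."""
        if block not in self.plugged:
            return False
        self.plugged.remove(block)
        return True

    def read(self, block):
        """Reading unplugged memory is never allowed in this model."""
        if block not in self.plugged:
            raise PermissionError("block %d is unplugged" % block)
        return bytes(BLOCK_SIZE)
```

With `requested_size` set to two blocks, a third `plug()` call returns `False`, and `read()` on any unplugged block raises.
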
> 
> The usable memory region might grow over time, so not all parts of the
> device memory region might be usable from the start. This is an
> optimization to allow a smarter implementation in the hypervisor (reduce
> size of dirty bitmaps, size of memory regions ...).
> 
> When the device driver starts up, it will query 'requested_size' and start
> to add memory to the system. This memory is not indicated e.g. via ACPI,
> so unmodified systems will not silently try to use unplugged memory that
> they are not supposed to touch.
> 
> Updates on the 'requested_size' indicate hypervisor requests to plug or
> unplug memory.
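
As a rough illustration of how a driver could turn a 'requested_size' update into work, the following sketch computes the number of blocks to plug or unplug. The function name is hypothetical, and rounding the request down to whole blocks is an assumption of this sketch, not something the protocol description above states.

```python
def blocks_to_change(size, requested_size, block_size=4 * 1024 * 1024):
    """Return > 0 for blocks to plug, < 0 for blocks to unplug, 0 if done.

    Assumption: a request that is not a multiple of the block size is
    rounded down, so the guest never ends up above 'requested_size'.
    """
    if size % block_size:
        raise ValueError("plugged size must be a multiple of the block size")
    target_blocks = requested_size // block_size
    current_blocks = size // block_size
    return target_blocks - current_blocks
```

Going from 4160MB plugged to a request of 40960MB yields 9200 blocks to plug; a later request of 1024MB yields -9984, i.e. 9984 blocks to unplug.
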
> 
> As each virtio-mem device can belong to a NUMA node, we can easily
> plug/unplug memory on a NUMA basis. And of course, we can have several
> independent virtio-mem devices for a VM.
> 
> The idea is *not* to add new virtio-mem devices when hotplugging memory,
> the idea is to resize (grow/shrink) virtio-mem devices.
> 
> --------------------------------------------------------------------------
> 2. Benefits
> --------------------------------------------------------------------------
> 
> Guest side:
> - Increase memory usable by Linux in 4MB steps (vs. section size like 128MB
>   on x86 or 2GB on e.g. some arm if I'm not mistaken)
> - Remove struct pages once all 4MB chunks of a section are offline (in
>   contrast to all balloon drivers where this never happens)
> - Don't fragment memory, while still being able to unplug smaller chunks
>   than ordinary DIMM sizes.
> - Memory hotplug support for architectures that have no proper interface
>   (e.g. s390x misses the external notification part) or where QEMU/Linux
>   support is complicated to implement.
> - Automatic management of onlining/offlining in the device driver -
>   no manual interaction from an admin/tool necessary.
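
The "remove struct pages once all 4MB chunks of a section are offline" point can be sketched as simple bookkeeping. This is an illustrative model only, assuming the 128MB x86 section size mentioned above; the class and its methods are hypothetical and are not how the kernel tracks sections.

```python
SECTION_SIZE = 128 * 1024 * 1024  # x86 section size named in the list above
CHUNK_SIZE = 4 * 1024 * 1024      # virtio-mem granularity
CHUNKS_PER_SECTION = SECTION_SIZE // CHUNK_SIZE  # 32 chunks per section


class SectionTracker:
    """Hypothetical bookkeeping: which chunks of each section are online."""

    def __init__(self):
        self.online = {}  # section index -> set of online chunk indices

    @staticmethod
    def _locate(addr):
        # Split a chunk-aligned address into (section index, chunk in section).
        return divmod(addr // CHUNK_SIZE, CHUNKS_PER_SECTION)

    def online_chunk(self, addr):
        section, chunk = self._locate(addr)
        self.online.setdefault(section, set()).add(chunk)

    def offline_chunk(self, addr):
        """Return True if the section is now fully offline, which is the
        point where its struct pages could be removed."""
        section, chunk = self._locate(addr)
        chunks = self.online.get(section)
        if not chunks or chunk not in chunks:
            return False
        chunks.remove(chunk)
        if chunks:
            return False  # other chunks of this section are still online
        del self.online[section]
        return True
```
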
> 
> QEMU side:
> - Resizing (plug/unplug) has a single interface - in contrast to a mixture
>   of ACPI and virtio-balloon. See the example below.
> - Migration works out of the box - no need to specify new DIMMs or new
>   sizes on the migration target. It simply works.
> - We can resize in arbitrary steps and sizes (in contrast to e.g. ACPI,
>   where we have to know upfront in which granularity we later on want to
>   remove memory or even how much memory we eventually want to add to our
>   guest)
> - One interface to rule them (architectures) all :)
> 
> --------------------------------------------------------------------------
> 3. Reboot handling
> --------------------------------------------------------------------------
> 
> After a reboot, all memory is unplugged. This allows the hypervisor
> to see if support for virtio-mem is available in the freshly booted system.
> This way we could charge only for the actually "plugged" memory size. And
> it avoids having to probe for plugged memory in the guest.
> 
> E.g. on every size change of a virtio-mem device, we can notify management
> layers. So we can track how much memory a VM has plugged.
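
A small sketch of that tracking idea, under the assumption that the hypervisor invokes some callback on every size change. `TrackedDevice` and the `notify` callback are hypothetical names used only for illustration; no such QEMU interface is implied.

```python
class TrackedDevice:
    """Hypothetical device whose plugged size is reported on every change."""

    def __init__(self, name, notify):
        self.name = name
        self.size = 0          # nothing is plugged before the driver starts
        self._notify = notify  # called as notify(name, new_size)

    def _set_size(self, new_size):
        # Only report actual changes to the management layer.
        if new_size != self.size:
            self.size = new_size
            self._notify(self.name, new_size)

    def guest_plugged(self, nbytes):
        self._set_size(self.size + nbytes)

    def guest_unplugged(self, nbytes):
        self._set_size(max(0, self.size - nbytes))

    def reboot(self):
        # After a reboot, all memory is unplugged.
        self._set_size(0)
```

A management layer that records the callbacks sees the size drop to 0 on reboot and stay there until a virtio-mem-aware guest plugs memory again.
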
> 
> --------------------------------------------------------------------------
> 4. Example
> --------------------------------------------------------------------------
> 
> (not including resizable memory regions on the QEMU side yet, so don't
>  focus on that part - it will consume a lot of memory right now for e.g.
>  dirty bitmaps and memory slot tracking data)
> 
> Start QEMU with two virtio-mem devices that provide little memory initially:
>       $ qemu-system-x86_64 -m 4G,maxmem=504G \
>               -smp sockets=2,cores=2 \
>               [...]
>               -object memory-backend-ram,id=mem0,size=256G \
>               -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,size=4160M \
>               -object memory-backend-ram,id=mem1,size=256G \
>               -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,size=3G
> 
> Query the configuration ('size' tells us the guest driver is active):
>       (qemu) info memory-devices
>       info memory-devices
>       Memory device [virtio-mem]: "vm0"
>         phys-addr: 0x140000000
>         node: 0
>         requested-size: 4362076160
>         size: 4362076160
>         max-size: 274877906944
>         block-size: 4194304
>         memdev: /objects/mem0
>       Memory device [virtio-mem]: "vm1"
>         phys-addr: 0x4140000000
>         node: 1
>         requested-size: 3221225472
>         size: 3221225472
>         max-size: 274877906944
>         block-size: 4194304
>         memdev: /objects/mem1
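
The byte values in this output follow directly from the command line if the M and G suffixes are read as binary units. A quick cross-check, where `parse_size` is a hypothetical helper written for this note and not QEMU code:

```python
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a size like '4160M' or '3G' to bytes (binary units assumed)."""
    text = text.strip()
    if text and text[-1].upper() in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1].upper()]
    return int(text)


BLOCK_SIZE = 4194304  # 'block-size' from the output above

vm0_size = parse_size("4160M")   # 4362076160, matches 'size' of vm0
vm1_size = parse_size("3G")      # 3221225472, matches 'size' of vm1
max_size = parse_size("256G")    # 274877906944, matches 'max-size'
vm0_blocks = vm0_size // BLOCK_SIZE  # 1040 blocks of 4MB

# In this particular output, vm1 starts right after vm0's maximum size.
vm1_phys_addr = 0x140000000 + max_size  # 0x4140000000
```
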
> 
> Change the size of a virtio-mem device:
>       (qemu) memory-device-resize vm0 40960
>       memory-device-resize vm0 40960
>       ...
>       (qemu) info memory-devices
>       info memory-devices
>       Memory device [virtio-mem]: "vm0"
>         phys-addr: 0x140000000
>         node: 0
>         requested-size: 42949672960
>         size: 42949672960
>         max-size: 274877906944
>         block-size: 4194304
>         memdev: /objects/mem0
>       ...
> 
> Try to unplug memory (KASAN active in the guest - a lot of memory wasted):
>       (qemu) memory-device-resize vm0 1024
>       memory-device-resize vm0 1024
>       ...
>       (qemu) info memory-devices
>       info memory-devices
>       Memory device [virtio-mem]: "vm0"
>         phys-addr: 0x140000000
>         node: 0
>         requested-size: 1073741824
>         size: 6169821184
>         max-size: 274877906944
>         block-size: 4194304
>         memdev: /objects/mem0
>       ...
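
In this last output 'size' stays above 'requested-size', i.e. the guest could only unplug part of what was asked. A short sketch that reads the key/value lines and computes how much is still outstanding; both helper names are hypothetical and the parsing only assumes the "key: value" layout shown above.

```python
SAMPLE = """\
Memory device [virtio-mem]: "vm0"
  phys-addr: 0x140000000
  node: 0
  requested-size: 1073741824
  size: 6169821184
  max-size: 274877906944
  block-size: 4194304
  memdev: /objects/mem0
"""


def parse_device(text):
    """Collect 'key: value' lines of one device entry into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


def pending_unplug_blocks(fields):
    """Number of blocks the guest still has to unplug to reach the request."""
    size = int(fields["size"])
    requested = int(fields["requested-size"])
    block = int(fields["block-size"])
    return max(0, size - requested) // block


fields = parse_device(SAMPLE)
pending = pending_unplug_blocks(fields)  # 1215 blocks, i.e. 4860MB
```
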
> 
> For now, I am only sharing the Linux driver side. The current code can be
> found at [2]. The QEMU side is still heavily WIP, the current QEMU
> prototype can be found at [3].
> 
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html
> [2] https://github.com/davidhildenbrand/linux/tree/virtio-mem
> [3] https://github.com/davidhildenbrand/qemu/tree/virtio-mem
> [4] https://www.mail-archive.com/address@hidden/msg1698014.html
> 
> David Hildenbrand (4):
>   ACPI: NUMA: export pxm_to_node
>   s390: mm: support removal of memory
>   s390: numa: implement memory_add_physaddr_to_nid()
>   virtio-mem: paravirtualized memory
> 
>  arch/s390/mm/init.c             |   18 +-
>  arch/s390/numa/numa.c           |   12 +
>  drivers/acpi/numa.c             |    1 +
>  drivers/virtio/Kconfig          |   15 +
>  drivers/virtio/Makefile         |    1 +
>  drivers/virtio/virtio_mem.c     | 1040 +++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h |    1 +
>  include/uapi/linux/virtio_mem.h |  134 ++++
>  8 files changed, 1216 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/virtio/virtio_mem.c
>  create mode 100644 include/uapi/linux/virtio_mem.h
> 

cc-ing some further mailing lists

-- 

Thanks,

David / dhildenb


