From: David Hildenbrand
Subject: Re: [PATCH v7 09/15] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
Date: Tue, 4 May 2021 12:21:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 04.05.21 12:09, Daniel P. Berrangé wrote:
On Wed, Apr 28, 2021 at 03:37:48PM +0200, David Hildenbrand wrote:
Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
effect on most shared mappings - except for hugetlbfs and anonymous memory.

Linux man page:
   "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
   space is reserved, one has the guarantee that it is possible to modify
   the mapping. When swap space is not reserved one might get SIGSEGV
   upon a write if no physical memory is available. See also the discussion
   of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
   2.6, this flag had effect only for private writable mappings."

Note that the "guarantee" part is wrong with memory overcommit in Linux.

Also, in Linux hugetlbfs is treated differently - we configure reservation
of huge pages from the pool, not reservation of swap space (huge pages
cannot be swapped).
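
To make the mmap() side concrete, here is a minimal sketch (not taken from the
patch; size and error handling chosen only for illustration) of an anonymous
mapping that skips swap-space accounting:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t size = 1024ULL * 1024 * 1024;   /* 1 GiB, arbitrary */

        /* Only address space is claimed; MAP_NORESERVE skips commit accounting. */
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* A later write may still get SIGSEGV if no physical memory is available. */
        return 0;
    }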

The rough behavior is [1]:
a) !Hugetlbfs:

   1) Without MAP_NORESERVE *or* with memory overcommit under Linux
      disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
      accounting/reservation happens:
       For a file backed map
        SHARED or READ-only - 0 cost (the file is the map not swap)
        PRIVATE WRITABLE - size of mapping per instance

       For an anonymous or /dev/zero map
        SHARED   - size of mapping
        PRIVATE READ-only - 0 cost (but of little use)
        PRIVATE WRITABLE - size of mapping per instance

   2) With MAP_NORESERVE, no accounting/reservation happens.

b) Hugetlbfs:

   1) Without MAP_NORESERVE, huge pages are reserved.

   2) With MAP_NORESERVE, no huge pages are reserved.
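
For illustration only (the hugetlbfs path, 2 MiB page size and helper below are
made up), b) corresponds to an mmap() of a hugetlbfs-backed file along these lines:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical helper: map "npages" 2 MiB huge pages without reserving them. */
    static void *map_huge_noreserve(size_t npages)
    {
        size_t size = npages * 2 * 1024 * 1024;
        int fd = open("/dev/hugepages/guest-ram", O_CREAT | O_RDWR, 0600);

        if (fd < 0 || ftruncate(fd, size) < 0) {
            perror("hugetlbfs file");
            return MAP_FAILED;
        }
        /*
         * No huge pages are reserved from the pool at mmap() time; each page is
         * taken from the pool on first touch, and a touch gets SIGBUS if the
         * pool is empty by then.
         */
        return mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_NORESERVE, fd, 0);
    }

Without MAP_NORESERVE, the same mmap() reserves the huge pages from the pool up
front and fails if they are not available.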

Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
to configure it for !hugetlbfs globally; this toggle now allows
configuring it in a more fine-grained way, rather than for the whole system.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

Can you explain this use case in more real-world terms, as I'm not
understanding what a mgmt app would actually do with this in
practice?

Let's consider huge pages for simplicity. Assume you have 128 free huge pages in your hypervisor that you want to dynamically assign to VMs.

Further assume you have two VMs running. A workflow could look like:

1. Assign all huge pages to VM 0
2. Reassign 64 huge pages to VM 1
3. Reassign another 32 huge pages to VM 1
4. Reassign 16 huge pages to VM 0
5. ...

Basically what we're used to doing with "ordinary" memory.

For that to work with virtio-mem, you'll have to disable reservation of huge pages for the virtio-mem managed memory region.
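
Roughly (command-line details from memory; sizes and IDs made up), that could
look something like:

    qemu-system-x86_64 ... \
      -m 4G,maxmem=132G \
      -object memory-backend-file,id=mem0,mem-path=/dev/hugepages,size=128G,reserve=off \
      -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=16G

Resizing (the workflow above) then just means changing requested-size at runtime
(e.g., via qom-set), and only the huge pages that actually get plugged are
consumed from the pool.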

(preallocation of huge pages in virtio-mem to protect against user mistakes is a separate work item)

reserve=off will be the default for virtio-mem, and actual reservation/preallocation will be done within virtio-mem. There could be a use for "reserve=off" in virtio-balloon use cases as well, but I'd like to exclude that from the discussion for now.

Hope that answers your question, thanks.

--
Thanks,

David / dhildenb



