qemu-devel

Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)


From: Keqian Zhu
Subject: Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
Date: Mon, 22 Mar 2021 22:02:38 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1

Hi Peter,

On 2021/3/11 4:32, Peter Xu wrote:
> This is v5 of the qemu dirty ring interface support.
> 
> v5:
> - rebase
> - dropped patch "update-linux-headers: Include const.h" after rebase
> - dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check" since
>   a similar patch got merged recently (38e0b7904eca7cd32f8953c3)
> 
> ========= v4 cover letter below =============
> 
> It is merely the same as v3 content-wise, but there are a few things to mention
> besides the rebase itself:
> 
>   - I picked up two patches from Eric Farman for the linux-header updates (from
>     Eric's v3 series) for convenience, just in case any of the series gets
>     queued by any maintainer.
> 
>   - One more patch is added as "KVM: Disable manual dirty log when dirty ring
>     enabled".  I found this when testing the branch after rebasing to the
>     latest qemu: not only is the manual dirty log capability not needed for
>     the kvm dirty ring, but more importantly INITIALLY_ALL_SET is totally at
>     odds with the kvm dirty ring and could silently crash the guest after
>     migration.  For this new commit, I touched up "KVM: Add dirty-gfn-count
>     property" a bit.
> 
>   - A few more documentation lines in qemu-options.hx.
> 
>   - I removed the RFC tag after the kernel series got merged.
> 
> Again, this is only the first step to support the dirty ring.  Ideally the
> dirty ring should let QEMU remove the whole layered dirty bitmap, so that the
> dirty ring works similarly to auto-converge but better: we would throttle
> vcpus with the dirty ring kvm exit rather than explicitly adding a timer to
> stop the vcpu thread from entering the guest again (like what we do with the
> current migration auto-converge).  Some more information can also be found in
> the kvm forum 2020 talk on the kvm dirty ring (slides 21/22 [1]).

I have read this pdf and the code, and I have some questions; I hope you can
help me. :)

You emphasize that the dirty ring is a "thread-local buffer" while the dirty
bitmap is global, but I don't see any optimization around locking compared to
the dirty bitmap.

Thread-local means that a vCPU can flush its hardware buffer into the dirty
ring without locking, but for the bitmap a vCPU can also mark pages dirty with
an atomic set, again without locking.  Maybe I am missing something?  (A small
illustration of the two paths is sketched below.)

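To make the comparison concrete, here is a minimal sketch of the two write
paths.  This is illustrative C only, not QEMU/KVM code; all names are made up,
and publication barriers plus the ring-full case are elided.

#include <stdatomic.h>
#include <stdint.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Shared bitmap path: any vcpu can mark a page with one atomic RMW,
 * so no lock is needed here either. */
static inline void bitmap_mark_dirty(atomic_ulong *bitmap, uint64_t pfn)
{
    atomic_fetch_or_explicit(&bitmap[pfn / BITS_PER_LONG],
                             1UL << (pfn % BITS_PER_LONG),
                             memory_order_relaxed);
}

/* Per-vcpu ring path: only the owning vcpu writes its own ring, so a
 * plain store plus an index bump is enough. */
struct dirty_ring {
    uint64_t *gfns;       /* ring buffer of dirtied guest frame numbers */
    uint32_t  size;       /* number of entries, power of two */
    uint32_t  produced;   /* written only by the owning vcpu */
};

static inline void ring_mark_dirty(struct dirty_ring *ring, uint64_t gfn)
{
    ring->gfns[ring->produced & (ring->size - 1)] = gfn;
    ring->produced++;
}
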
The second question is that you observed a longer migration time (55s -> 73s)
when the guest has 24G of RAM and the dirty rate is 800M/s.  I am not clear
about the reason.  With the dirty ring enabled, QEMU can get the dirty info
faster, which means it handles dirty pages more quickly, and the guest can be
throttled, which means dirty pages are generated more slowly.  What is the
rationale for the longer migration time?

PS: As the dirty ring is still converted into the dirty_bitmap of the kvm_slot,
the "get dirty info faster" part may not be true. :-(

Thanks,
Keqian

> 
> That next step (to remove all the dirty bitmaps, as mentioned above) is still
> up for discussion: firstly, I don't know whether there's anything I've
> overlooked there.  Meanwhile, it also only serves huge-VM cases and may not be
> very helpful for the many common scenarios where VMs are not that huge.
> 
> There are probably other ways to fix huge-VM migration issues, mostly focusing
> on responsiveness and convergence.  For example, Google has proposed a new
> userfaultfd kernel capability ("minor faults" [2]) to track page minor faults,
> and that could eventually serve the same purpose using postcopy.  That's
> another long story, so I'll stop here, but I mention it as a marker along with
> the dirty ring series so there'll still be a record to reference.
> 
> That said, I still think this series is well worth merging even if we don't
> pursue the next steps yet, since the dirty ring is disabled by default, and we
> can always build upon this series.
> 
> Please review, thanks.
> 
> V3: https://lore.kernel.org/qemu-devel/20200523232035.1029349-1-peterx@redhat.com/
>     (V3 contains all the pre-v3 changelog)
> 
> QEMU branch for testing (requires kernel version 5.11-rc1+):
>     https://github.com/xzpeter/qemu/tree/kvm-dirty-ring
> 
> [1] https://static.sched.com/hosted_files/kvmforum2020/97/kvm_dirty_ring_peter.pdf
> [2] https://lore.kernel.org/lkml/20210107190453.3051110-1-axelrasmussen@google.com/
> 
> ---------------------------8<---------------------------------
> 
> Overview
> ========
> 
> KVM dirty ring is a new interface to pass dirty bits from the kernel to
> userspace.  Instead of using a bitmap for each memory region, the dirty
> ring contains an array of dirtied GPAs to fetch, one ring per vcpu.
> 
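For reference, a rough sketch of how userspace harvests one vcpu's ring, based
on the kernel-side interface merged in 5.11 (struct kvm_dirty_gfn and the
KVM_DIRTY_GFN_F_* flags come from linux/kvm.h); mark_page_dirty_in_slot() is a
hypothetical helper, and memory barriers plus error handling are omitted.

#include <linux/kvm.h>      /* struct kvm_dirty_gfn, KVM_DIRTY_GFN_F_* */
#include <sys/ioctl.h>
#include <stdint.h>

/* Hypothetical helper: set the bit for (slot, offset) in whatever
 * per-slot dirty bitmap userspace keeps. */
void mark_page_dirty_in_slot(uint32_t slot, uint64_t offset);

/* Walk one vcpu's mmap'ed ring from the last fetch position, consume
 * every entry the kernel has published, and mark it as collected. */
static uint32_t harvest_one_ring(struct kvm_dirty_gfn *ring,
                                 uint32_t *fetch_index, uint32_t ring_size)
{
    uint32_t count = 0;

    for (;;) {
        struct kvm_dirty_gfn *e = &ring[*fetch_index % ring_size];

        if (!(e->flags & KVM_DIRTY_GFN_F_DIRTY)) {
            break;                          /* nothing more published */
        }
        /* e->slot encodes (as_id << 16) | slot_id; e->offset is the
         * page offset within that slot. */
        mark_page_dirty_in_slot(e->slot, e->offset);
        e->flags = KVM_DIRTY_GFN_F_RESET;   /* hand the entry back */

        (*fetch_index)++;
        count++;
    }
    return count;
}

/* After harvesting all vcpus, let the kernel recycle the entries:
 *     ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
 */
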
> 
> There are a few major changes compared to how the old dirty logging
> interface works:
> 
> - Granularity of dirty bits
> 
>   The KVM dirty ring interface does not offer memory-region-level
>   granularity to collect dirty bits (i.e., per KVM memory slot).
>   Instead, the dirty bits are collected globally for all the vcpus at
>   once.  The major effect is on the VGA part, because VGA dirty
>   tracking is enabled as long as the device is created, and it used
>   memory region granularity.  Now that operation is amplified into a
>   whole-VM sync.  Maybe there's a smarter way to do the same thing for
>   VGA with the new interface, but so far I don't see it affecting much,
>   at least on regular VMs.
> 
> - Collection of dirty bits
> 
>   The old dirty logging interface collects KVM dirty bits when
>   synchronizing dirty bits.  The KVM dirty ring interface instead uses
>   a standalone thread to do that.  So when another thread (e.g., the
>   migration thread) wants to synchronize the dirty bits, it simply
>   kicks that thread and waits until it flushes all the dirty bits to
>   the ramblock dirty bitmap.
> 
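A condensed sketch of that kick-and-wait handshake, using plain pthreads.  This
is not the series' actual implementation (that lives in accel/kvm/kvm-all.c),
and flush_all_vcpu_rings_to_ramblock_bitmap() is a hypothetical stand-in for
the real flush.

#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: walk every vcpu's ring and update the ramblock
 * dirty bitmap accordingly. */
void flush_all_vcpu_rings_to_ramblock_bitmap(void);

static pthread_mutex_t reaper_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reaper_cond = PTHREAD_COND_INITIALIZER;
static uint64_t reaper_requests;    /* bumped by callers asking for a sync */
static uint64_t reaper_completed;   /* bumped by the reaper after a flush */

/* Called from e.g. the migration thread: kick the reaper, then wait
 * until a flush that started after our request has finished. */
void dirty_ring_sync(void)
{
    pthread_mutex_lock(&reaper_lock);
    uint64_t target = ++reaper_requests;
    pthread_cond_broadcast(&reaper_cond);
    while (reaper_completed < target) {
        pthread_cond_wait(&reaper_cond, &reaper_lock);
    }
    pthread_mutex_unlock(&reaper_lock);
}

/* Reaper thread body: sleep until kicked, flush, report completion. */
void *reaper_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&reaper_lock);
    for (;;) {
        while (reaper_completed == reaper_requests) {
            pthread_cond_wait(&reaper_cond, &reaper_lock);
        }
        uint64_t batch = reaper_requests;
        pthread_mutex_unlock(&reaper_lock);

        flush_all_vcpu_rings_to_ramblock_bitmap();

        pthread_mutex_lock(&reaper_lock);
        reaper_completed = batch;   /* only acknowledge pre-flush requests */
        pthread_cond_broadcast(&reaper_cond);
    }
    return NULL;                    /* not reached */
}
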
> A new parameter "dirty-ring-size" is added to "-accel kvm".  By
> default, the dirty ring is still disabled (size==0).  To enable it,
> we need to pass:
> 
>   -accel kvm,dirty-ring-size=65536
> 
> This establishes a 64KB dirty ring buffer per vcpu.  Then if we
> migrate, it'll switch to the dirty ring.
> 
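For illustration, a full invocation might look like the following; everything
apart from the dirty-ring-size option is an arbitrary example setup, and
guest.img is a placeholder disk image:

qemu-system-x86_64 \
    -accel kvm,dirty-ring-size=65536 \
    -cpu host -smp 8 -m 24G \
    -drive file=guest.img,if=virtio
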
> I gave it a shot with a 24G guest, 8 vcpus, using a 10G NIC as the
> migration channel.  When idle or with a small dirty workload, I don't
> observe a major difference in total migration time.  With a higher
> random dirty workload (800MB/s dirty rate over 20G of memory), the kvm
> dirty ring does worse.  Total migration time is (ping-pong migrating
> 6 times, in seconds):
> 
> |-------------------------+---------------|
> | dirty ring (4k entries) | dirty logging |
> |-------------------------+---------------|
> |                      70 |            58 |
> |                      78 |            70 |
> |                      72 |            48 |
> |                      74 |            52 |
> |                      83 |            49 |
> |                      65 |            54 |
> |-------------------------+---------------|
> 
> Summary:
> 
> dirty ring average:    73s
> dirty logging average: 55s
> 
> The KVM dirty ring is slower in the above case.  The numbers suggest
> that dirty logging is still preferred as the default, because
> small/medium VMs are still the major use case and high dirty workloads
> happen frequently too.  And that's what this series does.
> 
> TODO:
> 
> - Consider dropping the BQL dependency: then we can run the reaper
>   thread in parallel with the main thread.  Needs some thought around
>   the race conditions.
> 
> - Consider dropping the kvmslot bitmap: logically this can be dropped
>   with the kvm dirty ring, not only to save space, but also because it
>   is still another layer linear in guest memory size, which goes
>   against the whole idea of the kvm dirty ring.  This should make the
>   above numbers (for the kvm dirty ring) even smaller (though still
>   maybe not as good as dirty logging under such a high workload).
> 
> Please refer to the code and comments themselves for more information.
> 
> Thanks,
> 
> Peter Xu (10):
>   memory: Introduce log_sync_global() to memory listener
>   KVM: Use a big lock to replace per-kml slots_lock
>   KVM: Create the KVMSlot dirty bitmap on flag changes
>   KVM: Provide helper to get kvm dirty log
>   KVM: Provide helper to sync dirty bitmap from slot to ramblock
>   KVM: Simplify dirty log sync in kvm_set_phys_mem
>   KVM: Cache kvm slot dirty bitmap size
>   KVM: Add dirty-gfn-count property
>   KVM: Disable manual dirty log when dirty ring enabled
>   KVM: Dirty ring support
> 
>  accel/kvm/kvm-all.c      | 585 +++++++++++++++++++++++++++++++++------
>  accel/kvm/trace-events   |   7 +
>  include/exec/memory.h    |  12 +
>  include/hw/core/cpu.h    |   8 +
>  include/sysemu/kvm_int.h |   7 +-
>  qemu-options.hx          |  12 +
>  softmmu/memory.c         |  33 ++-
>  7 files changed, 565 insertions(+), 99 deletions(-)


