[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page ch
From: |
Hao Xiang |
Subject: |
Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads. |
Date: |
Thu, 22 Feb 2024 22:02:21 -0800 |
On Thu, Feb 22, 2024 at 6:33 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> > Hao Xiang <hao.xiang@bytedance.com> writes:
> >
> > > This change adds a dedicated handler for
> > > MigrationOps::ram_save_target_page in
> >
> > nit: Add a dedicated handler...
> >
> > Usually "this patch/change" is used only when necessary to avoid
> > ambiguity.
> >
> > > multifd live migration. Now zero page checking can be done in the multifd
> > > threads
> > > and this becomes the default configuration. We still provide backward
> > > compatibility
> > > where zero page checking is done from the migration main thread.
> > >
> > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > ---
> > > migration/multifd.c | 1 +
> > > migration/options.c | 2 +-
> > > migration/ram.c | 53 ++++++++++++++++++++++++++++++++++-----------
> > > 3 files changed, 42 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index fbb40ea10b..ef5dad1019 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -13,6 +13,7 @@
> > > #include "qemu/osdep.h"
> > > #include "qemu/cutils.h"
> >
> > This include...
> >
> > > #include "qemu/rcu.h"
> > > +#include "qemu/cutils.h"
> >
> > is there already.
> >
> > > #include "exec/target_page.h"
> > > #include "sysemu/sysemu.h"
> > > #include "exec/ramblock.h"
> > > diff --git a/migration/options.c b/migration/options.c
> > > index 3c603391b0..3c79b6ccd4 100644
> > > --- a/migration/options.c
> > > +++ b/migration/options.c
> > > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> > > MIG_MODE_NORMAL),
> > > DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection",
> > > MigrationState,
> > > parameters.zero_page_detection,
> > > - ZERO_PAGE_DETECTION_LEGACY),
> > > + ZERO_PAGE_DETECTION_MULTIFD),
> >
> > I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> > enabled. Otherwise it will go along happily until we get data corruption
> > because the new QEMU didn't send any zero pages on the migration thread
> > and the old QEMU did not look for them in the multifd packet.
>
> It could be even worse, as the new QEMU will only attach "normal" pages
> after the multifd packet, the old QEMU could read more than it could,
> expecting all pages..
>
> >
> > Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> > in use. We'd just need to fix the test in the new QEMU to check
> > (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
>
> IMHO we don't need yet to change MULTIFD_VERSION, what we need is perhaps a
> compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
> We should make sure when "legacy" is set, multifd ran the old protocol
> (zero_num will always be 0, and will be ignored by old QEMUs, IIUC).
>
> One more comment is, when repost please consider split this patch into two;
> The new ram_save_target_page_multifd() hook can be done in another patch,
> AFAIU.
Sorry, I kept missing this. I will keep telling myself, compatibility
is king. I will set the hw_compat_8_2 setting and make sure to test
migration 9.0 -> 8.2 fails with "multifd" option set.
Will split patches.
>
> >
> > >
> > > /* Migration capabilities */
> > > DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5ece9f042e..b088c5a98c 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs,
> > > PageSearchStatus *pss,
> > > QEMUFile *file = pss->pss_channel;
> > > int len = 0;
> > >
> > > - if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > > - return 0;
> > > - }
> >
> > How does 'none' work now?
> >
> > > -
> > > if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > > return 0;
> > > }
> > > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs,
> > > PageSearchStatus *pss)
> > >
> > > static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> > > {
> > > + assert(migrate_multifd());
> > > + assert(!migrate_compress());
> > > + assert(!migration_in_postcopy());
> >
> > Drop these, please. Keep only the asserts that are likely to trigger
> > during development, such as the existing ones at multifd_send_pages.
> >
> > > +
> > > if (!multifd_queue_page(block, offset)) {
> > > return -1;
> > > }
> > > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs,
> > > PageSearchStatus *pss,
> > > */
> > > static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus
> > > *pss)
> > > {
> > > - RAMBlock *block = pss->block;
> > > ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > int res;
> > >
> > > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState
> > > *rs, PageSearchStatus *pss)
> > > return 1;
> > > }
> > >
> > > + return ram_save_page(rs, pss);
> >
> > Look at where git put this! Are you using the default diff algorithm? If
> > not try using --patience to see if it improves the diff.
> >
> > > +}
> > > +
> > > +/**
> > > + * ram_save_target_page_multifd: save one target page
> > > + *
> > > + * Returns the number of pages written
> >
> > We could be more precise here:
> >
> > ram_save_target_page_multifd: send one target page to multifd workers
> >
> > Returns 1 if the page was queued, -1 otherwise.
> >
> > > + *
> > > + * @rs: current RAM state
> > > + * @pss: data about the page we want to send
> > > + */
> > > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus
> > > *pss)
> > > +{
> > > + RAMBlock *block = pss->block;
> > > + ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > +
> > > + /* Multifd is not compatible with old compression. */
> > > + assert(!migrate_compress());
> >
> > This should already be enforced at options.c.
> >
> > > +
> > > + /* Multifd is not compabible with postcopy. */
> > > + assert(!migration_in_postcopy());
> >
> > Same here.
> >
> > > +
> > > /*
> > > - * Do not use multifd in postcopy as one whole host page should be
> > > - * placed. Meanwhile postcopy requires atomic update of pages, so
> > > even
> > > - * if host page size == guest page size the dest guest during run may
> > > - * still see partially copied pages which is data corruption.
> > > + * Backward compatibility support. While using multifd live
> > > + * migration, we still need to handle zero page checking on the
> > > + * migration main thread.
> > > */
> > > - if (migrate_multifd() && !migration_in_postcopy()) {
> > > - return ram_save_multifd_page(block, offset);
> > > + if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > > + if (save_zero_page(rs, pss, offset)) {
> > > + return 1;
> > > + }
> > > }
> > >
> > > - return ram_save_page(rs, pss);
> > > + return ram_save_multifd_page(block, offset);
> > > }
> > >
> > > /* Should be called before sending a host page */
> > > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void
> > > *opaque)
> > > }
> > >
> > > migration_ops = g_malloc0(sizeof(MigrationOps));
> > > - migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > +
> > > + if (migrate_multifd()) {
> > > + migration_ops->ram_save_target_page =
> > > ram_save_target_page_multifd;
> > > + } else {
> > > + migration_ops->ram_save_target_page =
> > > ram_save_target_page_legacy;
> > > + }
> > >
> > > bql_unlock();
> > > ret = multifd_send_sync_main();
> >
>
> --
> Peter Xu
>
[PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads., Hao Xiang, 2024/02/16
Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads., Hao Xiang, 2024/02/23
Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads., Fabiano Rosas, 2024/02/23
[PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking., Hao Xiang, 2024/02/16
[PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface., Hao Xiang, 2024/02/16
[PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration., Hao Xiang, 2024/02/16