Re: [RFC PATCH] tests/qtest/migration: Add cpu hotplug test
From: Stefan Hajnoczi
Subject: Re: [RFC PATCH] tests/qtest/migration: Add cpu hotplug test
Date: Tue, 14 Jan 2025 14:28:46 -0500

On Tue, 14 Jan 2025 at 09:15, Fabiano Rosas <farosas@suse.de> wrote:
>
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > On Mon, 13 Jan 2025 at 16:09, Fabiano Rosas <farosas@suse.de> wrote:
> >>
> >> Bug #2594 is about a failure during migration after a cpu hotplug. Add
> >> a test that covers that scenario. Start the source with -smp 2 and
> >> destination with -smp 3, plug one extra cpu to match and migrate.
> >>
> >> The issue seems to be a mismatch in the number of virtqueues between
> >> the source and destination due to the hotplug not changing the
> >> num_queues:
> >>
> >> get_pci_config_device: Bad config data: i=0x9a read: 4 device: 5
> >> cmask: ff wmask: 0 w1cmask:0
> >>
> >> Usage:
> >> $ QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> >> ./tests/qtest/migration-test -p /x86_64/migration/hotplug/cpu
> >>
> >> References: https://gitlab.com/qemu-project/qemu/-/issues/2594
> >> References: https://issues.redhat.com/browse/RHEL-68302
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >> As you can see there's no fix attached to this. I haven't reached that
> >> part yet, suggestions welcome =). Posting the test case if anyone
> >> wants to play with this.
> >>
> >> (if someone at RH is already working on this, that's fine. I'm just
> >> trying to get some upstream bugs to move)
> >
> > The management tool should set num_queues on the destination to ensure
> > migration compatibility.
> >
>
> I'm not sure that's feasible. The default num-queues seem like an
> implementation detail that the management application would not have a
> way to query. Unless it starts the source with a fixed number that
> already accounts for all hotplug/unplug operations during the VM
> lifetime, which would be wasteful in terms of resources allocated
> upfront.
>
> That would also make the destination run with a suboptimal (< #vcpus)
> number of queues, although that's already the case in the source after
> the hotplug. Do we have any definition on what should happen during
> hotplug? If one plugs 100 vcpus, should num-queues remain as 2?
QEMU defaults num_queues to the number of present CPUs. A management
tool that wants to ensure that all hotplugged CPUs will have their own
virtqueues must set num_queues to max_cpus instead. This wastes
resources upfront but in theory the guest can operate efficiently. I
haven't checked the Linux guest drivers to see if they actually handle
virtqueue allocation after hotplug. The Linux drivers vary in how they
allocate virtqueue interrupts, so be sure to check several device
types like virtio-net and virtio-blk as they may behave differently.

Or the management tool can explicitly set num_queues to the number of
present CPUs and preserve that across live migration and CPU hotplug.
In that case num_queues can be updated across guest cold boot in order
to (eventually) achieve the optimal multi-queue configuration.
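
As a rough illustration of the two policies above, here is a minimal
Python sketch of how a management tool might pick num_queues once and
reuse it on the migration destination. The helper names
(choose_num_queues, virtio_blk_device_arg), the policy labels and the
drive id are hypothetical and not part of QEMU; only the -smp maxcpus=
option and the virtio-blk-pci num-queues property are assumed from QEMU
itself.

```python
def choose_num_queues(present_cpus, max_cpus, policy):
    """Pick a queue count for a virtio device (hypothetical helper).

    policy "max":    size for every CPU that could ever be hotplugged.
    policy "pinned": size for the CPUs present at cold boot and keep
                     that value across hotplug and live migration.
    """
    if present_cpus < 1 or max_cpus < present_cpus:
        raise ValueError("need 1 <= present_cpus <= max_cpus")
    if policy == "max":
        return max_cpus
    if policy == "pinned":
        return present_cpus
    raise ValueError(f"unknown policy: {policy}")


def virtio_blk_device_arg(num_queues, drive_id="drive0"):
    """Build a -device argument with an explicit queue count."""
    return f"virtio-blk-pci,drive={drive_id},num-queues={num_queues}"


# Source is cold-booted with 2 CPUs and room for 4 (-smp 2,maxcpus=4).
src_queues = choose_num_queues(present_cpus=2, max_cpus=4, policy="pinned")
src_arg = virtio_blk_device_arg(src_queues)

# After one CPU is hotplugged, the destination starts with 3 CPUs, but
# the tool reuses the recorded value instead of relying on the default.
dst_arg = virtio_blk_device_arg(src_queues)
```

The point of the sketch is that the destination never derives the value
from its own CPU count: the number chosen at cold boot is recorded and
passed explicitly on both sides, so the device configuration matches.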

Other approaches might be possible too. The management tool has a
choice of how to implement this and QEMU doesn't dictate a specific
approach.

Stefan