External email: Use caution opening links or attachments
On 5.09.2024 15:08, Avihai Horon wrote:
Hi Maciej,
On 27/08/2024 20:54, Maciej S. Szmigiero wrote:
External email: Use caution opening links or attachments
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
This way both the start and end points of migrating a particular VFIO
device are known.
Add also a vfio_save_iterate_empty_hit trace event so it is known when
there's no more data to send for that device.
Out of curiosity, what are these traces used for?
Just for benchmarking, collecting these data makes it easier to
reason where possible bottlenecks may be.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
hw/vfio/migration.c | 13 +++++++++++++
hw/vfio/trace-events | 3 +++
include/hw/vfio/vfio-common.h | 3 +++
3 files changed, 19 insertions(+)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 262d42a46e58..24679d8c5034 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -472,6 +472,9 @@ static int vfio_save_setup(QEMUFile *f, void
*opaque, Error **errp)
return -ENOMEM;
}
+ migration->save_iterate_run = false;
+ migration->save_iterate_empty_hit = false;
+
if (vfio_precopy_supported(vbasedev)) {
switch (migration->device_state) {
case VFIO_DEVICE_STATE_RUNNING:
@@ -605,9 +608,17 @@ static int vfio_save_iterate(QEMUFile *f, void
*opaque)
VFIOMigration *migration = vbasedev->migration;
ssize_t data_size;
+ if (!migration->save_iterate_run) {
+ trace_vfio_save_iterate_started(vbasedev->name);
+ migration->save_iterate_run = true;
Maybe rename save_iterate_run to save_iterate_started so it's aligned
with trace_vfio_save_iterate_started and
trace_vfio_save_complete_precopy_started?
Will do.
+ }
+
data_size = vfio_save_block(f, migration);
if (data_size < 0) {
return data_size;
+ } else if (data_size == 0 && !migration->save_iterate_empty_hit) {
+ trace_vfio_save_iterate_empty_hit(vbasedev->name);
+ migration->save_iterate_empty_hit = true;
During precopy we could hit empty multiple times. Any reason why only
the first time should be traced?
This trace point is supposed to indicate whether the device state
transfer during the time the VM was still running likely has
exhausted the amount of data that can be transferred during
that phase.
In other words, the stopped-time device state transfer likely
only had to transfer the data which the device does not support
transferring during the live VM phase (with just a small possible
residual accrued since that trace point was hit).
If that trace point was hit then delaying the switch over point
further likely wouldn't help the device transfer less data during
the downtime.