[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 09/17] migration/multifd: Device state transfer support -
From: |
Fabiano Rosas |
Subject: |
Re: [PATCH v2 09/17] migration/multifd: Device state transfer support - receive side |
Date: |
Tue, 03 Sep 2024 11:42:57 -0300 |
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
> On 30.08.2024 22:22, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>>
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>
>>> Add a basic support for receiving device state via multifd channels -
>>> channels that are shared with RAM transfers.
>>>
>>> To differentiate between a device state and a RAM packet the packet
>>> header is read first.
>>>
>>> Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
>>> packet header either device state (MultiFDPacketDeviceState_t) or RAM
>>> data (existing MultiFDPacket_t) is then read.
>>>
>>> The received device state data is provided to
>>> qemu_loadvm_load_state_buffer() function for processing in the
>>> device's load_state_buffer handler.
>>>
>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>> ---
>>> migration/multifd.c | 127 +++++++++++++++++++++++++++++++++++++-------
>>> migration/multifd.h | 31 ++++++++++-
>>> 2 files changed, 138 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index b06a9fab500e..d5a8e5a9c9b5 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
> (..)
>>> g_free(p->zero);
>>> @@ -1126,8 +1159,13 @@ static void *multifd_recv_thread(void *opaque)
>>> rcu_register_thread();
>>>
>>> while (true) {
>>> + MultiFDPacketHdr_t hdr;
>>> uint32_t flags = 0;
>>> + bool is_device_state = false;
>>> bool has_data = false;
>>> + uint8_t *pkt_buf;
>>> + size_t pkt_len;
>>> +
>>> p->normal_num = 0;
>>>
>>> if (use_packets) {
>>> @@ -1135,8 +1173,28 @@ static void *multifd_recv_thread(void *opaque)
>>> break;
>>> }
>>>
>>> - ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>>> - p->packet_len, &local_err);
>>> + ret = qio_channel_read_all_eof(p->c, (void *)&hdr,
>>> + sizeof(hdr), &local_err);
>>> + if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */
>>> + break;
>>> + }
>>> +
>>> + ret = multifd_recv_unfill_packet_header(p, &hdr, &local_err);
>>> + if (ret) {
>>> + break;
>>> + }
>>> +
>>> + is_device_state = p->flags & MULTIFD_FLAG_DEVICE_STATE;
>>> + if (is_device_state) {
>>> + pkt_buf = (uint8_t *)p->packet_dev_state + sizeof(hdr);
>>> + pkt_len = sizeof(*p->packet_dev_state) - sizeof(hdr);
>>> + } else {
>>> + pkt_buf = (uint8_t *)p->packet + sizeof(hdr);
>>> + pkt_len = p->packet_len - sizeof(hdr);
>>> + }
>>
>> Should we have made the packet an union as well? Would simplify these
>> sorts of operations. Not sure I want to start messing with that at this
>> point to be honest. But OTOH, look at this...
>
> RAM packet length is not constant (at least from the viewpoint of the
> migration code) so the union allocation would need some kind of a
> "multifd_ram_packet_size()" runtime size determination.
>
> Also, since RAM and device state packet body size is different then
> for the extra complexity introduced by that union we'll just get rid of
> that single pkt_buf assignment.
>
>>> +
>>> + ret = qio_channel_read_all_eof(p->c, (char *)pkt_buf, pkt_len,
>>> + &local_err);
>>> if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */
>>> break;
>>> }
>>> @@ -1181,8 +1239,33 @@ static void *multifd_recv_thread(void *opaque)
>>> has_data = !!p->data->size;
>>> }
>>>
>>> - if (has_data) {
>>> - ret = multifd_recv_state->ops->recv(p, &local_err);
>>> + if (!is_device_state) {
>>> + if (has_data) {
>>> + ret = multifd_recv_state->ops->recv(p, &local_err);
>>> + if (ret != 0) {
>>> + break;
>>> + }
>>> + }
>>> + } else {
>>> + g_autofree char *idstr = NULL;
>>> + g_autofree char *dev_state_buf = NULL;
>>> +
>>> + assert(use_packets);
>>> +
>>> + if (p->next_packet_size > 0) {
>>> + dev_state_buf = g_malloc(p->next_packet_size);
>>> +
>>> + ret = qio_channel_read_all(p->c, dev_state_buf,
>>> p->next_packet_size, &local_err);
>>> + if (ret != 0) {
>>> + break;
>>> + }
>>> + }
>>
>> What's the use case for !next_packet_size and still call
>> load_state_buffer below? I can't see it.
>
> Currently, next_packet_size == 0 has not usage indeed - it is
> a leftover from an early version of the patch set (not public)
> that had device state packet (chunk) indexing done by
> the common migration code, rather than by the VFIO consumer.
>
> And then an empty packet could be used to mark the stream
> boundary - like the max chunk number to expect.
>
>> ...because I would suggest to set has_data up there with
>> p->next_packet_size:
>>
>> if (use_packets) {
>> ...
>> has_data = p->next_packet_size || p->zero_num;
>> } else {
>> ...
>> has_data = !!p->data_size;
>> }
>>
>> and this whole block would be:
>>
>> if (has_data) {
>> if (is_device_state) {
>> multifd_device_state_recv(p, &local_err);
>> } else {
>> ret = multifd_recv_state->ops->recv(p, &local_err);
>> }
>> }
>
> The above block makes sense to me with two caveats:
I have suggestions below, but this is no big deal, so feel free to go
with what you think works best.
> 1) If empty device state packets (next_packet_size == 0) were
> to be unsupported they need to be rejected cleanly rather
> than silently skipped,
Should this be rejected on the send side? That's the most likely source
of the problem if it happens. Don't need to send something we know will
cause an error when loading.
And for the case of stream corruption of some sort we could hoist the
check from load_buffer into here:
else if (is_device_state) {
error_setg(errp, "empty device state packet);
break;
}
> 2) has_data has to have its value computed depending on whether
> this is a RAM or a device state packet since looking at
> p->normal_num and p->zero_num makes no sense for a device state
> packet while I am not sure that looking at p->next_packet_size
> for a RAM packet won't introduce some subtle regression.
It should be ok to use next_packet_size for RAM, it must always be in
sync with normal_num.
>
>>> +
>>> + idstr = g_strndup(p->packet_dev_state->idstr,
>>> sizeof(p->packet_dev_state->idstr));
>>> + ret = qemu_loadvm_load_state_buffer(idstr,
>>> +
>>> p->packet_dev_state->instance_id,
>>> + dev_state_buf,
>>> p->next_packet_size,
>>> + &local_err);
>>> if (ret != 0) {
>>> break;
>>> }
>>> @@ -1190,6 +1273,11 @@ static void *multifd_recv_thread(void *opaque)
>>>
>>> if (use_packets) {
>>> if (flags & MULTIFD_FLAG_SYNC) {
>>> + if (is_device_state) {
>>> + error_setg(&local_err, "multifd: received SYNC device
>>> state packet");
>>> + break;
>>> + }
>>
>> assert(!is_device_state) enough?
>
> It's not bug in the receiver code but rather an issue with the
> remote QEMU sending us wrong data if we get a SYNC device state
> packet.
>
> So I think returning an error is more appropriate than triggering
> an assert() failure for that.
ok
>>> +
>>> qemu_sem_post(&multifd_recv_state->sem_sync);
>>> qemu_sem_wait(&p->sem_sync);
>>> }
>>> @@ -1258,6 +1346,7 @@ int multifd_recv_setup(Error **errp)
>>> p->packet_len = sizeof(MultiFDPacket_t)
>>> + sizeof(uint64_t) * page_count;
>>> p->packet = g_malloc0(p->packet_len);
>>> + p->packet_dev_state = g_malloc0(sizeof(*p->packet_dev_state));
>>> }
>>> p->name = g_strdup_printf("mig/dst/recv_%d", i);
>>> p->normal = g_new0(ram_addr_t, page_count);
>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>> index a3e35196d179..a8f3e4838c01 100644
>>> --- a/migration/multifd.h
>>> +++ b/migration/multifd.h
>>> @@ -45,6 +45,12 @@ MultiFDRecvData *multifd_get_recv_data(void);
>>> #define MULTIFD_FLAG_QPL (4 << 1)
>>> #define MULTIFD_FLAG_UADK (8 << 1)
>>>
>>> +/*
>>> + * If set it means that this packet contains device state
>>> + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t).
>>> + */
>>> +#define MULTIFD_FLAG_DEVICE_STATE (1 << 4)
>>
>> Overlaps with UADK. I assume on purpose because device_state doesn't
>> support compression? Might be worth a comment.
>>
>
> Yes, the device state transfer bit stream does not support compression
> so it is not a problem since these "compression type" flags will never
> be set in such bit stream anyway.
>
> Will add a relevant comment here.
>
> Thanks,
> Maciej