qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Problem with migration/rdma


From: Zhijian Li (Fujitsu)
Subject: Re: Problem with migration/rdma
Date: Thu, 7 Mar 2024 02:41:37 +0000
User-agent: Mozilla Thunderbird

Yu,


On 07/03/2024 00:30, Philippe Mathieu-Daudé wrote:
> Cc'ing RDMA migration reviewers/maintainers:
> 
> $ ./scripts/get_maintainer.pl -f migration/rdma.c
> Li Zhijian <lizhijian@fujitsu.com> (reviewer:RDMA Migration)
> Peter Xu <peterx@redhat.com> (maintainer:Migration)
> Fabiano Rosas <farosas@suse.de> (maintainer:Migration)
> 
> On 5/3/24 22:32, Yu Zhang wrote:
>> Hello Het and all,
>>
>> while I was testing qemu-8.2, I saw a lot of our migration test cases failed.
>> After debugging the commits of the 8.2 branch, I saw the issue and mad a 
>> diff:
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 6a29e53daf..f10d56f556 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -3353,9 +3353,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>>           goto err_rdma_dest_wait;
>>       }
>>
>> -    isock->host = rdma->host;
>> +    isock->host = g_strdup_printf("%s", rdma->host);
>>       isock->port = g_strdup_printf("%d", rdma->port);


Thanks for your analysis.

It will be great if you send this as a patch.


isock is defined as a _autoptr VVV
3333 _autoptr(InetSocketAddress) isock = g_new0(InetSocketAddress, 1);

I'm surprised that it seems the auto free scheme will free the member of isock 
as well
see below valrind log. That will cause a double free.

==809138== Invalid free() / delete / delete[] / realloc()
==809138==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==809138==    by 0x598F70C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==809138==    by 0x79B6AD: qemu_rdma_cleanup (rdma.c:2432)
==809138==    by 0x79CEE6: qio_channel_rdma_close_rcu (rdma.c:3108)
==809138==    by 0xC2E339: call_rcu_thread (rcu.c:301)
==809138==    by 0xC2116A: qemu_thread_start (qemu-thread-posix.c:541)
==809138==    by 0x72683F8: ??? (in /usr/lib64/libpthread-2.32.so)
==809138==    by 0x73824C2: clone (in /usr/lib64/libc-2.32.so)
==809138==  Address 0x13daa070 is 0 bytes inside a block of size 14 free'd
==809138==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==809138==    by 0x598F70C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==809138==    by 0xC058CF: qapi_dealloc_type_str (qapi-dealloc-visitor.c:68)
==809138==    by 0xC09EF3: visit_type_str (qapi-visit-core.c:349)
==809138==    by 0xBDDECC: visit_type_InetSocketAddressBase_members 
(qapi-visit-sockets.c:29)
==809138==    by 0xBDE055: visit_type_InetSocketAddress_members 
(qapi-visit-sockets.c:67)
==809138==    by 0xBDE30D: visit_type_InetSocketAddress 
(qapi-visit-sockets.c:119)
==809138==    by 0xBDDB38: qapi_free_InetSocketAddress (qapi-types-sockets.c:51)
==809138==    by 0x792351: glib_autoptr_clear_InetSocketAddress 
(qapi-types-sockets.h:109)
==809138==    by 0x79236F: glib_autoptr_cleanup_InetSocketAddress 
(qapi-types-sockets.h:109)
==809138==    by 0x79D956: qemu_rdma_accept (rdma.c:3341)
==809138==    by 0x79F05A: rdma_accept_incoming_migration (rdma.c:4041)
==809138==  Block was alloc'd at
==809138==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==809138==    by 0x5992BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==809138==    by 0x59A7FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8)
==809138==    by 0x79C2A8: qemu_rdma_data_init (rdma.c:2731)
==809138==    by 0x79F183: rdma_start_incoming_migration (rdma.c:4081)
==809138==    by 0x76F200: qemu_start_incoming_migration (migration.c:581)
==809138==    by 0x77193A: qmp_migrate_incoming (migration.c:1735)
==809138==    by 0x74B3D3: qmp_x_exit_preconfig (vl.c:2718)
==809138==    by 0x74DB6F: qemu_init (vl.c:3753)
==809138==    by 0xA14F3F: main (main.c:47)


Thanks
Zhijian


>>
>> which was introduced by the commit below:
>>
>> commit 3fa9642ff7d51f7fc3ba68e6ccd13a939d5bd609 (HEAD)
>> Author: Het Gala <het.gala@nutanix.com>
>> Date:   Mon Oct 23 15:20:45 2023 -0300
>>
>>      migration: convert rdma backend to accept MigrateAddress
>>
>>      RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs
>>      accept new wire protocol of MigrateAddress struct.
>>
>>      It is achived by parsing 'uri' string and storing migration parameters
>>      required for RDMA connection into well defined InetSocketAddress struct.
>>      ...
>>
>> A debug line
>>       isock->host = rdma->host;
>>       isock->port = g_strdup_printf("%d", rdma->port);
>> +fprintf(stdout, "QEMU: %s, host %s, port %s\n", __func__,
>> isock->host, isock->port);
>>
>> produced this error:
>> QEMU: qemu_rdma_accept, host ::, port 8089
>> corrupted size vs. prev_size in fastbins
>>
>> on the target host, which may indicate a crash related to the memory
>> allocation or a memory
>> corruption of the data. With the patch, it doesn't happen any more,
>> and the migration is fine.
>> Could you be kind to test this and confirm the issue?
>>
>> Furthermore, I'm confused by the two struct:
>>
>> struct InetSocketAddressBase {
>>      char *host;
>>      char *port;
>> };
>>
>> struct InetSocketAddress {
>>      /* Members inherited from InetSocketAddressBase: */
>>      char *host;
>>      char *port;
>>
>> To my understanding, they are used to consolidate the separated data
>> to a well-defined
>> struct "MigrateAddress", while the struct whose member receive their
>> data has a different type:
>>
>> typedef struct RDMAContext {
>>      char *host;
>>      int port;
>>      ...
>> }
>>
>> Is there any reason to keep "port" like this (char* instead of int) or
>> can we improve it?
>> Thank you so much for any of your comments!
>>
>> Best regards,
>>
>> Yu Zhang @ IONOS Compute Platform
>> 05.03.2024
>>
> 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]