qemu-devel
Re: [PATCH] migration/multifd: receive channel socket needs to be set to


From: Yuchen
Subject: Re: [PATCH] migration/multifd: receive channel socket needs to be set to non-blocking
Date: Mon, 23 Sep 2024 01:33:13 +0000


> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: September 20, 2024 23:53
> To: yuchen (CCSPL) <yu.chen@h3c.com>
> Cc: farosas@suse.de; qemu-devel@nongnu.org
> Subject: Re: [PATCH] migration/multifd: receive channel socket needs to be set to
> non-blocking
> 
> On Fri, Sep 20, 2024 at 10:05:42AM +0000, Yuchen wrote:
> > When the migration network is disconnected, the source qemu can exit
> > normally with an error, but the destination qemu is always blocked in
> > recvmsg(), causes the destination qemu main thread to be blocked.
> >
> > The destination qemu block stack:
> > Thread 13 (Thread 0x7f0178bfa640 (LWP 1895906) "multifdrecv_6"):
> > #0  0x00007f041b5af56f in recvmsg ()
> > #1  0x000055573ebd0b42 in qio_channel_socket_readv
> > #2  0x000055573ebce83f in qio_channel_readv
> > #3  qio_channel_readv_all_eof
> > #4  0x000055573ebce909 in qio_channel_readv_all
> > #5  0x000055573eaa1b1f in multifd_recv_thread
> > #6  0x000055573ec2f0b9 in qemu_thread_start
> > #7  0x00007f041b52bf7a in start_thread
> > #8  0x00007f041b5ae600 in clone3
> >
> > Thread 1 (Thread 0x7f0410c62240 (LWP 1895156) "kvm"):
> > #0  0x00007f041b528ae2 in __futex_abstimed_wait_common ()
> > #1  0x00007f041b5338b8 in __new_sem_wait_slow64.constprop.0
> > #2  0x000055573ec2fd34 in qemu_sem_wait (sem=0x555742b5a4e0)
> > #3  0x000055573eaa2f09 in multifd_recv_sync_main ()
> > #4  0x000055573e7d590d in ram_load_precopy (f=f@entry=0x555742291c20)
> > #5  0x000055573e7d5cbf in ram_load (opaque=<optimized out>,
> > version_id=<optimized out>, f=0x555742291c20)
> > #6  ram_load_entry (f=0x555742291c20, opaque=<optimized out>,
> > version_id=<optimized out>)
> > #7  0x000055573ea932e7 in qemu_loadvm_section_part_end
> > (mis=0x555741136c00, f=0x555742291c20)
> > #8  qemu_loadvm_state_main (f=f@entry=0x555742291c20,
> > mis=mis@entry=0x555741136c00)
> > #9  0x000055573ea94418 in qemu_loadvm_state (f=0x555742291c20,
> > mode=mode@entry=VMS_MIGRATE)
> > #10 0x000055573ea88be1 in process_incoming_migration_co
> > (opaque=<optimized out>)
> > #11 0x000055573ec43d13 in coroutine_trampoline (i0=<optimized out>,
> > i1=<optimized out>)
> > #12 0x00007f041b4f5d90 in ?? () from target:/usr/lib64/libc.so.6
> > #13 0x00007ffc11890270 in ?? ()
> > #14 0x0000000000000000 in ?? ()
> >
> > Setting the receive channel to non-blocking can solve the problem.
> 
> Multifd threads are real threads and there's no coroutine, I'm slightly confused
> why it needs to use nonblock.
> 
> Why didn't recvmsg() get kicked out on disconnect?  Are you using a generic
> Linux kernel?
> 
Steps to reproduce:
Bring the migration network interface down with ifdown, or block the
migration network with iptables.
Both methods reproduce the problem with very high probability.

My test environment uses linux-5.10.136.

The multifd thread is blocked in the kernel:
# cat /proc/3416190/stack 
[<0>] wait_woken+0x43/0x80
[<0>] sk_wait_data+0x123/0x140
[<0>] tcp_recvmsg+0x4f8/0xa50
[<0>] inet6_recvmsg+0x5e/0x120
[<0>] ____sys_recvmsg+0x87/0x180
[<0>] ___sys_recvmsg+0x82/0x110
[<0>] __sys_recvmsg+0x56/0xa0
[<0>] do_syscall_64+0x3d/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

> I wonder whether that's the expected behavior for sockets.  E.g., we do have
> multifd/cancel test (test_multifd_tcp_cancel) and I think that runs this path too
> with it always in block mode as of now..
> 
My previous statement may not have been accurate. The migration network socket
is not actually disconnected.
I use ifdown or iptables to simulate a network card failure.
Because the TCP connection is never torn down, recvmsg() stays blocked.
In ordinary precopy migration the destination also sets the channel to
non-blocking, and I think that is done to avoid blocking.
Latest QEMU master code:
/**                                                                
 * migration_incoming_setup: Setup incoming migration              
 * @f: file for main migration channel                             
 */                                                                
static void migration_incoming_setup(QEMUFile *f)                  
{                                                                  
    MigrationIncomingState *mis = migration_incoming_get_current();
                                                                   
    if (!mis->from_src_file) {                                     
        mis->from_src_file = f;                                    
    }                                                              
    qemu_file_set_blocking(f, false);                              
}                                                                  
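
To illustrate the difference outside of QEMU, here is a minimal sketch using
plain POSIX calls rather than the QIOChannel API. It is not QEMU code and not
part of the patch. The helpers set_nonblocking(), read_with_timeout() and
demo_silent_peer() are hypothetical names, and a socketpair() stands in for a
TCP connection whose peer has gone silent without closing. With the socket in
non-blocking mode, recv() returns EAGAIN instead of sleeping in the kernel, so
the caller can decide how long to wait.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read up to len bytes from a non-blocking socket.
 * Returns bytes read, 0 on EOF, -1 on error, -2 if no data arrived
 * within timeout_ms (the timeout restarts after each wakeup).
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    /* A negative timeout would make poll() wait forever. */
    assert(timeout_ms >= 0);

    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) {
            return n;
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;
        }

        /* No data yet: wait for readability, but only for a bounded time. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            return -2;
        }
        if (rc < 0 && errno != EINTR) {
            return -1;
        }
    }
}

/* Read from a peer that stays connected but never sends anything. */
static ssize_t demo_silent_peer(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (set_nonblocking(sv[0]) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = read_with_timeout(sv[0], buf, sizeof(buf), 100);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling demo_silent_peer() returns the timeout code after about 100 ms. If
the set_nonblocking() call were dropped, the same recv() would sleep in the
kernel until the peer sent data or closed the connection, which is the
situation shown in the /proc stack above.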

> >
> > Signed-off-by: YuChen <Yu.Chen@h3c.com>
> > ---
> >  migration/multifd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 9b200f4ad9..7b2a768f05 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -1318,6 +1318,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> >          id = qatomic_read(&multifd_recv_state->count);
> >      }
> >
> > +    qio_channel_set_blocking(ioc, false, NULL);
> > +
> >      p = &multifd_recv_state->params[id];
> >      if (p->c != NULL) {
> >          error_setg(&local_err, "multifd: received id '%d' already setup'",
> > --
> > 2.30.2
> 
> --
> Peter Xu

