qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows


From: Bin Meng
Subject: Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows
Date: Mon, 5 Sep 2022 14:13:45 +0800

On Mon, Sep 5, 2022 at 2:04 PM Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng.cn@gmail.com> wrote:
>>
>> On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau
>> <marcandre.lureau@gmail.com> wrote:
>> >
>> > Hi
>> >
>> > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>> >>
>> >> From: Bin Meng <bin.meng@windriver.com>
>> >>
>> >> Random failure was observed when running qtests on Windows due to
>> >> "Broken pipe" detected by qmp_fd_receive(). What happened is that
>> >> the qtest executable sends testing data over a socket to the QEMU
>> >> under test but no response is received. The errno of the recv()
>> >> call from the qtest executable indicates ETIMEOUT, due to the qmp
>> >> chardev's tcp_chr_read() is never called to receive testing data
>> >> hence no response is sent to the other side.
>> >>
>> >> tcp_chr_read() is registered as the callback of the socket watch
>> >> GSource. The reason of the callback not being called by glib, is
>> >> that the source check fails to indicate the source is ready. There
>> >> are two socket watch sources created to monitor the same socket
>> >> event object from the char-socket backend in update_ioc_handlers().
>> >>
>> >> During the source check phase, qio_channel_socket_source_check()
>> >> calls WSAEnumNetworkEvents() to discovers occurrences of network
>> >> events for the indicated socket, clear internal network event records,
>> >> and reset the event object. Testing shows that if we don't reset the
>> >> event object by not passing the event handle to WSAEnumNetworkEvents()
>> >> the symptom goes away and qtest runs very stably.
>> >>
>> >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we
>> >> don't parse the result of WSANETWORKEVENTS returned from this API.
>> >> We use select() to poll the socket status. Fix this instability by
>> >> dropping the WSAEnumNetworkEvents() call.
>> >>
>> >> Signed-off-by: Bin Meng <bin.meng@windriver.com>
>> >
>> >
>> > What clears the event then?
>> >
>>
>> It seems we don't need to clear the event as everything still works as 
>> expected.
>
>
> Well, it can "work" but are you sure it doesn't have a busy loop?

You mean busy loop in g_poll()?

>> >>
>> >> ---
>> >> During the testing, I removed the following codes in 
>> >> update_ioc_handlers():
>> >>
>> >>     remove_hup_source(s);
>> >>     s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
>> >>     g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
>> >>                           chr, NULL);
>> >>     g_source_attach(s->hup_source, chr->gcontext);
>> >>
>> >> and such change also makes the symptom go away.
>> >>
>> >> And if I moved the above codes to the beginning, before the call to
>> >> io_add_watch_poll(), the symptom also goes away.
>> >>
>> >> It seems two sources watching on the same socket event object is
>> >> the key that leads to the instability. The order of adding a source
>> >> watch seems to also play a role but I can't explain why.
>> >> Hopefully a Windows and glib expert could explain this behavior.
>> >>
>> >
>> > Feel free to leave that comment in the commit message.
>>
>> Sure, will add the above message into the commit in v2.
>>
>> >
>> > This is strange, as both sources should have different events, clearing 
>> > one shouldn't affect the other.
>>
>> Both sources have the same event, as one QIO channel only has one
>> socket, and one socket can only be bound to one event.
>
>
>  "one socket can only be bound to one event", is that a WSAEventSelect 
> limitation?
>

Yes, please see the MSDN:

It is not possible to specify different event objects for different
network events. The following code will not work; the second call will
cancel the effects of the first, and only the FD_WRITE network event
will be associated with hEventObject2:

rc = WSAEventSelect(s, hEventObject1, FD_READ);
rc = WSAEventSelect(s, hEventObject2, FD_WRITE); //bad

>>
>> >
>> > I guess it's WSAEnumNetworkEvents clearing of the internal network event 
>> > records that is problematic.
>> >
>> > Can you check if you replace the call with ResetEvent() everything works?
>>
>> No, ResetEvent() does not work, and is not recommended by MSDN [1]
>> too, which says:
>
>
> It probably works to some extent (I see glib is using it 
> https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What 
> you mean is that it doesn't solve the issue, I guess.

Correct, it does not solve the issue.

>>
>>
>> The proper way to reset the state of an event object used with the
>> WSAEventSelect function is to pass the handle of the event object to
>> the WSAEnumNetworkEvents function in the hEventObject parameter. This
>> will reset the event object and adjust the status of active FD events
>> on the socket in an atomic fashion.
>>
>
> This is not what you want though if you have multiple event objects for the 
> same socket.
>
>>
>> [1] 
>> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect

Regards,
Bin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]