Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input
From: Jan Kiszka
Subject: Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input
Date: Fri, 02 Aug 2013 21:41:24 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666
On 2013-08-02 14:45, Jan Kiszka wrote:
> On 2013-08-02 13:46, Stefan Hajnoczi wrote:
>> On Thu, Aug 01, 2013 at 07:15:54PM +0200, Jan Kiszka wrote:
>>> I was digging into the involved code and found something fishy:
>>>
>>> net/tap.c:
>>> static void tap_send(void *opaque)
>>> {
>>>     ...
>>>     size = qemu_send_packet_async(&s->nc, buf, size,
>>>                                   tap_send_completed);
>>>     if (size == 0) {
>>>         tap_read_poll(s, false);
>>>     }
>>>
>>> So, if tap_send is registered for the mainloop polling (ie. can_receive
>>> returned true before starting to poll) but qemu_send_packet_async
>>> returns 0 now as qemu_can_send_packet/can_receive happens to report
>>> false in the meantime, we will disable read polling. If also write
>>> polling is off, the fd will be completely removed from the iohandler
>>> list. But even if write polling remains on, I wonder what should bring
>>> read polling back?
>>
>> This behavior seems fine to me. Once the peer (pcnet) is able to
>> receive again it must flush the queue, this will re-enable
>> tap_read_poll().
>>
>> Can you explain a bit more why this would be a problem?
>
> The problem is that I don't see at all what will call tap_read_poll(s,
> 1), neither in theory nor in reality.
>
> As long as the real test case is out of reach, I tried to emulate the
> faulty behaviour by letting tap_can_send always return 1. Result:
> reception stalls during boot as even qemu_flush_queued_packets cannot
> get it running again once tap_read_poll(s, 0) was called.
OK, false alarm. The issue was most likely fixed by commit 199ee608
(net: fix qemu_flush_queued_packets() in presence of a hub) which is
present in 1.5.x but not 1.3.x. We initially tried to test on 1.5 but
had to roll back to 1.3 due to other issues - and missed this fix.
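
For readers following the thread, the stall-and-resume mechanism discussed above can be sketched as a toy model. This is only an illustration under stated assumptions: every type and function prefixed with toy_ is hypothetical and is not QEMU code. The pending-packet queue is reduced to a single slot, and the model does not attempt to reproduce what commit 199ee608 changed. It only shows why read polling stays off until a flush runs the stored completion callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, NOT QEMU's net/tap.c or queue code. */

typedef void (*toy_sent_cb)(void *opaque);

struct toy_tap {
    bool read_poll;         /* is the tap fd currently polled for input? */
    bool peer_can_receive;  /* would the peer accept a packet right now? */
    bool packet_queued;     /* one-slot stand-in for the pending queue */
    toy_sent_cb queued_cb;  /* completion callback stored with the packet */
    int delivered;          /* number of packets handed to the peer */
};

static void toy_read_poll(struct toy_tap *s, bool enable)
{
    s->read_poll = enable;
}

/* Completion callback: the only place that turns read polling back on. */
static void toy_send_completed(void *opaque)
{
    toy_read_poll((struct toy_tap *)opaque, true);
}

/* Returns the packet size if delivered, or 0 if the packet was queued. */
static size_t toy_send_packet_async(struct toy_tap *s, size_t size,
                                    toy_sent_cb cb)
{
    if (!s->peer_can_receive) {
        s->packet_queued = true;
        s->queued_cb = cb;
        return 0;
    }
    s->delivered++;
    return size;
}

/* Mirrors the quoted snippet: a return value of 0 disables read polling. */
static void toy_tap_send(struct toy_tap *s, size_t size)
{
    if (toy_send_packet_async(s, size, toy_send_completed) == 0) {
        toy_read_poll(s, false);
    }
}

/* Flush: deliver the queued packet and run its completion callback. */
static void toy_flush_queued_packets(struct toy_tap *s)
{
    if (s->packet_queued && s->peer_can_receive) {
        s->packet_queued = false;
        s->delivered++;
        if (s->queued_cb) {
            s->queued_cb(s);
        }
    }
}

/* Walks through the scenario; returns the final read-poll state. */
static bool toy_demo(void)
{
    struct toy_tap s = { .read_poll = true, .peer_can_receive = false };

    toy_tap_send(&s, 64);           /* peer busy: packet queued, poll off */
    assert(!s.read_poll && s.packet_queued);

    s.peer_can_receive = true;      /* peer ready again, but nothing flushed */
    assert(!s.read_poll);           /* input stays stalled until a flush */

    toy_flush_queued_packets(&s);   /* flush runs the completion callback */
    return s.read_poll;
}
```

In this model, a flush that never reaches the queue holding the packet would leave read_poll false forever, which matches the stall symptom described earlier in the thread.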
My understanding of the networking maze was confused by the unfortunate
naming of the incoming net client queues ("send_queue") - will propose a
renaming.
This still requires a confirmation on the target, but I'm quite
optimistic now.
Jan