Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
From: Sahil Siddiq
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
Date: Fri, 31 Jan 2025 10:34:25 +0530
User-agent: Mozilla Thunderbird
Hi,
On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
[...]
Apologies for the delay in replying. It took me a while to figure
this out, but I have now understood why this doesn't work. L1 is
unable to receive messages from L0 because they get filtered out
by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
the MAC addresses.
In L0, I have:
$ ip a show tap0
6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UNKNOWN group default qlen 1000
link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
inet 111.1.1.1/24 scope global tap0
valid_lft forever preferred_lft forever
inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
In L1:
# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
group default qlen 1000
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
altname enp0s2
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 83455sec preferred_lft 83455sec
inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
valid_lft 86064sec preferred_lft 14064sec
inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
I'll call this L1-eth0.
In L2:
# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP gro0
link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
altname enp0s7
inet 111.1.1.2/24 scope global eth0
valid_lft forever preferred_lft forever
I'll call this L2-eth0.
Apart from eth0, lo is the only other device in both L1 and L2.
A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
as its destination address. When booting L2 with x-svq=false, the
value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
the frames and passes them on to L2 and pinging works [2].
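To make the filtering described above concrete, here is a minimal sketch
of a unicast destination check. This is not the actual receive_filter
code in hw/net/virtio-net.c, which as far as I can tell also deals with
promiscuous mode, multicast and the MAC table. The names sketch_nic and
sketch_receive_filter are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical stand-in for the one VirtIONet field that matters here. */
struct sketch_nic {
    uint8_t mac[SKETCH_ETH_ALEN];
};

static const uint8_t sketch_bcast[SKETCH_ETH_ALEN] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

/*
 * Simplified filter: accept broadcast frames and frames whose
 * destination address (the first 6 bytes of the Ethernet header)
 * matches the device MAC. Everything else is dropped.
 */
static bool sketch_receive_filter(const struct sketch_nic *n,
                                  const uint8_t *frame, size_t len)
{
    if (len < SKETCH_ETH_ALEN) {
        return false; /* too short to carry a destination address */
    }
    if (memcmp(frame, sketch_bcast, SKETCH_ETH_ALEN) == 0) {
        return true;
    }
    return memcmp(frame, n->mac, SKETCH_ETH_ALEN) == 0;
}
```

With the device MAC ending in 56 and a frame addressed to ...:57, this
check rejects the frame. Changing the last byte of the device MAC to 0x57
makes it pass, which is the effect I am describing for n->mac.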
So this behavior is interesting by itself. But L1's kernel net system
should not receive anything. As I read it, even if it receives it, it
should not forward the frame to L2 as it is in a different subnet. Are
you able to read it using tcpdump on L1?
I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
that were directed at L2 even though L2 was able to receive them.
Similarly, it didn't capture any packets that were sent from L2 to
L0. This is when L2 is launched with x-svq=false.
That's right. The virtio dataplane goes directly from L0 to L2, you
should not be able to see any packets in the net of L1.
I am a little confused here. Since vhost=off is set in L0's QEMU
(which is used to boot L1), I am able to inspect the packets when
tracing/debugging receive_filter in hw/net/virtio-net.c. [1] Does
this mean the dataplane from L0 to L2 passes through L0's QEMU
(so L0 QEMU is aware of what's going on), but bypasses L1 completely,
so L1's kernel does not know what packets are being sent/received?
With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
receive_filter allows L2 to receive packets from L0. I added
the following line just before line 1771 [1] to check this out.
n->mac[5] = 0x57;
That's very interesting. Let me answer all the gdb questions below and
we can debug it deeper :).
Thank you for the primer on using gdb with QEMU. I am able to debug
QEMU now.
Maybe we can make the scenario clearer by telling which virtio-net
device is which with virtio_net_pci,mac=XX:... ?
However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
(LSB = 56) in virtio_net_handle_mac() [3].
Can you tell with gdb bt if this function is called from net or the
SVQ subsystem?
It looks like the function is being called from net.
(gdb) bt
#0 virtio_net_handle_mac (n=0x15622425e, cmd=85 'U', iov=0x555558865980,
iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
#1 0x0000555555e5920b in virtio_net_handle_ctrl_iov (vdev=0x555558fdacd0,
in_sg=0x5555580611f8, in_num=1, out_sg=0x555558061208,
out_num=1) at ../hw/net/virtio-net.c:1581
#2 0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0,
vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
#3 0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730) at
../hw/virtio/virtio.c:2484
#4 0x0000555555e9dffb in virtio_queue_host_notifier_read (n=0x555558fe77a4) at
../hw/virtio/virtio.c:3869
#5 0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840,
node=0x7fffdca7ba80) at ../util/aio-posix.c:373
#6 0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840) at
../util/aio-posix.c:415
#7 0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840) at
../util/aio-posix.c:425
#8 0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840,
callback=0x0, user_data=0x0) at ../util/async.c:361
#9 0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
#10 0x00007ffff6d86858 in g_main_context_dispatch () from
/usr/lib/libglib-2.0.so.0
#11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
#12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672) at
../util/main-loop.c:310
#13 0x0000555556225db6 in main_loop_wait (nonblocking=0) at
../util/main-loop.c:589
#14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
#15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at ../system/main.c:48
#16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at
../system/main.c:76
virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
vq->handle_output(vdev, vq). I see "handle_output" is a function
pointer and in this case it seems to be pointing to
virtio_net_handle_ctrl.
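The dispatch through "handle_output" can be illustrated with a small
sketch. The types and names below (sketch_vdev, sketch_vq,
sketch_handle_ctrl, sketch_queue_notify_vq) are hypothetical and much
simpler than QEMU's VirtQueue and VirtIODevice. They only show how a
per-queue function pointer ends up selecting the control handler.

```c
#include <stddef.h>

struct sketch_vdev;
struct sketch_vq;

/* Per-queue callback type, loosely modelled on handle_output. */
typedef void (*sketch_handle_output_fn)(struct sketch_vdev *vdev,
                                        struct sketch_vq *vq);

struct sketch_vdev {
    int ctrl_calls; /* counts how often the control handler ran */
};

struct sketch_vq {
    sketch_handle_output_fn handle_output;
    struct sketch_vdev *vdev;
};

/* Hypothetical control-queue handler registered for one queue. */
static void sketch_handle_ctrl(struct sketch_vdev *vdev,
                               struct sketch_vq *vq)
{
    (void)vq;
    vdev->ctrl_calls++;
}

/* On notification, call whichever handler the queue was set up with. */
static void sketch_queue_notify_vq(struct sketch_vq *vq)
{
    if (vq->handle_output) {
        vq->handle_output(vq->vdev, vq);
    }
}
```

A queue whose handle_output was set to sketch_handle_ctrl reaches that
handler on every notification. A queue with no handler is skipped.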
[...]
With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
[3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
virtio_net_handle_mac() doesn't seem to be getting called. I haven't
understood how the MAC address is set in VirtIONet when x-svq=false.
Understanding this might help see why n->mac has different values
when x-svq is false vs when it is true.
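As a rough model of what a "set mac" control command does to the device
state, consider the sketch below. It is an assumption-laden
simplification, not the body of virtio_net_handle_mac(). The struct, the
function and the SKETCH_* constants are hypothetical names. The command
value 1 mirrors VIRTIO_NET_CTRL_MAC_ADDR_SET as I understand it from the
virtio headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_LEN            6
#define SKETCH_CTRL_MAC_ADDR_SET  1 /* assumed to match the virtio value */
#define SKETCH_OK                 0
#define SKETCH_ERR                1

/* Hypothetical device state: only the MAC used for filtering. */
struct sketch_net {
    uint8_t mac[SKETCH_MAC_LEN];
};

/*
 * Handle a MAC-class control command. For ADDR_SET with a 6-byte
 * payload, the device MAC is overwritten with the driver's value.
 * Any other command or payload size is rejected and the MAC is kept.
 */
static uint8_t sketch_handle_mac(struct sketch_net *n, uint8_t cmd,
                                 const uint8_t *payload, size_t len)
{
    if (cmd != SKETCH_CTRL_MAC_ADDR_SET || len != SKETCH_MAC_LEN) {
        return SKETCH_ERR;
    }
    memcpy(n->mac, payload, SKETCH_MAC_LEN);
    return SKETCH_OK;
}
```

In this model, a device that starts with a MAC ending in 57 and then
receives ADDR_SET with a payload ending in 56 will afterwards filter on
the ...:56 address, which matches what I observe with x-svq=true.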
Ok this makes sense, as x-svq=true is the one that receives the set
mac message. You should see it in L0's QEMU though, both in x-svq=on
and x-svq=off scenarios. Can you check it?
L0's QEMU seems to be receiving the "set mac" message only when L1
is launched with x-svq=true. With x-svq=off, I don't see any call
to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
in L0.
Ok this is interesting. Let's disable control virtqueue to start with
something simpler:
-device virtio-net-pci,netdev=net0,ctrl_vq=off,...
QEMU will start complaining about features that depend on ctrl_vq,
like ctrl_rx. Let's disable all of them and check this new scenario.
I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
I didn't get any errors as such about features that depend on ctrl_vq.
However, I did notice that after booting L2 (x-svq=true as well as
x-svq=false), no eth0 device was created. There was only a "lo" interface
in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
with ctrl_vq=on and ctrl_rx=on.
Thanks,
Sahil
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1738
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L2484