Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
From: Eugenio Perez Martin
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
Date: Fri, 24 Jan 2025 08:34:54 +0100
On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>
> Hi,
>
> On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
> > On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
> >>> On Fri, Jan 3, 2025 at 2:06 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 12/20/24 12:28 PM, Eugenio Perez Martin wrote:
> >>>>> On Thu, Dec 19, 2024 at 8:37 PM Sahil Siddiq <icegambit91@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 12/17/24 1:20 PM, Eugenio Perez Martin wrote:
> >>>>>>> On Tue, Dec 17, 2024 at 6:45 AM Sahil Siddiq <icegambit91@gmail.com>
> >>>>>>> wrote:
> >>>>>>>> On 12/16/24 2:09 PM, Eugenio Perez Martin wrote:
> >>>>>>>>> On Sun, Dec 15, 2024 at 6:27 PM Sahil Siddiq
> >>>>>>>>> <icegambit91@gmail.com> wrote:
> >>>>>>>>>> On 12/10/24 2:57 PM, Eugenio Perez Martin wrote:
> >>>>>>>>>>> On Thu, Dec 5, 2024 at 9:34 PM Sahil Siddiq
> >>>>>>>>>>> <icegambit91@gmail.com> wrote:
> >>>>>>>>>>>> [...]
> >>>>>>>>>>>> I have been following the "Hands on vDPA: what do you do
> >>>>>>>>>>>> when you ain't got the hardware v2 (Part 2)" [1] blog to
> >>>>>>>>>>>> test my changes. To boot the L1 VM, I ran:
> >>>>>>>>>>>>
> >>>>>>>>>>>> sudo ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>>>> -enable-kvm \
> >>>>>>>>>>>> -drive
> >>>>>>>>>>>> file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -net nic,model=virtio \
> >>>>>>>>>>>> -net user,hostfwd=tcp::2222-:22 \
> >>>>>>>>>>>> -device intel-iommu,snoop-control=on \
> >>>>>>>>>>>> -device
> >>>>>>>>>>>> virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=on,event_idx=off,bus=pcie.0,addr=0x4
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \
> >>>>>>>>>>>> -nographic \
> >>>>>>>>>>>> -m 8G \
> >>>>>>>>>>>> -smp 4 \
> >>>>>>>>>>>> -M q35 \
> >>>>>>>>>>>> -cpu host 2>&1 | tee vm.log
> >>>>>>>>>>>>
> >>>>>>>>>>>> Without "guest_uso4=off,guest_uso6=off,host_uso=off,
> >>>>>>>>>>>> guest_announce=off" in "-device virtio-net-pci", QEMU
> >>>>>>>>>>>> throws "vdpa svq does not work with features" [2] when
> >>>>>>>>>>>> trying to boot L2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The enums added in commit #2 in this series are new and
> >>>>>>>>>>>> weren't in the earlier versions of the series. Without
> >>>>>>>>>>>> this change, x-svq=true throws "SVQ invalid device feature
> >>>>>>>>>>>> flags" [3] and x-svq is consequently disabled.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The first issue is related to running traffic in L2
> >>>>>>>>>>>> with vhost-vdpa.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In L0:
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ ip addr add 111.1.1.1/24 dev tap0
> >>>>>>>>>>>> $ ip link set tap0 up
> >>>>>>>>>>>> $ ip addr show tap0
> >>>>>>>>>>>> 4: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> >>>>>>>>>>>> fq_codel state UNKNOWN group default qlen 1000
> >>>>>>>>>>>> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
> >>>>>>>>>>>> inet 111.1.1.1/24 scope global tap0
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto
> >>>>>>>>>>>> kernel_ll
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am able to run traffic in L2 when booting without
> >>>>>>>>>>>> x-svq.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In L1:
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>>>> -nographic \
> >>>>>>>>>>>> -m 4G \
> >>>>>>>>>>>> -enable-kvm \
> >>>>>>>>>>>> -M q35 \
> >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
> >>>>>>>>>>>> -netdev
> >>>>>>>>>>>> type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
> >>>>>>>>>>>> -device
> >>>>>>>>>>>> virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -smp 4 \
> >>>>>>>>>>>> -cpu host \
> >>>>>>>>>>>> 2>&1 | tee vm.log
> >>>>>>>>>>>>
> >>>>>>>>>>>> In L2:
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0
> >>>>>>>>>>>> # ip addr show eth0
> >>>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> >>>>>>>>>>>> fq_codel state UP group default qlen 1000
> >>>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> >>>>>>>>>>>> altname enp0s7
> >>>>>>>>>>>> inet 111.1.1.2/24 scope global eth0
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link
> >>>>>>>>>>>> noprefixroute
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ip route
> >>>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ping 111.1.1.1 -w3
> >>>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
> >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=1 ttl=64 time=0.407 ms
> >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=2 ttl=64 time=0.671 ms
> >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=3 ttl=64 time=0.291 ms
> >>>>>>>>>>>>
> >>>>>>>>>>>> --- 111.1.1.1 ping statistics ---
> >>>>>>>>>>>> 3 packets transmitted, 3 received, 0% packet loss, time 2034ms
> >>>>>>>>>>>> rtt min/avg/max/mdev = 0.291/0.456/0.671/0.159 ms
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> But if I boot L2 with x-svq=true as shown below, I am unable
> >>>>>>>>>>>> to ping the host machine.
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>>>> -nographic \
> >>>>>>>>>>>> -m 4G \
> >>>>>>>>>>>> -enable-kvm \
> >>>>>>>>>>>> -M q35 \
> >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
> >>>>>>>>>>>> -netdev
> >>>>>>>>>>>> type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -device
> >>>>>>>>>>>> virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -smp 4 \
> >>>>>>>>>>>> -cpu host \
> >>>>>>>>>>>> 2>&1 | tee vm.log
> >>>>>>>>>>>>
> >>>>>>>>>>>> In L2:
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0
> >>>>>>>>>>>> # ip addr show eth0
> >>>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> >>>>>>>>>>>> fq_codel state UP group default qlen 1000
> >>>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> >>>>>>>>>>>> altname enp0s7
> >>>>>>>>>>>> inet 111.1.1.2/24 scope global eth0
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link
> >>>>>>>>>>>> noprefixroute
> >>>>>>>>>>>> valid_lft forever preferred_lft forever
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ip route
> >>>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ping 111.1.1.1 -w10
> >>>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
> >>>>>>>>>>>> From 111.1.1.2 icmp_seq=1 Destination Host Unreachable
> >>>>>>>>>>>> ping: sendmsg: No route to host
> >>>>>>>>>>>> From 111.1.1.2 icmp_seq=2 Destination Host Unreachable
> >>>>>>>>>>>> From 111.1.1.2 icmp_seq=3 Destination Host Unreachable
> >>>>>>>>>>>>
> >>>>>>>>>>>> --- 111.1.1.1 ping statistics ---
> >>>>>>>>>>>> 3 packets transmitted, 0 received, +3 errors, 100% packet loss,
> >>>>>>>>>>>> time 2076ms
> >>>>>>>>>>>> pipe 3
> >>>>>>>>>>>>
> >>>>>>>>>>>> The other issue is related to booting L2 with "x-svq=true"
> >>>>>>>>>>>> and "packed=on".
> >>>>>>>>>>>>
> >>>>>>>>>>>> In L1:
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>>>> -nographic \
> >>>>>>>>>>>> -m 4G \
> >>>>>>>>>>>> -enable-kvm \
> >>>>>>>>>>>> -M q35 \
> >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
> >>>>>>>>>>>> -netdev
> >>>>>>>>>>>> type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -device
> >>>>>>>>>>>> virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,packed=on,bus=pcie.0,addr=0x7
> >>>>>>>>>>>> \
> >>>>>>>>>>>> -smp 4 \
> >>>>>>>>>>>> -cpu host \
> >>>>>>>>>>>> 2>&1 | tee vm.log
> >>>>>>>>>>>>
> >>>>>>>>>>>> The kernel throws "virtio_net virtio1: output.0:id 0 is not
> >>>>>>>>>>>> a head!" [4].
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> So this series implements the descriptor forwarding from the
> >>>>>>>>>>> guest to
> >>>>>>>>>>> the device in packed vq. We also need to forward the descriptors
> >>>>>>>>>>> from
> >>>>>>>>>>> the device to the guest. The device writes them in the SVQ ring.
> >>>>>>>>>>>
> >>>>>>>>>>> The functions responsible for that in QEMU are
> >>>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush, which is
> >>>>>>>>>>> called by
> >>>>>>>>>>> the device when used descriptors are written to the SVQ, which
> >>>>>>>>>>> calls
> >>>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf. We need to
> >>>>>>>>>>> do
> >>>>>>>>>>> modifications similar to vhost_svq_add: Make them conditional if
> >>>>>>>>>>> we're
> >>>>>>>>>>> in split or packed vq, and "copy" the code from Linux's
> >>>>>>>>>>> drivers/virtio/virtio_ring.c:virtqueue_get_buf.
> >>>>>>>>>>>
> >>>>>>>>>>> After these modifications you should be able to ping and forward
> >>>>>>>>>>> traffic. As always, it is totally ok if it needs more than one
> >>>>>>>>>>> iteration, and feel free to ask any questions you have :).
> >>>>>>>>>>>
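
Just to illustrate the direction (this is not code from the series): a
packed counterpart of vhost_svq_get_buf could look roughly like the
sketch below, mirroring Linux's virtqueue_get_buf_ctx_packed(). The
vring_packed, last_used_idx and used_wrap_counter fields are assumptions
about how the packed state may end up being tracked; desc_state is the
bookkeeping SVQ already has.

static VirtQueueElement *vhost_svq_get_buf_packed(VhostShadowVirtqueue *svq,
                                                  uint32_t *len)
{
    struct vring_packed_desc *desc =
        &svq->vring_packed.vring.desc[svq->last_used_idx];
    uint16_t flags = le16_to_cpu(desc->flags);
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used = flags & (1 << VRING_PACKED_DESC_F_USED);
    uint16_t id;
    VirtQueueElement *elem;

    /* A descriptor is used when both flag bits match the used wrap counter */
    if (avail != svq->used_wrap_counter || used != svq->used_wrap_counter) {
        return NULL;
    }

    /* Read id/len only after the flags check, paired with the device write */
    smp_rmb();
    id = le16_to_cpu(desc->id);
    *len = le32_to_cpu(desc->len);

    elem = g_steal_pointer(&svq->desc_state[id].elem);

    /* The device writes one used element per chain; skip the whole chain */
    svq->last_used_idx += svq->desc_state[id].ndescs;
    if (svq->last_used_idx >= svq->vring_packed.vring.num) {
        svq->last_used_idx -= svq->vring_packed.vring.num;
        svq->used_wrap_counter = !svq->used_wrap_counter;
    }

    return elem;
}
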
> >>>>>>>>>>
> >>>>>>>>>> I misunderstood this part. While working on extending
> >>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf() [1]
> >>>>>>>>>> for packed vqs, I realized that this function and
> >>>>>>>>>> vhost_svq_flush() already support split vqs. However, I am
> >>>>>>>>>> unable to ping L0 when booting L2 with "x-svq=true" and
> >>>>>>>>>> "packed=off" or when the "packed" option is not specified
> >>>>>>>>>> in QEMU's command line.
> >>>>>>>>>>
> >>>>>>>>>> I tried debugging these functions for split vqs after running
> >>>>>>>>>> the following QEMU commands while following the blog [2].
> >>>>>>>>>>
> >>>>>>>>>> Booting L1:
> >>>>>>>>>>
> >>>>>>>>>> $ sudo ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>> -enable-kvm \
> >>>>>>>>>> -drive
> >>>>>>>>>> file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio
> >>>>>>>>>> \
> >>>>>>>>>> -net nic,model=virtio \
> >>>>>>>>>> -net user,hostfwd=tcp::2222-:22 \
> >>>>>>>>>> -device intel-iommu,snoop-control=on \
> >>>>>>>>>> -device
> >>>>>>>>>> virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4
> >>>>>>>>>> \
> >>>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \
> >>>>>>>>>> -nographic \
> >>>>>>>>>> -m 8G \
> >>>>>>>>>> -smp 4 \
> >>>>>>>>>> -M q35 \
> >>>>>>>>>> -cpu host 2>&1 | tee vm.log
> >>>>>>>>>>
> >>>>>>>>>> Booting L2:
> >>>>>>>>>>
> >>>>>>>>>> # ./qemu/build/qemu-system-x86_64 \
> >>>>>>>>>> -nographic \
> >>>>>>>>>> -m 4G \
> >>>>>>>>>> -enable-kvm \
> >>>>>>>>>> -M q35 \
> >>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
> >>>>>>>>>> -netdev
> >>>>>>>>>> type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0
> >>>>>>>>>> \
> >>>>>>>>>> -device
> >>>>>>>>>> virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7
> >>>>>>>>>> \
> >>>>>>>>>> -smp 4 \
> >>>>>>>>>> -cpu host \
> >>>>>>>>>> 2>&1 | tee vm.log
> >>>>>>>>>>
> >>>>>>>>>> I printed out the contents of VirtQueueElement returned
> >>>>>>>>>> by vhost_svq_get_buf() in vhost_svq_flush() [3].
> >>>>>>>>>> I noticed that "len" which is set by "vhost_svq_get_buf"
> >>>>>>>>>> is always set to 0 while VirtQueueElement.len is non-zero.
> >>>>>>>>>> I haven't understood the difference between these two "len"s.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> VirtQueueElement.len is the length of the buffer, while the len of
> >>>>>>>>> vhost_svq_get_buf is the number of bytes written by the device. In
> >>>>>>>>> the case of the tx queue, VirtQueueElement.len is the length of the
> >>>>>>>>> tx packet, and the vhost_svq_get_buf len is always 0 as the device
> >>>>>>>>> does not write. In the case of rx, VirtQueueElement.len is the
> >>>>>>>>> available length for an rx frame, and the vhost_svq_get_buf len is
> >>>>>>>>> the actual length written by the device.
> >>>>>>>>>
> >>>>>>>>> To be 100% accurate a rx packet can span over multiple buffers, but
> >>>>>>>>> SVQ does not need special code to handle this.
> >>>>>>>>>
> >>>>>>>>> So vhost_svq_get_buf should return > 0 for rx queue (svq->vq->index
> >>>>>>>>> ==
> >>>>>>>>> 0), and 0 for tx queue (svq->vq->index % 2 == 1).
> >>>>>>>>>
> >>>>>>>>> Take into account that vhost_svq_get_buf only handles split vq at
> >>>>>>>>> the
> >>>>>>>>> moment! It should be renamed or split into
> >>>>>>>>> vhost_svq_get_buf_split.
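
To see where that len goes: vhost_svq_flush hands the element back to
the guest's virtqueue together with the device-written length. A
simplified paraphrase of the existing split-vq path (not literal code):

unsigned i = 0;
while (true) {
    uint32_t len;
    VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
    if (!elem) {
        break;
    }
    /* len is what the device wrote (0 for tx, >0 for rx); the element
     * itself still describes the guest buffers it was popped with. */
    virtqueue_fill(svq->vq, elem, len, i++);
}
virtqueue_flush(svq->vq, i);
event_notifier_set(&svq->svq_call);  /* notify the guest */
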
> >>>>>>>>
> >>>>>>>> In L1, there are 2 virtio network devices.
> >>>>>>>>
> >>>>>>>> # lspci -nn | grep -i net
> >>>>>>>> 00:02.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network
> >>>>>>>> device [1af4:1000]
> >>>>>>>> 00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio 1.0 network
> >>>>>>>> device [1af4:1041] (rev 01)
> >>>>>>>>
> >>>>>>>> I am using the second one (1af4:1041) for testing my changes and have
> >>>>>>>> bound this device to the vp_vdpa driver.
> >>>>>>>>
> >>>>>>>> # vdpa dev show -jp
> >>>>>>>> {
> >>>>>>>> "dev": {
> >>>>>>>> "vdpa0": {
> >>>>>>>> "type": "network",
> >>>>>>>> "mgmtdev": "pci/0000:00:04.0",
> >>>>>>>> "vendor_id": 6900,
> >>>>>>>> "max_vqs": 3,
> >>>>>>>
> >>>>>>> How is max_vqs=3? For this to happen L0 QEMU should have
> >>>>>>> virtio-net-pci,...,queues=3 cmdline argument.
> >>>>>
> >>>>> Ouch! I totally misread it :(. Everything is correct, max_vqs should
> >>>>> be 3. I read it as the virtio_net queues, which means queue *pairs*,
> >>>>> as it includes rx and tx queue.
> >>>>
> >>>> Understood :)
> >>>>
> >>>>>>
> >>>>>> I am not sure why max_vqs is 3. I haven't set the value of queues to 3
> >>>>>> in the cmdline argument. Is max_vqs expected to have a default value
> >>>>>> other than 3?
> >>>>>>
> >>>>>> In the blog [1] as well, max_vqs is 3 even though there's no queues=3
> >>>>>> argument.
> >>>>>>
> >>>>>>> It's clear the guest is not using them, we can add mq=off
> >>>>>>> to simplify the scenario.
> >>>>>>
> >>>>>> The value of max_vqs is still 3 after adding mq=off. The whole
> >>>>>> command that I run to boot L0 is:
> >>>>>>
> >>>>>> $ sudo ./qemu/build/qemu-system-x86_64 \
> >>>>>> -enable-kvm \
> >>>>>> -drive
> >>>>>> file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio
> >>>>>> \
> >>>>>> -net nic,model=virtio \
> >>>>>> -net user,hostfwd=tcp::2222-:22 \
> >>>>>> -device intel-iommu,snoop-control=on \
> >>>>>> -device
> >>>>>> virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4
> >>>>>> \
> >>>>>> -netdev tap,id=net0,script=no,downscript=no \
> >>>>>> -nographic \
> >>>>>> -m 8G \
> >>>>>> -smp 4 \
> >>>>>> -M q35 \
> >>>>>> -cpu host 2>&1 | tee vm.log
> >>>>>>
> >>>>>> Could it be that 2 of the 3 vqs are used for the dataplane and
> >>>>>> the third vq is the control vq?
> >>>>>>
> >>>>>>>> "max_vq_size": 256
> >>>>>>>> }
> >>>>>>>> }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> The max number of vqs is 3 with the max size being 256.
> >>>>>>>>
> >>>>>>>> Since there are 2 virtio net devices, vhost_vdpa_svqs_start [1]
> >>>>>>>> is called twice. For each of them, it calls vhost_svq_start [2]
> >>>>>>>> v->shadow_vqs->len number of times.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Ok, I understand this confusion, as the code is not intuitive :). Take
> >>>>>>> into account that you can only have SVQ in vdpa devices, so both
> >>>>>>> vhost_vdpa_svqs_start calls are acting on the vdpa device.
> >>>>>>>
> >>>>>>> You are seeing two calls to vhost_vdpa_svqs_start because virtio (and
> >>>>>>> vdpa) devices are modelled internally as two devices in QEMU: one for
> >>>>>>> the dataplane vqs and another for the control vq. There are historical
> >>>>>>> reasons for this, but we use it in vdpa to always shadow the CVQ while
> >>>>>>> leaving the dataplane in passthrough if x-svq=off and the virtio &
> >>>>>>> virtio-net feature set is understood by SVQ.
> >>>>>>>
> >>>>>>> If you break at vhost_vdpa_svqs_start with gdb and go higher in the
> >>>>>>> stack you should reach vhost_net_start, that starts each vhost_net
> >>>>>>> device individually.
> >>>>>>>
> >>>>>>> To be 100% honest, each dataplane *queue pair* (rx+tx) is modelled
> >>>>>>> with a different vhost_net device in QEMU, but you don't need to take
> >>>>>>> that into account implementing the packed vq :).
> >>>>>>
> >>>>>> Got it, this makes sense now.
> >>>>>>
> >>>>>>>> Printing the values of dev->vdev->name, v->shadow_vqs->len and
> >>>>>>>> svq->vring.num in vhost_vdpa_svqs_start gives:
> >>>>>>>>
> >>>>>>>> name: virtio-net
> >>>>>>>> len: 2
> >>>>>>>> num: 256
> >>>>>>>> num: 256
> >>>>>>>
> >>>>>>> First QEMU's vhost_net device, the dataplane.
> >>>>>>>
> >>>>>>>> name: virtio-net
> >>>>>>>> len: 1
> >>>>>>>> num: 64
> >>>>>>>>
> >>>>>>>
> >>>>>>> Second QEMU's vhost_net device, the control virtqueue.
> >>>>>>
> >>>>>> Ok, if I understand this correctly, the control vq doesn't
> >>>>>> need separate queues for rx and tx.
> >>>>>>
> >>>>>
> >>>>> That's right. Since CVQ has one reply per command, the driver can just
> >>>>> send ro+rw descriptors to the device. In the case of RX, the device
> >>>>> needs a queue with only-writable descriptors, as neither the device nor
> >>>>> the driver knows how many packets will arrive.
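
To illustrate that layout with a hypothetical snippet (not code from
the series): a CVQ command is a device-readable header plus payload
followed by a single device-writable ack byte, so one queue serves
both directions:

struct virtio_net_ctrl_hdr hdr = {
    .class = VIRTIO_NET_CTRL_MAC,
    .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
};
uint8_t mac[ETH_ALEN] = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x57 };
uint8_t ack = VIRTIO_NET_ERR;

/* Device-readable (out) descriptors first, device-writable (in) last */
const struct iovec out_sg[] = {
    { .iov_base = &hdr, .iov_len = sizeof(hdr) },
    { .iov_base = mac,  .iov_len = sizeof(mac) },
};
const struct iovec in_sg[] = {
    { .iov_base = &ack, .iov_len = sizeof(ack) },
};
/* e.g. vhost_svq_add(svq, out_sg, 2, in_sg, 1, elem); */
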
> >>>>
> >>>> Got it, this makes sense now.
> >>>>
> >>>>>>>> I am not sure how to match the above log lines to the
> >>>>>>>> right virtio-net device since the actual value of num
> >>>>>>>> can be less than "max_vq_size" in the output of "vdpa
> >>>>>>>> dev show".
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, the device can set a different vq max per vq, and the driver can
> >>>>>>> negotiate a lower vq size per vq too.
> >>>>>>>
> >>>>>>>> I think the first 3 log lines correspond to the virtio
> >>>>>>>> net device that I am using for testing since it has
> >>>>>>>> 2 vqs (rx and tx) while the other virtio-net device
> >>>>>>>> only has one vq.
> >>>>>>>>
> >>>>>>>> When printing out the values of svq->vring.num,
> >>>>>>>> used_elem.len and used_elem.id in vhost_svq_get_buf,
> >>>>>>>> there are two sets of output. One set corresponds to
> >>>>>>>> svq->vring.num = 64 and the other corresponds to
> >>>>>>>> svq->vring.num = 256.
> >>>>>>>>
> >>>>>>>> For svq->vring.num = 64, only the following line
> >>>>>>>> is printed repeatedly:
> >>>>>>>>
> >>>>>>>> size: 64, len: 1, i: 0
> >>>>>>>>
> >>>>>>>
> >>>>>>> This is with packed=off, right? If this is testing with packed, you
> >>>>>>> need to change the code to accommodate it. Let me know if you need
> >>>>>>> more help with this.
> >>>>>>
> >>>>>> Yes, this is for packed=off. For the time being, I am trying to
> >>>>>> get L2 to communicate with L0 using split virtqueues and x-svq=true.
> >>>>>>
> >>>>>
> >>>>> Got it.
> >>>>>
> >>>>>>> In the CVQ the only reply is a byte, indicating if the command was
> >>>>>>> applied or not. This seems ok to me.
> >>>>>>
> >>>>>> Understood.
> >>>>>>
> >>>>>>> The queue can also recycle ids as long as they are not available, so
> >>>>>>> that part seems correct to me too.
> >>>>>>
> >>>>>> I am a little confused here. The ids are recycled when they are
> >>>>>> available (i.e., the id is not already in use), right?
> >>>>>>
> >>>>>
> >>>>> In virtio, "available" means that the device can use them, and "used"
> >>>>> means that the device has returned them to the driver. I think you're
> >>>>> aligned; it's just better to follow the virtio nomenclature :).
> >>>>
> >>>> Got it.
> >>>>
> >>>>>>>> For svq->vring.num = 256, the following line is
> >>>>>>>> printed 20 times,
> >>>>>>>>
> >>>>>>>> size: 256, len: 0, i: 0
> >>>>>>>>
> >>>>>>>> followed by:
> >>>>>>>>
> >>>>>>>> size: 256, len: 0, i: 1
> >>>>>>>> size: 256, len: 0, i: 1
> >>>>>>>>
> >>>>>>>
> >>>>>>> This makes sense for the tx queue too. Can you print the VirtQueue
> >>>>>>> index?
> >>>>>>
> >>>>>> For svq->vring.num = 64, the vq index is 2. So the following line
> >>>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index)
> >>>>>> is printed repeatedly:
> >>>>>>
> >>>>>> size: 64, len: 1, i: 0, vq idx: 2
> >>>>>>
> >>>>>> For svq->vring.num = 256, the following line is repeated several
> >>>>>> times:
> >>>>>>
> >>>>>> size: 256, len: 0, i: 0, vq idx: 1
> >>>>>>
> >>>>>> This is followed by:
> >>>>>>
> >>>>>> size: 256, len: 0, i: 1, vq idx: 1
> >>>>>>
> >>>>>> In both cases, queue_index is 1. To get the value of queue_index,
> >>>>>> I used "virtio_get_queue_index(svq->vq)" [2].
> >>>>>>
> >>>>>> Since the queue_index is 1, I guess this means this is the tx queue
> >>>>>> and the value of len (0) is correct. However, nothing with
> >>>>>> queue_index % 2 == 0 is printed by vhost_svq_get_buf() which means
> >>>>>> the device is not sending anything to the guest. Is this correct?
> >>>>>>
> >>>>>
> >>>>> Yes, that's totally correct.
> >>>>>
> >>>>> You can set -netdev tap,...,vhost=off in L0 qemu and trace (or debug
> >>>>> with gdb) it to check what is receiving. You should see calls to
> >>>>> hw/net/virtio-net.c:virtio_net_flush_tx. The corresponding function to
> >>>>> receive is virtio_net_receive_rcu; I recommend you trace it too, just
> >>>>> in case you see any strange calls to it.
> >>>>>
> >>>>
> >>>> I added "vhost=off" to -netdev tap in L0's qemu command. I followed all
> >>>> the steps in the blog [1] up till the point where L2 is booted. Before
> >>>> booting L2, I had no issues pinging L0 from L1.
> >>>>
> >>>> For each ping, the following trace lines were printed by QEMU:
> >>>>
> >>>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 0 out_num 1
> >>>> virtqueue_pop vq 0x5d04109b0ce8 elem 0x5d041024f560 in_num 0 out_num 1
> >>>> virtqueue_fill vq 0x5d04109b0ce8 elem 0x5d041024f560 len 0 idx 0
> >>>> virtqueue_flush vq 0x5d04109b0ce8 count 1
> >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0ce8
> >>>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 1 out_num 0
> >>>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d041024f560 in_num 1 out_num 0
> >>>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d041024f560 len 110 idx 0
> >>>> virtqueue_flush vq 0x5d04109b0c50 count 1
> >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50
> >>>>
> >>>> The first 5 lines look like they were printed when an echo request was
> >>>> sent to L0 and the next 5 lines were printed when an echo reply was
> >>>> received.
> >>>>
> >>>> After booting L2, I set up the tap device's IP address in L0 and the
> >>>> vDPA port's IP address in L2.
> >>>>
> >>>> When trying to ping L0 from L2, I only see the following lines being
> >>>> printed:
> >>>>
> >>>> virtqueue_alloc_element elem 0x5d041099ffd0 size 56 in_num 0 out_num 1
> >>>> virtqueue_pop vq 0x5d0410d87168 elem 0x5d041099ffd0 in_num 0 out_num 1
> >>>> virtqueue_fill vq 0x5d0410d87168 elem 0x5d041099ffd0 len 0 idx 0
> >>>> virtqueue_flush vq 0x5d0410d87168 count 1
> >>>> virtio_notify vdev 0x5d0410d79a10 vq 0x5d0410d87168
> >>>>
> >>>> There's no reception. I used wireshark to inspect the packets that are
> >>>> being sent and received through the tap device in L0.
> >>>>
> >>>> When pinging L0 from L2, I see one of the following two outcomes:
> >>>>
> >>>> Outcome 1:
> >>>> ----------
> >>>> L2 broadcasts ARP packets and L0 replies to L2.
> >>>>
> >>>> Source Destination Protocol Length Info
> >>>> 52:54:00:12:34:57 Broadcast ARP 42 Who has
> >>>> 111.1.1.1? Tell 111.1.1.2
> >>>> d2:6d:b9:61:e1:9a 52:54:00:12:34:57 ARP 42 111.1.1.1 is
> >>>> at d2:6d:b9:61:e1:9a
> >>>>
> >>>> Outcome 2 (less frequent):
> >>>> --------------------------
> >>>> L2 sends an ICMP echo request packet to L0 and L0 sends a reply,
> >>>> but the reply is not received by L2.
> >>>>
> >>>> Source Destination Protocol Length Info
> >>>> 111.1.1.2 111.1.1.1 ICMP 98 Echo (ping)
> >>>> request id=0x0006, seq=1/256, ttl=64
> >>>> 111.1.1.1 111.1.1.2 ICMP 98 Echo (ping)
> >>>> reply id=0x0006, seq=1/256, ttl=64
> >>>>
> >>>> When pinging L2 from L0 I get the following output in
> >>>> wireshark:
> >>>>
> >>>> Source Destination Protocol Length Info
> >>>> 111.1.1.1 111.1.1.2 ICMP 100 Echo (ping)
> >>>> request id=0x002c, seq=2/512, ttl=64 (no response found!)
> >>>>
> >>>> I do see a lot of traced lines being printed (by the QEMU instance that
> >>>> was started in L0) with in_num > 1, for example:
> >>>>
> >>>> virtqueue_alloc_element elem 0x5d040fdbad30 size 56 in_num 1 out_num 0
> >>>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d040fdbad30 in_num 1 out_num 0
> >>>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d040fdbad30 len 76 idx 0
> >>>> virtqueue_flush vq 0x5d04109b0c50 count 1
> >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50
> >>>>
> >>>
> >>> So L0 is able to receive data from L2. We're halfway there, Good! :).
> >>>
> >>>> It looks like L1 is receiving data from L0 but this is not related to
> >>>> the pings that are sent from L2. I haven't figured out what data is
> >>>> actually being transferred in this case. It's not necessary for all of
> >>>> the data that L1 receives from L0 to be passed to L2, is it?
> >>>>
> >>>
> >>> It should be noise, yes.
> >>>
> >>
> >> Understood.
> >>
> >>>>>>>> For svq->vring.num = 256, the following line is
> >>>>>>>> printed 20 times,
> >>>>>>>>
> >>>>>>>> size: 256, len: 0, i: 0
> >>>>>>>>
> >>>>>>>> followed by:
> >>>>>>>>
> >>>>>>>> size: 256, len: 0, i: 1
> >>>>>>>> size: 256, len: 0, i: 1
> >>>>>>>>
> >>>>>>>
> >>>>>>> This makes sense for the tx queue too. Can you print the VirtQueue
> >>>>>>> index?
> >>>>>>
> >>>>>> For svq->vring.num = 64, the vq index is 2. So the following line
> >>>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index)
> >>>>>> is printed repeatedly:
> >>>>>>
> >>>>>> size: 64, len: 1, i: 0, vq idx: 2
> >>>>>>
> >>>>>> For svq->vring.num = 256, the following line is repeated several
> >>>>>> times:
> >>>>>>
> >>>>>> size: 256, len: 0, i: 0, vq idx: 1
> >>>>>>
> >>>>>> This is followed by:
> >>>>>>
> >>>>>> size: 256, len: 0, i: 1, vq idx: 1
> >>>>>>
> >>>>>> In both cases, queue_index is 1.
> >>>>
> >>>> I also noticed that there are now some lines with svq->vring.num = 256
> >>>> where len > 0. These lines were printed by the QEMU instance running
> >>>> in L1, so this corresponds to data that was received by L2.
> >>>>
> >>>> svq->vring.num used_elem.len used_elem.id svq->vq->queue_index
> >>>> size: 256 len: 82 i: 0 vq idx: 0
> >>>> size: 256 len: 82 i: 1 vq idx: 0
> >>>> size: 256 len: 82 i: 2 vq idx: 0
> >>>> size: 256 len: 54 i: 3 vq idx: 0
> >>>>
> >>>> I still haven't figured out what data was received by L2 but I am
> >>>> slightly confused as to why this data was received by L2 but not
> >>>> the ICMP echo replies sent by L0.
> >>>>
> >>>
> >>> We're on a good track, let's trace it deeper. I guess these are
> >>> printed from vhost_svq_flush, right? Do virtqueue_fill,
> >>> virtqueue_flush, and event_notifier_set(&svq->svq_call) run properly,
> >>> or do you see anything strange with gdb / tracing?
> >>>
> >>
> >> Apologies for the delay in replying. It took me a while to figure
> >> this out, but I have now understood why this doesn't work. L1 is
> >> unable to receive messages from L0 because they get filtered out
> >> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
> >> the MAC addresses.
> >>
> >> In L0, I have:
> >>
> >> $ ip a show tap0
> >> 6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
> >> UNKNOWN group default qlen 1000
> >> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
> >> inet 111.1.1.1/24 scope global tap0
> >> valid_lft forever preferred_lft forever
> >> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
> >> valid_lft forever preferred_lft forever
> >>
> >> In L1:
> >>
> >> # ip a show eth0
> >> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
> >> UP group default qlen 1000
> >> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> >> altname enp0s2
> >> inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute
> >> eth0
> >> valid_lft 83455sec preferred_lft 83455sec
> >> inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
> >> valid_lft 86064sec preferred_lft 14064sec
> >> inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
> >> valid_lft forever preferred_lft forever
> >>
> >> I'll call this L1-eth0.
> >>
> >> In L2:
> >> # ip a show eth0
> >> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
> >> UP gro0
> >> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> >> altname enp0s7
> >> inet 111.1.1.2/24 scope global eth0
> >> valid_lft forever preferred_lft forever
> >>
> >> I'll call this L2-eth0.
> >>
> >> Apart from eth0, lo is the only other device in both L1 and L2.
> >>
> >> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> >> as its destination address. When booting L2 with x-svq=false, the
> >> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> >> the frames and passes them on to L2 and pinging works [2].
> >>
> >
> > So this behavior is interesting by itself. But L1's kernel net system
> > should not receive anything. As I read it, even if it receives it, it
> > should not forward the frame to L2 as it is in a different subnet. Are
> > you able to read it using tcpdump on L1?
>
> I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
> that were directed at L2 even though L2 was able to receive them.
> Similarly, it didn't capture any packets that were sent from L2 to
> L0. This is when L2 is launched with x-svq=false.
>
That's right. The virtio dataplane goes directly from L0 to L2, so you
should not be able to see any of those packets in L1's network stack.
> With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
> receive_filter allows L2 to receive packets from L0. I added
> the following line just before line 1771 [1] to check this out.
>
> n->mac[5] = 0x57;
>
That's very interesting. Let me answer all the gdb questions below and
we can debug it deeper :).
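
That matches what the unicast path of
hw/net/virtio-net.c:receive_filter() checks: the frame's destination
MAC has to equal n->mac or one of the n->mac_table entries. Condensed
into a hypothetical helper (mac_accepts_frame is just an illustration,
not a real QEMU function):

/* A frame is kept when its destination MAC matches the device's own MAC
 * or an entry in the MAC table; forcing n->mac[5] = 0x57 therefore lets
 * L2-eth0's frames through. */
static bool mac_accepts_frame(const VirtIONet *n, const uint8_t *dst_mac)
{
    if (!memcmp(dst_mac, n->mac, ETH_ALEN)) {
        return true;
    }
    for (int i = 0; i < n->mac_table.in_use; i++) {
        if (!memcmp(dst_mac, &n->mac_table.macs[i * ETH_ALEN], ETH_ALEN)) {
            return true;
        }
    }
    return false;
}
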
> > Maybe we can make the scenario clearer by telling which virtio-net
> > device is which with virtio-net-pci,mac=XX:... ?
> >
> >> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
> >> (LSB = 56) in virtio_net_handle_mac() [3].
> >
> > Can you tell with gdb bt if this function is called from net or the
> > SVQ subsystem?
>
> I am struggling to learn how one uses gdb to debug QEMU. I tried running
> QEMU in L0 with -s and -S in one terminal. In another terminal, I ran
> the following:
>
QEMU's -s option makes it act as a gdb server for the *guest kernel*.
That is useful for debugging the virtio_net driver in the nested guest,
for example, but SVQ lives in the nested QEMU process, in userspace, so
you don't need -s here.
> $ gdb ./build/qemu-system-x86_64
>
> I then ran the following in gdb's console, but stepping through or
> continuing the execution gives me errors:
>
> (gdb) target remote localhost:1234
> (gdb) break -source ../hw/net/virtio-net.c -function receive_filter
> (gdb) c
> Continuing.
> Warning:
> Cannot insert breakpoint 2.
> Cannot access memory at address 0x9058c6
>
> Command aborted.
> (gdb) ni
> Continuing.
> Warning:
> Cannot insert breakpoint 2.
> Cannot access memory at address 0x9058c6
>
> Command aborted.
>
Yes, you're trying to debug the kernel running in the guest while using
QEMU's sources, so gdb cannot correlate functions, symbols, etc. :).
To debug QEMU itself with gdb, just add "gdb --args" in front of the
QEMU invocation.
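
For example, in L1 something like this should let you stop in
receive_filter and look at the backtrace (arguments elided; use the
ones from your usual invocation):

$ gdb --args ./qemu/build/qemu-system-x86_64 -nographic -m 4G -enable-kvm ...
(gdb) break receive_filter
(gdb) run
...
(gdb) bt
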
> I built QEMU using ./configure --enable-debug.
>
> I also tried using the --disable-pie option but this results
> in a build error.
>
> [8063/8844] Linking target qemu-keymap
> FAILED: qemu-keymap
> cc -m64 -o qemu-keymap <...>
> /usr/bin/ld: libevent-loop-base.a.p/event-loop-base.c.o: relocation
> R_X86_64_32 against `.rodata' can not be used when making a PIE object;
> recompile with -fPIE
> /usr/bin/ld: failed to set dynamic section sizes: bad value
> collect2: error: ld returned 1 exit status
>
> >> n->mac_table.macs also
> >> does not seem to have L2-eth0's MAC address. Due to this,
> >> receive_filter() filters out all the frames [4] that were meant for
> >> L2-eth0.
> >>
> >
> > In the vp_vdpa scenario of the blog receive_filter should not be
> > called in the qemu running in the L1 guest, the nested one. Can you
> > check it with gdb or by printing a trace if it is called?
>
> This is right. receive_filter is not called in L1's QEMU with
> x-svq=true.
>
> >> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
> >> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
> >> virtio_net_handle_mac() doesn't seem to be getting called. I haven't
> >> understood how the MAC address is set in VirtIONet when x-svq=false.
> >> Understanding this might help see why n->mac has different values
> >> when x-svq is false vs when it is true.
> >>
> >
> > Ok this makes sense, as x-svq=true is the one that receives the set
> > mac message. You should see it in L0's QEMU though, both in x-svq=on
> > and x-svq=off scenarios. Can you check it?
>
> L0's QEMU seems to be receiving the "set mac" message only when L1
> is launched with x-svq=true. With x-svq=off, I don't see any call
> to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
> in L0.
>
Ok, this is interesting. Let's disable the control virtqueue to start
with something simpler:

-device virtio-net-pci,netdev=net0,ctrl_vq=off,...

QEMU will start complaining about features that depend on ctrl_vq,
like ctrl_rx. Let's disable all of them and check this new scenario.
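
For example, starting from your current -device line, something along
these lines (QEMU's error messages will tell you if more
ctrl_vq-dependent features need to be switched off):

-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,ctrl_mac_addr=off,ctrl_guest_offloads=off,packed=off,event_idx=off,bus=pcie.0,addr=0x4
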
Thanks!