Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue


From: Sahil Siddiq
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
Date: Fri, 3 Jan 2025 18:36:01 +0530
User-agent: Mozilla Thunderbird

Hi,

On 12/20/24 12:28 PM, Eugenio Perez Martin wrote:
On Thu, Dec 19, 2024 at 8:37 PM Sahil Siddiq <icegambit91@gmail.com> wrote:

Hi,

On 12/17/24 1:20 PM, Eugenio Perez Martin wrote:
On Tue, Dec 17, 2024 at 6:45 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
On 12/16/24 2:09 PM, Eugenio Perez Martin wrote:
On Sun, Dec 15, 2024 at 6:27 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
On 12/10/24 2:57 PM, Eugenio Perez Martin wrote:
On Thu, Dec 5, 2024 at 9:34 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
[...]
I have been following the "Hands on vDPA: what do you do
when you ain't got the hardware v2 (Part 2)" [1] blog to
test my changes. To boot the L1 VM, I ran:

sudo ./qemu/build/qemu-system-x86_64 \
-enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
-net nic,model=virtio \
-net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log

Without "guest_uso4=off,guest_uso6=off,host_uso=off,
guest_announce=off" in "-device virtio-net-pci", QEMU
throws "vdpa svq does not work with features" [2] when
trying to boot L2.

The enums added in commit #2 of this series are new and weren't in
the earlier versions of the series. Without this change, x-svq=true
throws "SVQ invalid device feature flags" [3] and x-svq is
consequently disabled.

The first issue is related to running traffic in L2
with vhost-vdpa.

In L0:

$ ip addr add 111.1.1.1/24 dev tap0
$ ip link set tap0 up
$ ip addr show tap0
4: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
        link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
        inet 111.1.1.1/24 scope global tap0
           valid_lft forever preferred_lft forever
        inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
           valid_lft forever preferred_lft forever

I am able to run traffic in L2 when booting without
x-svq.

In L1:

$ ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

In L2:

# ip addr add 111.1.1.2/24 dev eth0
# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
        altname enp0s7
        inet 111.1.1.2/24 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute
           valid_lft forever preferred_lft forever

# ip route
111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2

# ping 111.1.1.1 -w3
PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
64 bytes from 111.1.1.1: icmp_seq=1 ttl=64 time=0.407 ms
64 bytes from 111.1.1.1: icmp_seq=2 ttl=64 time=0.671 ms
64 bytes from 111.1.1.1: icmp_seq=3 ttl=64 time=0.291 ms

--- 111.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2034ms
rtt min/avg/max/mdev = 0.291/0.456/0.671/0.159 ms


But if I boot L2 with x-svq=true as shown below, I am unable
to ping the host machine.

$ ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

In L2:

# ip addr add 111.1.1.2/24 dev eth0
# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
        altname enp0s7
        inet 111.1.1.2/24 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute
           valid_lft forever preferred_lft forever

# ip route
111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2

# ping 111.1.1.1 -w10
PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
    From 111.1.1.2 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: No route to host
    From 111.1.1.2 icmp_seq=2 Destination Host Unreachable
    From 111.1.1.2 icmp_seq=3 Destination Host Unreachable

--- 111.1.1.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2076ms
pipe 3

The other issue is related to booting L2 with "x-svq=true"
and "packed=on".

In L1:

$ ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,packed=on,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

The kernel throws "virtio_net virtio1: output.0:id 0 is not
a head!" [4].


So this series implements the descriptor forwarding from the guest to
the device in packed vq. We also need to forward the descriptors from
the device to the guest. The device writes them in the SVQ ring.

The functions responsible for that in QEMU are
hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush, which is called by
the device when used descriptors are written to the SVQ, which calls
hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf. We need to do
modifications similar to vhost_svq_add: Make them conditional if we're
in split or packed vq, and "copy" the code from Linux's
drivers/virtio/virtio_ring.c:virtqueue_get_buf.
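
For the device-to-driver direction, the core of the packed-ring version
is checking the AVAIL/USED flag bits of the next descriptor against the
driver's used wrap counter. A minimal standalone sketch, based on the
virtio 1.1 spec and Linux's virtqueue_get_buf_ctx_packed(); the struct
and helper names here are illustrative, not code from this series:

#include <stdbool.h>
#include <stdint.h>

/* Flag bit positions from the virtio 1.1 spec (linux/virtio_ring.h) */
#define VRING_PACKED_DESC_F_AVAIL 7
#define VRING_PACKED_DESC_F_USED  15

struct vring_packed_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length when available; bytes written when used */
    uint16_t id;     /* buffer id, written back by the device */
    uint16_t flags;
};

/*
 * Returns true if desc[idx] has been marked used by the device: both the
 * AVAIL and USED bits must match the driver's current used wrap counter.
 * A vhost_svq_get_buf_packed() would loop on this, read id/len from the
 * used descriptor, and flip the wrap counter when idx wraps past the
 * ring size (values need le*_to_cpu() conversion in QEMU proper).
 */
static bool packed_desc_is_used(const struct vring_packed_desc *desc,
                                uint16_t idx, bool used_wrap_counter)
{
    uint16_t flags = desc[idx].flags;
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used  = flags & (1 << VRING_PACKED_DESC_F_USED);

    return avail == used_wrap_counter && used == used_wrap_counter;
}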

After these modifications you should be able to ping and forward
traffic. As always, it is totally OK if it takes more than one
iteration, and feel free to ask any questions you have :).


I misunderstood this part. While working on extending
hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf() [1]
for packed vqs, I realized that this function and
vhost_svq_flush() already support split vqs. However, I am
unable to ping L0 when booting L2 with "x-svq=true" and
"packed=off" or when the "packed" option is not specified
in QEMU's command line.

I tried debugging these functions for split vqs after running
the following QEMU commands while following the blog [2].

Booting L1:

$ sudo ./qemu/build/qemu-system-x86_64 \
-enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
-net nic,model=virtio \
-net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log

Booting L2:

# ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

I printed out the contents of VirtQueueElement returned
by vhost_svq_get_buf() in vhost_svq_flush() [3].
I noticed that the "len" which is set by "vhost_svq_get_buf" is
always 0, while VirtQueueElement.len is non-zero. I haven't
understood the difference between these two "len"s.


VirtQueueElement.len is the length of the buffer, while the len of
vhost_svq_get_buf is the number of bytes written by the device. In the
case of the tx queue, VirtQueueElement.len is the length of the tx
packet, and the vhost_svq_get_buf len is always 0 as the device does
not write. In the case of rx, VirtQueueElement.len is the available
length for an rx frame, and the vhost_svq_get_buf len is the actual
length written by the device.

To be 100% accurate, an rx packet can span multiple buffers, but
SVQ does not need special code to handle this.

So the len returned by vhost_svq_get_buf should be > 0 for the rx queue
(svq->vq->index == 0), and 0 for the tx queue (svq->vq->index % 2 == 1).

Take into account that vhost_svq_get_buf only handles split vqs at the
moment! It should be renamed or split into vhost_svq_get_buf_split.
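
For reference, that second len comes from the split-ring used element
the device writes back (per the virtio spec; not code from this series):

struct vring_used_elem {
    uint32_t id;    /* head index of the used descriptor chain */
    uint32_t len;   /* bytes written by the device: 0 for tx,
                       received frame size (incl. vnet header) for rx */
};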

In L1, there are 2 virtio network devices.

# lspci -nn | grep -i net
00:02.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio 1.0 network device [1af4:1041] (rev 01)

I am using the second one (1af4:1041) for testing my changes and have
bound this device to the vp_vdpa driver.

# vdpa dev show -jp
{
       "dev": {
           "vdpa0": {
               "type": "network",
               "mgmtdev": "pci/0000:00:04.0",
               "vendor_id": 6900,
               "max_vqs": 3,

How is max_vqs=3? For this to happen, L0 QEMU should have a
virtio-net-pci,...,queues=3 cmdline argument.

Ouch! I totally misread it :(. Everything is correct, max_vqs should
be 3. I read it as the virtio_net queues, which means queue *pairs*,
as each pair includes an rx and a tx queue.

Understood :)


I am not sure why max_vqs is 3. I haven't set the value of queues to 3
in the cmdline argument. Is max_vqs expected to have a default value
other than 3?

In the blog [1] as well, max_vqs is 3 even though there's no queues=3
argument.

It's clear the guest is not using them; we can add mq=off
to simplify the scenario.

The value of max_vqs is still 3 after adding mq=off. The whole
command that I run to boot L0 is:

$ sudo ./qemu/build/qemu-system-x86_64 \
-enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
-net nic,model=virtio \
-net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log

Could it be that 2 of the 3 vqs are used for the dataplane and
the third vq is the control vq?

               "max_vq_size": 256
           }
       }
}

The max number of vqs is 3 with the max size being 256.

Since there are 2 virtio-net devices, vhost_vdpa_svqs_start [1]
is called twice. For each of them, it calls vhost_svq_start [2]
v->shadow_vqs->len times.


OK, I understand this confusion, as the code is not intuitive :). Take
into account that you can only have SVQ in vdpa devices, so both
vhost_vdpa_svqs_start calls are acting on the vdpa device.

You are seeing two calls to vhost_vdpa_svqs_start because virtio (and
vdpa) devices are modelled internally as two devices in QEMU: one for
the dataplane vqs, and another for the control vq. There are historical
reasons for this, but we use it in vdpa to always shadow the CVQ while
leaving the dataplane in passthrough if x-svq=off and the virtio &
virtio-net feature set is understood by SVQ.

If you break at vhost_vdpa_svqs_start with gdb and go higher in the
stack you should reach vhost_net_start, which starts each vhost_net
device individually.

To be 100% honest, each dataplane *queue pair* (rx+tx) is modelled
with a different vhost_net device in QEMU, but you don't need to take
that into account when implementing the packed vq :).

Got it, this makes sense now.

Printing the values of dev->vdev->name, v->shadow_vqs->len and
svq->vring.num in vhost_vdpa_svqs_start gives:

name: virtio-net
len: 2
num: 256
num: 256

First QEMU's vhost_net device, the dataplane.

name: virtio-net
len: 1
num: 64


Second QEMU's vhost_net device, the control virtqueue.

Ok, if I understand this correctly, the control vq doesn't
need separate queues for rx and tx.


That's right. Since CVQ has one reply per command, the driver can just
send ro+rw descriptors to the device. In the case of RX, the device
needs a queue with only device-writable descriptors, as neither the
device nor the driver knows how many packets will arrive.

Got it, this makes sense now.
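
For reference, this is roughly the control-command layout described
above (a sketch based on the virtio-net spec definitions, not code
from this series):

#include <stdint.h>

/* A CVQ command is one descriptor chain: a device-readable header plus
 * command-specific payload, followed by a single device-writable ack
 * byte, so each command produces exactly one one-byte reply. */
struct virtio_net_ctrl_hdr {
    uint8_t class;   /* e.g. VIRTIO_NET_CTRL_RX */
    uint8_t cmd;     /* e.g. VIRTIO_NET_CTRL_RX_PROMISC */
};

#define VIRTIO_NET_OK  0
#define VIRTIO_NET_ERR 1   /* value of the ack byte written by the device */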

I am not sure how to match the above log lines to the
right virtio-net device since the actual value of num
can be less than "max_vq_size" in the output of "vdpa
dev show".


Yes, the device can set a different vq max per vq, and the driver can
negotiate a lower vq size per vq too.

I think the first 3 log lines correspond to the virtio
net device that I am using for testing since it has
2 vqs (rx and tx) while the other virtio-net device
only has one vq.

When printing out the values of svq->vring.num,
used_elem.len and used_elem.id in vhost_svq_get_buf,
there are two sets of output. One set corresponds to
svq->vring.num = 64 and the other corresponds to
svq->vring.num = 256.

For svq->vring.num = 64, only the following line
is printed repeatedly:

size: 64, len: 1, i: 0


This is with packed=off, right? If this is testing with packed, you
need to change the code to accommodate it. Let me know if you need
more help with this.

Yes, this is for packed=off. For the time being, I am trying to
get L2 to communicate with L0 using split virtqueues and x-svq=true.


Got it.

In the CVQ the only reply is a byte, indicating if the command was
applied or not. This seems ok to me.

Understood.

The queue can also recycle ids as long as they are not available, so
that part seems correct to me too.

I am a little confused here. The ids are recycled when they are
available (i.e., the id is not already in use), right?


In virtio, "available" means the device can use them, and "used" means
the device has returned them to the driver. I think you're aligned;
it's just better to follow the virtio nomenclature :).

Got it.

For svq->vring.num = 256, the following line is
printed 20 times,

size: 256, len: 0, i: 0

followed by:

size: 256, len: 0, i: 1
size: 256, len: 0, i: 1


This makes sense for the tx queue too. Can you print the VirtQueue index?

For svq->vring.num = 64, the vq index is 2. So the following line
(svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index)
is printed repeatedly:

size: 64, len: 1, i: 0, vq idx: 2

For svq->vring.num = 256, the following line is repeated several
times:

size: 256, len: 0, i: 0, vq idx: 1

This is followed by:

size: 256, len: 0, i: 1, vq idx: 1

In both cases, queue_index is 1. To get the value of queue_index,
I used "virtio_get_queue_index(svq->vq)" [2].

Since the queue_index is 1, I guess this means this is the tx queue
and the value of len (0) is correct. However, nothing with
queue_index % 2 == 0 is printed by vhost_svq_get_buf(), which means
the device is not sending anything to the guest. Is this correct?


Yes, that's totally correct.

You can set -netdev tap,...,vhost=off in L0 qemu and trace (or debug
with gdb) it to check what it is receiving. You should see calls to
hw/net/virtio-net.c:virtio_net_flush_tx. The corresponding receive
function is virtio_net_receive_rcu; I recommend you trace it too, in
case you see any strange calls to it.


I added "vhost=off" to -netdev tap in L0's qemu command. I followed all
the steps in the blog [1] up till the point where L2 is booted. Before
booting L2, I had no issues pinging L0 from L1.

For each ping, the following trace lines were printed by QEMU:

virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 0 out_num 1
virtqueue_pop vq 0x5d04109b0ce8 elem 0x5d041024f560 in_num 0 out_num 1
virtqueue_fill vq 0x5d04109b0ce8 elem 0x5d041024f560 len 0 idx 0
virtqueue_flush vq 0x5d04109b0ce8 count 1
virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0ce8
virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 1 out_num 0
virtqueue_pop vq 0x5d04109b0c50 elem 0x5d041024f560 in_num 1 out_num 0
virtqueue_fill vq 0x5d04109b0c50 elem 0x5d041024f560 len 110 idx 0
virtqueue_flush vq 0x5d04109b0c50 count 1
virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50

The first 5 lines look like they were printed when an echo request was
sent to L0 and the next 5 lines were printed when an echo reply was
received.

After booting L2, I set up the tap device's IP address in L0 and the
vDPA port's IP address in L2.

When trying to ping L0 from L2, I only see the following lines being
printed:

virtqueue_alloc_element elem 0x5d041099ffd0 size 56 in_num 0 out_num 1
virtqueue_pop vq 0x5d0410d87168 elem 0x5d041099ffd0 in_num 0 out_num 1
virtqueue_fill vq 0x5d0410d87168 elem 0x5d041099ffd0 len 0 idx 0
virtqueue_flush vq 0x5d0410d87168 count 1
virtio_notify vdev 0x5d0410d79a10 vq 0x5d0410d87168

There's no reception. I used wireshark to inspect the packets that are
being sent and received through the tap device in L0.

When pinging L0 from L2, I see one of the following two outcomes:

Outcome 1:
----------
L2 broadcasts ARP packets and L0 replies to L2.

Source             Destination        Protocol    Length    Info
52:54:00:12:34:57  Broadcast          ARP         42        Who has 111.1.1.1? Tell 111.1.1.2
d2:6d:b9:61:e1:9a  52:54:00:12:34:57  ARP         42        111.1.1.1 is at d2:6d:b9:61:e1:9a

Outcome 2 (less frequent):
--------------------------
L2 sends an ICMP echo request packet to L0 and L0 sends a reply,
but the reply is not received by L2.

Source             Destination        Protocol    Length    Info
111.1.1.2          111.1.1.1          ICMP        98        Echo (ping) request  id=0x0006, seq=1/256, ttl=64
111.1.1.1          111.1.1.2          ICMP        98        Echo (ping) reply    id=0x0006, seq=1/256, ttl=64

When pinging L2 from L0 I get the following output in
wireshark:

Source             Destination        Protocol    Length    Info
111.1.1.1          111.1.1.2          ICMP        100       Echo (ping) request  id=0x002c, seq=2/512, ttl=64 (no response found!)

I do see a lot of traced lines being printed (by the QEMU instance that
was started in L0) with in_num >= 1, for example:

virtqueue_alloc_element elem 0x5d040fdbad30 size 56 in_num 1 out_num 0
virtqueue_pop vq 0x5d04109b0c50 elem 0x5d040fdbad30 in_num 1 out_num 0
virtqueue_fill vq 0x5d04109b0c50 elem 0x5d040fdbad30 len 76 idx 0
virtqueue_flush vq 0x5d04109b0c50 count 1
virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50

It looks like L1 is receiving data from L0 but this is not related to
the pings that are sent from L2. I haven't figured out what data is
actually being transferred in this case. It's not necessary for all of
the data that L1 receives from L0 to be passed to L2, is it?

I also noticed that there are now some lines with svq->vring.num = 256
where len > 0. These lines were printed by the QEMU instance running
in L1, so this corresponds to data that was received by L2.

svq->vring.num  used_elem.len  used_elem.id  svq->vq->queue_index
size: 256       len: 82        i: 0          vq idx: 0
size: 256       len: 82        i: 1          vq idx: 0
size: 256       len: 82        i: 2          vq idx: 0
size: 256       len: 54        i: 3          vq idx: 0

I still haven't figured out what data was received by L2 but I am
slightly confused as to why this data was received by L2 but not
the ICMP echo replies sent by L0.

Thanks,
Sahil

[1] https://www.redhat.com/en/blog/hands-vdpa-what-do-you-do-when-you-aint-got-hardware-part-2


