Re: [PATCH v2 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init


From: Si-Wei Liu
Subject: Re: [PATCH v2 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init
Date: Tue, 2 Apr 2024 23:53:22 -0700
User-agent: Mozilla Thunderbird



On 4/2/2024 5:01 AM, Eugenio Perez Martin wrote:
On Tue, Apr 2, 2024 at 8:19 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:


On 2/14/2024 11:11 AM, Eugenio Perez Martin wrote:
On Wed, Feb 14, 2024 at 7:29 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
Hi Michael,

On 2/13/2024 2:22 AM, Michael S. Tsirkin wrote:
On Mon, Feb 05, 2024 at 05:10:36PM -0800, Si-Wei Liu wrote:
Hi Eugenio,

I thought this new code looks good to me and the original issue I saw with
x-svq=on should be gone. However, after rebasing my tree on top of this,
there's a new failure I found around setting up guest mappings at early
boot; please see attached the specific QEMU config and corresponding event
traces. I haven't checked into the details yet, but thought you would need
to be aware of it ahead of time.

Regards,
-Siwei
Eugenio, were you able to reproduce? Siwei, did you have time to
look into this?
Didn't get a chance to look into the details yet in the past week, but
thought it may have something to do with the (internals of the) iova tree
range allocation and the lookup routine. It started to fall apart at the
first vhost_vdpa_dma_unmap call showing up in the trace events, where it
should've gotten IOVA=0x2000001000, but an incorrect IOVA address
0x1000 ended up being returned from the iova tree lookup routine.

HVA                                   GPA                           IOVA
----------------------------------------------------------------------------------------------
Map
[0x7f7903e00000, 0x7f7983e00000)      [0x0, 0x80000000)             [0x1000, 0x80000000)
[0x7f7983e00000, 0x7f9903e00000)      [0x100000000, 0x2080000000)   [0x80001000, 0x2000001000)
[0x7f7903ea0000, 0x7f7903ec0000)      [0xfeda0000, 0xfedc0000)      [0x2000001000, 0x2000021000)

Unmap
[0x7f7903ea0000, 0x7f7903ec0000)      [0xfeda0000, 0xfedc0000)      [0x1000, 0x20000) ???
                                                                    shouldn't it be
                                                                    [0x2000001000, 0x2000021000) ???

It looks like the SVQ iova tree lookup routine vhost_iova_tree_find_iova(),
which is called from vhost_vdpa_listener_region_del(), can't properly
deal with overlapped regions. Specifically, q35's mch_realize() has the
following:

579     memory_region_init_alias(&mch->open_high_smram, OBJECT(mch), "smram-open-high",
580                              mch->ram_memory, MCH_HOST_BRIDGE_SMRAM_C_BASE,
581                              MCH_HOST_BRIDGE_SMRAM_C_SIZE);
582     memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
583                                         &mch->open_high_smram, 1);
584     memory_region_set_enabled(&mch->open_high_smram, false);

#0  0x0000564c30bf6980 in iova_tree_find_address_iterator (key=0x564c331cf8e0, value=0x564c331cf8e0, data=0x7fffb6d749b0) at ../util/iova-tree.c:96
#1  0x00007f5f66479654 in g_tree_foreach () at /lib64/libglib-2.0.so.0
#2  0x0000564c30bf6b53 in iova_tree_find_iova (tree=<optimized out>, map=map@entry=0x7fffb6d74a00) at ../util/iova-tree.c:114
#3  0x0000564c309da0a9 in vhost_iova_tree_find_iova (tree=<optimized out>, map=map@entry=0x7fffb6d74a00) at ../hw/virtio/vhost-iova-tree.c:70
#4  0x0000564c3085e49d in vhost_vdpa_listener_region_del (listener=0x564c331024c8, section=0x7fffb6d74aa0) at ../hw/virtio/vhost-vdpa.c:444
#5  0x0000564c309f4931 in address_space_update_topology_pass (as=as@entry=0x564c31ab1840 <address_space_memory>, old_view=old_view@entry=0x564c33364cc0, new_view=new_view@entry=0x564c333640f0, adding=adding@entry=false) at ../system/memory.c:977
#6  0x0000564c309f4dcd in address_space_set_flatview (as=0x564c31ab1840 <address_space_memory>) at ../system/memory.c:1079
#7  0x0000564c309f86d0 in memory_region_transaction_commit () at ../system/memory.c:1132
#8  0x0000564c309f86d0 in memory_region_transaction_commit () at ../system/memory.c:1117
#9  0x0000564c307cce64 in mch_realize (d=<optimized out>, errp=<optimized out>) at ../hw/pci-host/q35.c:584

However, it looks like iova_tree_find_address_iterator() only checks whether
the translated address (HVA) falls into the range when trying to locate
the desired IOVA, causing the first DMAMap that happens to overlap in
the translated address (HVA) space to be returned prematurely:

 89 static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
 90                                                 gpointer data)
 91 {
  :
  :
 99     if (map->translated_addr + map->size < needle->translated_addr ||
100         needle->translated_addr + needle->size < map->translated_addr) {
101         return false;
102     }
103
104     args->result = map;
105     return true;
106 }

In the QEMU trace file, it reveals that the first DMAMap below gets
returned incorrectly instead of the second, the latter of which is what the
actual IOVA corresponds to (a small standalone sketch of the overlap check
against these two entries follows the listing):

HVA                                   GPA                           IOVA
[0x7f7903e00000, 0x7f7983e00000)      [0x0, 0x80000000)             [0x1000, 0x80001000)
[0x7f7903ea0000, 0x7f7903ec0000)      [0xfeda0000, 0xfedc0000)      [0x2000001000, 0x2000021000)
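
(To make the failure mode concrete, here is a small standalone sketch, not QEMU
code, that replays the HVA-only overlap check quoted above against these two
entries. It assumes, as the '<' comparisons in iova_tree_find_address_iterator()
suggest, that the size is stored inclusively, i.e. length - 1.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the two DMAMap fields the quoted check actually uses. */
typedef struct {
    uint64_t translated_addr;   /* HVA */
    uint64_t size;              /* assumed inclusive: length - 1 */
} FakeMap;

/* Same shape as lines 99-102 of util/iova-tree.c quoted above. */
static bool hva_overlaps(const FakeMap *map, const FakeMap *needle)
{
    return !(map->translated_addr + map->size < needle->translated_addr ||
             needle->translated_addr + needle->size < map->translated_addr);
}

int main(void)
{
    FakeMap ram    = { 0x7f7903e00000ULL, 0x80000000ULL - 1 };  /* GPA [0x0, 0x80000000) */
    FakeMap smram  = { 0x7f7903ea0000ULL, 0x20000ULL - 1 };     /* GPA [0xfeda0000, 0xfedc0000) */
    FakeMap needle = smram;     /* what region_del looks up, by HVA only */

    /* Both maps overlap the needle in HVA space, so whichever one the GTree
     * iterates over first wins -- here the 2 GiB RAM map, which is why the
     * lookup returns IOVA 0x1000 instead of 0x2000001000. */
    printf("ram map matches:   %d\n", hva_overlaps(&ram, &needle));
    printf("smram map matches: %d\n", hva_overlaps(&smram, &needle));
    return 0;
}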

I think the analysis is totally accurate as no code expects to unmap /
map overlapping regions. In particular, vdpa kernel does not expect it
either.

Maybe I missed something, but I don't see how the vdpa kernel prohibits this overlapping case? The same HVA being remapped under a different GPA or IOVA range should be a valid use case, it seems? For example, I don't see a mapping failure when x-svq=on is removed from QEMU's vhost-vdpa arguments (as you can see in the attached trace, mch_realize with exactly the same overlapping region got remapped successfully, and then got unmapped right after).

Since it is issued at _realize, it should be ok to unmap all the
region range and then map the right range again, even if that implies
a lot of unpin / pin.
You'll find that mch_realize() already does that: lines 582-583 create a mapping overlapping the entire range, while line 584 disables (unmaps) this same mapping shortly after.

582     memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
583                                         &mch->open_high_smram, 1);
584     memory_region_set_enabled(&mch->open_high_smram, false);

So we don't have to add any special case to deal with these overlapping cases (just use the GPA as a hint to help identify which iova range it is trying to unmap). Since vdpa's memory listener now moves up front to the earliest point of machine boot, the relevant SVQ code needs to be more flexible and tolerant of the extra churn while the rest of QEMU's subsystems stabilize the guest memory layout.
Maybe, other than checking the HVA range, we should also match the GPA, or at
least require the sizes to match exactly?
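
(For illustration only, a minimal sketch of the size-match idea applied to the
iterator quoted earlier in the thread. The args/needle names are assumed from
the snippet's context rather than taken from the tree, and matching the GPA
would additionally require plumbing the GPA into the lookup; this is a sketch,
not a tested patch.)

static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
                                                gpointer data)
{
    const DMAMap *map = key;
    IOVATreeFindIOVAArgs *args = data;      /* struct/field names assumed */
    const DMAMap *needle = args->needle;

    /* Existing HVA-only overlap test. */
    if (map->translated_addr + map->size < needle->translated_addr ||
        needle->translated_addr + needle->size < map->translated_addr) {
        return false;
    }

    /*
     * Hypothetical extra condition: require an exact size match, so that a
     * lookup for the 128 KiB smram alias cannot be satisfied by the 2 GiB
     * RAM mapping that merely overlaps it in HVA space.
     */
    if (needle->size && map->size != needle->size) {
        return false;
    }

    args->result = map;
    return true;
}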

The safe action here would be to unmap the whole memory chunk and then
map the overlapping memory again? Or am I missing something?

Another thing I don't get, is this reproducible in previous versions?
It is not reproducible without this series. I thought that may be because previously the vhost-vdpa memory listener was registered at a later point in time, after all the overlapped regions were gone and the memory layout had completely stabilized. Now that we move the registration to init time, all the devils come out to bite us.

With previous versions of this patch series we didn't get this far yet when x-svq=on was specified.

https://lore.kernel.org/qemu-devel/92ecfd90-8d06-4669-b260-a7a3b106277e@oracle.com/
As far as I understand, this bug was never found before. I guess this
path of q35's mch_realize is recent?
No, it's been there for quite a while.

Cheers,
-Siwei

Thanks!

Yes, I'm still not able to reproduce. In particular, I don't know how
the memory listener adds a region and then releases a region with a
different size. I'm talking about these log entries:

1706854838.154394:vhost_vdpa_listener_region_add vdpa: 0x556d45c75140 iova 0x0 llend 0x80000000 vaddr: 0x7f7903e00000 read-only: 0
452:vhost_vdpa_listener_region_del vdpa: 0x556d45c75140 iova 0x0 llend 0x7fffffff
I didn't see a different size here, though if you are referring to the
discrepancy in the traces around llend, I thought the two values between _add()
and _del() would have to be interpreted differently due to:

3d1e4d34 "vhost_vdpa: fix the input in
trace_vhost_vdpa_listener_region_del()"
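
(As a tiny sketch of my reading of those two trace lines, not verified against
the commit itself: _add() appears to log the exclusive end of the section while
_del() logs the inclusive end, so the same 2 GiB section shows up as 0x80000000
in one trace and 0x7fffffff in the other.)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t start = 0x0;
    uint64_t size  = 0x80000000ULL;              /* the 2 GiB section below 4 GiB */

    uint64_t add_llend = start + size;           /* 0x80000000 in the region_add trace */
    uint64_t del_llend = add_llend - 1;          /* 0x7fffffff in the region_del trace */

    printf("add llend 0x%" PRIx64 ", del llend 0x%" PRIx64 "\n",
           add_llend, del_llend);
    return 0;
}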

Regards,
-Siwei
Is it possible for you to also trace the skipped regions? We should
add a debug trace there too...

Thanks!

PS, I will be taking off from today and for the next two weeks. Will try
to help out looking more closely after I get back.

-Siwei
    Can't merge patches which are known to break things ...

Attachment: qtrace-success-no-x_svq_on.out
Description: Text document

