Re: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager

From: Chenyi Qiang
Subject: Re: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Fri, 10 Jan 2025 14:38:04 +0800
User-agent: Mozilla Thunderbird
On 1/10/2025 8:58 AM, Alexey Kardashevskiy wrote:
>
>
> On 9/1/25 15:29, Chenyi Qiang wrote:
>>
>>
>> On 1/9/2025 10:55 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 9/1/25 13:11, Chenyi Qiang wrote:
>>>>
>>>>
>>>> On 1/8/2025 7:20 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 8/1/25 21:56, Chenyi Qiang wrote:
>>>>>>
>>>>>>
>>>>>> On 1/8/2025 12:48 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 13/12/24 18:08, Chenyi Qiang wrote:
>>>>>>>> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
>>>>>>>> uncoordinated discard") highlighted, some subsystems like VFIO might
>>>>>>>> disable ram block discard. However, guest_memfd relies on the discard
>>>>>>>> operation to perform page conversion between private and shared
>>>>>>>> memory. This can lead to a stale IOMMU mapping issue when assigning a
>>>>>>>> hardware device to a confidential VM via shared memory (unprotected
>>>>>>>> memory pages). Blocking shared page discard can solve this problem,
>>>>>>>> but it could cause guests to consume twice the memory with VFIO,
>>>>>>>> which is not acceptable in some cases. An alternative solution is to
>>>>>>>> notify other systems like VFIO to refresh their outdated IOMMU
>>>>>>>> mappings.
>>>>>>>>
>>>>>>>> RamDiscardManager is an existing concept (used by virtio-mem) to
>>>>>>>> adjust VFIO mappings in relation to VM page assignment. Effectively,
>>>>>>>> page conversion is similar to hot-removing a page in one mode and
>>>>>>>> adding it back in the other, so the same kind of work that happens in
>>>>>>>> response to virtio-mem changes needs to happen for page conversion
>>>>>>>> events. Introduce the RamDiscardManager to guest_memfd to achieve it.
>>>>>>>>
>>>>>>>> However, guest_memfd is not an object, so it cannot directly
>>>>>>>> implement the RamDiscardManager interface.
>>>>>>>>
>>>>>>>> One solution is to implement the interface in HostMemoryBackend. Any
>>>>>>>
>>>>>>> This sounds about right.
>
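(For reference, whichever object ends up implementing it, the hooks it has to
provide are those of RamDiscardManagerClass in include/exec/memory.h -- listed
here from memory, please double-check against the tree:)

struct RamDiscardManagerClass {
    /* private */
    InterfaceClass parent_class;

    /* public */
    uint64_t (*get_min_granularity)(const RamDiscardManager *rdm,
                                    const MemoryRegion *mr);
    bool (*is_populated)(const RamDiscardManager *rdm,
                         const MemoryRegionSection *section);
    int (*replay_populated)(const RamDiscardManager *rdm,
                            MemoryRegionSection *section,
                            ReplayRamPopulate replay_fn, void *opaque);
    void (*replay_discarded)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
                             ReplayRamDiscard replay_fn, void *opaque);
    void (*register_listener)(RamDiscardManager *rdm,
                              RamDiscardListener *rdl,
                              MemoryRegionSection *section);
    void (*unregister_listener)(RamDiscardManager *rdm,
                                RamDiscardListener *rdl);
};
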
> btw I am using this for ages:
>
> https://github.com/aik/qemu/commit/3663f889883d4aebbeb0e4422f7be5e357e2ee46
>
> but I am not sure if this ever saw the light of day, did it?
> (ironically I am using it as a base for encrypted DMA :) )
Yeah, we are doing the same work. I saw a solution from Michael a long
time ago (when there was still a dedicated hostmem-memfd-private backend
for restrictedmem/gmem):
https://github.com/AMDESE/qemu/commit/3bf5255fc48d648724d66410485081ace41d8ee6
Your patch only implements the interface for HostMemoryBackendMemfd.
Maybe it is more appropriate to implement it for the parent object
HostMemoryBackend, because besides MEMORY_BACKEND_MEMFD, other backend
types like MEMORY_BACKEND_RAM and MEMORY_BACKEND_FILE can also be
guest_memfd-backed.
Thinking more about where to implement this interface, I am still
uncertain. As I mentioned in another mail, a ram device memory region
might also be backed by guest_memfd if we support TEE IO iommufd MMIO in
the future. In that case, a specific object would be more appropriate.
What's your opinion?
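To make the comparison a bit more concrete, a standalone object would look
roughly like the sketch below. This is simplified and illustrative only, not
the exact code from this series: the fields and the class_init wiring are
placeholders.

/* Simplified sketch of a standalone manager object; illustrative only. */
#include "qemu/osdep.h"
#include "qom/object.h"
#include "exec/memory.h"

#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
OBJECT_DECLARE_SIMPLE_TYPE(GuestMemfdManager, GUEST_MEMFD_MANAGER)

struct GuestMemfdManager {
    Object parent;
    MemoryRegion *mr;   /* the guest_memfd-backed MR being managed */
    /* shared/private bitmap, list of RamDiscardListeners, ... */
};

static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
{
    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);

    /* Wire up the callbacks from this patch here, e.g.:
     * rdmc->replay_populated = guest_memfd_rdm_replay_populated; ... */
    (void)rdmc;
}

static const TypeInfo guest_memfd_manager_info = {
    .name          = TYPE_GUEST_MEMFD_MANAGER,
    .parent        = TYPE_OBJECT,
    .instance_size = sizeof(GuestMemfdManager),
    .class_init    = guest_memfd_manager_class_init,
    .interfaces    = (InterfaceInfo[]) {
        { TYPE_RAM_DISCARD_MANAGER },
        { }
    },
};

static void guest_memfd_manager_register_types(void)
{
    type_register_static(&guest_memfd_manager_info);
}
type_init(guest_memfd_manager_register_types)

Implementing the interface on HostMemoryBackend instead would mostly mean
adding TYPE_RAM_DISCARD_MANAGER to its .interfaces[] and the callbacks in
backends/hostmem.c, but then an MR without a backend (like the vBIOS case
discussed below) is left uncovered.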
>
>>>>>>>
>>>>>>>> guest_memfd-backed host memory backend can register itself in the
>>>>>>>> target
>>>>>>>> MemoryRegion. However, this solution doesn't cover the scenario
>>>>>>>> where a
>>>>>>>> guest_memfd MemoryRegion doesn't belong to the HostMemoryBackend,
>>>>>>>> e.g.
>>>>>>>> the virtual BIOS MemoryRegion.
>>>>>>>
>>>>>>> What is this virtual BIOS MemoryRegion exactly? What does it look
>>>>>>> like in "info mtree -f"? Do we really want this memory to be DMAable?
>>>>>>
>>>>>> The virtual BIOS shows up in a separate region:
>>>>>>
>>>>>> Root memory region: system
>>>>>> 0000000000000000-000000007fffffff (prio 0, ram): pc.ram KVM
>>>>>> ...
>>>>>> 00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM
>>>>>
>>>>> Looks like a normal MR which can be backed by guest_memfd.
>>>>
>>>> Yes, the virtual BIOS memory region is initialized by
>>>> memory_region_init_ram_guest_memfd(), which means it is backed by a
>>>> guest_memfd.
>>>>
>>>> The tricky thing is, for Intel TDX (not sure about AMD SEV), the
>>>> virtual BIOS image will be loaded and then copied to the private
>>>> region. After that, the loaded image is discarded and this region
>>>> becomes useless.
>>>
>>> I'd think it is loaded as "struct Rom" and then copied to the
>>> MR-ram_guest_memfd(), which does not leave the MR useless - we still
>>> see "pc.bios" in the list, so it is not discarded. What piece of code
>>> are you referring to exactly?
>>
>> Sorry for the confusion; maybe the vBIOS handling differs between TDX
>> and SEV-SNP.
>>
>> In x86_bios_rom_init(), it initializes a guest_memfd-backed MR and
>> loads the vBIOS image into the shared part of the guest_memfd MR. For
>> TDX, it will copy the image to a private region (not the private part
>> of the vBIOS guest_memfd MR) and discard the shared part. So, although
>> the memory region still exists, it seems useless.
>> Is it different for SEV-SNP? Does SEV-SNP manage the vBIOS in the vBIOS
>> guest_memfd private memory?
>
> This is what it looks like on my SNP VM (which, I suspect, is the same
> as yours as hw/i386/pc.c does not distinguish Intel/AMD for this matter):
Yes, the memory region object is created on both TDX and SEV-SNP.
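(Right, and that MR comes from the memory_region_init_ram_guest_memfd() call
mentioned earlier; roughly like the sketch below. The helper name is from the
tree, but I am quoting its signature from memory, so please double-check.)

/* Sketch: how a BIOS-like MR gets guest_memfd backing; the signature of
 * memory_region_init_ram_guest_memfd() is quoted from memory and may differ. */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "qapi/error.h"

static void init_vbios_mr(MemoryRegion *bios, Object *owner, uint64_t size)
{
    /* Allocates a RAMBlock with an associated guest_memfd, which is why
     * the region shows up with a gmemfd=N tag in the flatview dump below. */
    memory_region_init_ram_guest_memfd(bios, owner, "pc.bios", size,
                                       &error_fatal);
}
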
>
> Root memory region: system
> 0000000000000000-00000000000bffff (prio 0, ram): ram1 KVM gmemfd=20
> 00000000000c0000-00000000000dffff (prio 1, ram): pc.rom KVM gmemfd=27
> 00000000000e0000-000000001fffffff (prio 0, ram): ram1 @00000000000e0000 KVM gmemfd=20
> ...
> 00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM gmemfd=26
>
> So the pc.bios MR exists and is in use (hence its appearance in
> "info mtree -f").
>
>
> I added the gmemfd dumping:
>
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -3446,6 +3446,9 @@ static void mtree_print_flatview(gpointer key, gpointer value,
>              }
>          }
>      }
> +    if (mr->ram_block && mr->ram_block->guest_memfd >= 0) {
> +        qemu_printf(" gmemfd=%d", mr->ram_block->guest_memfd);
> +    }
>
Then I think the virtual BIOS is another case that does not belong to a
HostMemoryBackend, which convinces us to implement the interface in a
specific object, no?
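With a specific object, attaching the manager to such an MR is the same
regardless of who owns it. Building on the sketch above (names still
illustrative), it would look something like:

/* Sketch: attach the manager to any guest_memfd-backed MR, e.g. pc.bios,
 * which has no HostMemoryBackend behind it. */
static void attach_guest_memfd_manager(MemoryRegion *mr)
{
    GuestMemfdManager *gmm =
        GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));

    gmm->mr = mr;
    memory_region_set_ram_discard_manager(mr, RAM_DISCARD_MANAGER(gmm));
    /* vfio_listener_region_add() then sees the RDM and registers a
     * RamDiscardListener instead of DMA-mapping the whole region. */
}
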
>
>>>
>>>
>>>> So I feel like this virtual BIOS should not be backed by guest_memfd?
>>>
>>> From the above it sounds like the opposite, i.e. it should :)
>>>
>>>>>
>>>>>> 0000000100000000-000000017fffffff (prio 0, ram): pc.ram @0000000080000000 KVM
>>>>>
>>>>> Anyway, if there is no guest_memfd backing it and
>>>>> memory_region_has_ram_discard_manager() returns false, then the MR is
>>>>> just going to be mapped for VFIO as usual, which seems... alright,
>>>>> right?
>>>>
>>>> Correct. As the vBIOS is backed by guest_memfd and we implement the
>>>> RDM for guest_memfd_manager, the vBIOS MR won't be mapped by VFIO.
>>>>
>>>> If we go with HostMemoryBackend instead of guest_memfd_manager, this
>>>> MR would be mapped by VFIO. Maybe we need to avoid such a vBIOS
>>>> mapping, or just ignore it since the MR is useless (but that doesn't
>>>> look so good).
>>>
>>> Sorry I am missing necessary details here, let's figure out the above.
>>>
>>>>
>>>>>
>>>>>
>>>>>> We also considered implementing the interface in HostMemoryBackend,
>>>>>> but maybe implementing it for the guest_memfd region is more general.
>>>>>> We don't know whether all DMA-able memory will belong to a
>>>>>> HostMemoryBackend, although at present it does.
>>>>>>
>>>>>> If it is more appropriate to implement it in HostMemoryBackend, I can
>>>>>> change to that approach.
>>>>>
>>>>> Seems cleaner imho.
>>>>
>>>> I can go this way.
>>
>> [...]
>>
>>>>>>>> +
>>>>>>>> +static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm,
>>>>>>>> +                                            MemoryRegionSection *section,
>>>>>>>> +                                            ReplayRamPopulate replay_fn,
>>>>>>>> +                                            void *opaque)
>>>>>>>> +{
>>>>>>>> +    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
>>>>>>>> +    struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
>>>>>>>> +
>>>>>>>> +    g_assert(section->mr == gmm->mr);
>>>>>>>> +    return guest_memfd_for_each_populated_section(gmm, section, &data,
>>>>>>>> +                                                  guest_memfd_rdm_replay_populated_cb);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section,
>>>>>>>> +                                               void *arg)
>>>>>>>> +{
>>>>>>>> +    struct GuestMemfdReplayData *data = arg;
>>>>>>>> +    ReplayRamDiscard replay_fn = data->fn;
>>>>>>>> +
>>>>>>>> +    replay_fn(section, data->opaque);
>>>>>>>
>>>>>>>
>>>>>>> guest_memfd_rdm_replay_populated_cb() checks for errors though.
>>>>>>
>>>>>> It follows the current definitions of ReplayRamDiscard() and
>>>>>> ReplayRamPopulate(), where replay_discard() doesn't return errors and
>>>>>> replay_populate() does.
>>>>>
>>>>> A trace would be appropriate imho. Thanks,
>>>>
>>>> Sorry, I didn't catch that. What kind of info should be traced? The
>>>> errors returned by replay_populate()?
>>>
>>> Yeah. imho these are useful as we expect this part to work in general
>>> too, right? Thanks,
>>
>> Something like?
>>
>> diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
>> index 6b3e1ee9d6..4440ac9e59 100644
>> --- a/system/guest-memfd-manager.c
>> +++ b/system/guest-memfd-manager.c
>> @@ -185,8 +185,14 @@ static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, voi
>>  {
>>      struct GuestMemfdReplayData *data = arg;
>>      ReplayRamPopulate replay_fn = data->fn;
>> +    int ret;
>>
>> -    return replay_fn(section, data->opaque);
>> +    ret = replay_fn(section, data->opaque);
>> +    if (ret) {
>> +        trace_guest_memfd_rdm_replay_populated_cb(ret);
>> +    }
>> +
>> +    return ret;
>>  }
>>
>> How about just adding some error output in
>> guest_memfd_for_each_populated_section() /
>> guest_memfd_for_each_discarded_section() if the cb() (i.e.
>> replay_populate()) returns an error?
>
> this will do too, yes. Thanks,
>
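For completeness, the trace point in the diff above would also need a matching
trace-events entry, something like the illustrative line below (file path and
format string are just an assumption):

# system/trace-events
guest_memfd_rdm_replay_populated_cb(int ret) "replay_populate() failed: ret=%d"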