
From: Peter Xu
Subject: Re: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Thu, 30 Jan 2025 11:28:11 -0500

On Sun, Jan 26, 2025 at 11:34:29AM +0800, Xu Yilun wrote:
> > Definitely not suggesting to install an invalid pointer anywhere.  The
> > mapped pointer will still be valid for gmem for example, but the fault
> > isn't.  We need to differentiate two things: (1) the virtual address
> > mapping, and (2) permissions and accesses on the folios / pages of the
> > mapping.
> > Here I think it's okay if the host pointer is correctly mapped.
> > 
> > For your private MMIO use case, my question is if there's no host pointer
> > to be mapped anyway, then what's the benefit to make the MR to be ram=on?
> > Can we simply make it a normal IO memory region?  The only benefit of a
> 
> Guest access to a normal IO memory region would be emulated by QEMU,
> while privately assigned MMIO requires direct guest access via Secure EPT.
> 
> It seems the existing code doesn't support direct guest access if
> mr->ram == false:

Ah it's about this, ok.

I am not sure what the best approach is, but IMHO it's still better to
stick with the host pointer always being available when ram=on.  OTOH,
VFIO private regions may be able to provide a special mark somewhere, just
like what was done for romd_mode previously (QEMU commit 235e8982ad39; see
the code quoted below and the sketch after it), so that KVM can still map
these MRs even if they're not RAM.

> 
> static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                              MemoryRegionSection *section, bool add)
> {
>     [...]
> 
>     if (!memory_region_is_ram(mr)) {
>         if (writable || !kvm_readonly_mem_allowed) {
>             return;
>         } else if (!mr->romd_mode) {
>             /* If the memory device is not in romd_mode, then we actually want
>              * to remove the kvm memory slot so all accesses will trap. */
>             add = false;
>         }
>     }
> 
>     [...]
> 
>     /* register the new slot */
>     do {
> 
>         [...]
> 
>         err = kvm_set_user_memory_region(kml, mem, true);
>     }
> }
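
Just to illustrate what I mean - this is a sketch only, not a tested
patch, and memory_region_is_private_mmio() is a made-up helper rather
than an existing API - the special case could sit next to the romd_mode
one:

    if (!memory_region_is_ram(mr)) {
        if (memory_region_is_private_mmio(mr)) {
            /* Hypothetical mark: fall through and keep the slot, so the
             * guest can access the private MMIO directly even though
             * mr->ram == false. */
        } else if (writable || !kvm_readonly_mem_allowed) {
            return;
        } else if (!mr->romd_mode) {
            /* If the memory device is not in romd_mode, then we actually want
             * to remove the kvm memory slot so all accesses will trap. */
            add = false;
        }
    }
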
> 
> > ram=on MR is, IMHO, being able to be accessed as RAM-like.  If there's no
> > host pointer at all, I don't yet understand how that helps private MMIO
> > from working.
> 
> I expect private MMIO to be inaccessible from the host but accessible
> from the guest, so it has kvm_userspace_memory_region2 set. That means
> resolving its PFN during an EPT fault cannot depend on a host pointer.
> 
> https://lore.kernel.org/all/20250107142719.179636-1-yilun.xu@linux.intel.com/

I'll leave this to KVM experts, but I actually didn't follow exactly why
the mmu notifier is an issue here, as I thought that was per-mm anyway, and
KVM should logically be able to skip all VFIO private MMIO regions if
affected.  This is a comment on this part of your commit message:

        Relying on the userspace mapping also means the private MMIO
        mapping should follow userspace mapping changes via mmu_notifier.
        This conflicts with the current design that mmu_notifier never
        impacts private mappings. It also makes no sense to support
        mmu_notifier just for private MMIO; the private MMIO mapping
        should be fixed when the CoCo-VM accepts the private MMIO, and
        any subsequent mapping change without guest permission should
        be invalid.

So I don't yet see a hard no to reusing the userspace mapping even if it's
not faultable as of now - what if it becomes faultable in the future?  I
am not sure..

OTOH, I also don't think we need KVM_SET_USER_MEMORY_REGION3 anyway.. The
_REGION2 API is already smart enough to leave some reserved fields:

/* for KVM_SET_USER_MEMORY_REGION2 */
struct kvm_userspace_memory_region2 {
        __u32 slot;
        __u32 flags;
        __u64 guest_phys_addr;
        __u64 memory_size;
        __u64 userspace_addr;
        __u64 guest_memfd_offset;
        __u32 guest_memfd;
        __u32 pad1;
        __u64 pad2[14];
};

I think we _could_ reuse some of the pad* fields?  Reusing the guest_memfd
field sounds error-prone to me.
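
Purely as an illustration - the field names and the flag bit below are
invented here, not a proposal - the pad space is large enough to carry an
fd + offset pair without changing the struct size:

struct kvm_userspace_memory_region2 {
        __u32 slot;
        __u32 flags;            /* imagine a new KVM_MEM_PRIVATE_MMIO bit */
        __u64 guest_phys_addr;
        __u64 memory_size;
        __u64 userspace_addr;
        __u64 guest_memfd_offset;
        __u32 guest_memfd;
        __u32 pad1;
        __u64 mmio_fd_offset;   /* was pad2[0] */
        __u32 mmio_fd;          /* was half of pad2[1] */
        __u32 pad3;
        __u64 pad2[12];         /* tail stays 14 * 8 bytes in total */
};
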

Not sure whether it would have been easier if the fields had been a generic
fd + fd_offset from the start rather than guest_memfd*.  But I guess when
introducing _REGION2 we didn't expect private MMIO regions to come so soon..

Thanks,

-- 
Peter Xu



