qemu-devel



From: Peter Xu
Subject: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Date: Wed, 9 Aug 2023 11:15:23 -0400

On Wed, Aug 09, 2023 at 11:20:11AM +0200, David Hildenbrand wrote:
> Hi Peter!

Hi, David,

> 
> > > -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
> > > -                       errp);
> > > +    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
> > > +    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
> > > +        /*
> > > +         * We can have a writable MAP_PRIVATE mapping of a readonly file.
> > > +         * However, some operations like ftruncate() or fallocate() might fail
> > > +         * later, let's warn the user.
> > > +         */
> > > +        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
> > > +        if (fd >= 0) {
> > > +            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
> > > +                        " readonly because the file is not writable", mem_path);
> > 
> > I can understand the use case, but this will be slightly unwanted,
> > especially since the user doesn't yet have a way to predict when it will happen.
> 
> Users can set the file permissions accordingly I guess. If they want the
> file to never ever be modified via QEMU, set it R/O.
> 
> > 
> > Meanwhile this changes the behavior; is it a concern that someone may want
> > to rely on the current behavior of failing?
> 
> The scenario would be that someone passes a readonly file to "-mem-path" or
> "-object memory-backend-file,share=off,readonly=off", with the expectation
> that it would currently fail.
> 
> If it now doesn't fail (and we warn instead), what would happen is:
> * In file_ram_alloc() we won't even try ftruncate(), because the file
>   already had a size > 0. So ftruncate() is not a concern as I now
>   realize.
> * fallocate might fail later. AFAICS, that only applies to
>   ram_block_discard_range().
>  -> virtio-mem performs an initial ram_block_discard_range() check and
>     fails gracefully early.
>  -> virtio-balloon ignores any errors
>  -> ram_discard_range() in migration code fails early for postcopy in
>     init_range() and loadvm_postcopy_ram_handle_discard(), handling it
>     gracefully.
> 
> So mostly nothing "bad" would happen, it might just be undesirable, and we
> properly warn.

Indeed, all of them can fail gracefully, and the balloon one is harmless.
Definitely good news.

> 
> Most importantly, we won't be corrupting/touching the original file in any
> case, because it is R/O.
> 
> If we really want to be careful, we could tie that behavior to compat
> machines. I'm not really sure yet if we really have to go down that path.
> 
> Any other alternatives? I'd like to avoid new flags where not really
> required.

I was just thinking of a new flag. :) So have you already discussed that
possibility and decided it's not a good idea?

The root issue to me here is we actually have two resources (memory map of
the process, and the file) but we only have one way to describe the
permissions upon the two objects.  I'd think it makes a lot more sense if a
new flag is added, when there's a need to differentiate the two.

Consider if you see a bunch of qemu instances with:

  -mem-path $RAM_FILE

On the same host, which looks as weird as it could be to me. At least
'-mem-path' still looks like a way for an instance to exclusively own a RAM
file. I worry that the new fallback can confuse people too, while that's so
far not the major use case.

Nobody may really rely on the existing behavior of failing, but changing
existing behavior is almost never wanted.  The guideline here to me is: do
we want existing "-mem-path XXX" users to start using the fallback in
general?  If the answer is "no", then maybe that implies a new flag is
better?

> 
> > 
> > To think from a higher level of current use case, the ideal solution seems
> > to me that if the ram file can be put on a file system that supports CoW
> > itself (like btrfs), we can snapshot that ram file and make it RW for the
> > qemu instance. Then here it'll be able to open the file.  We'll be able to
> > keep the interface working as before, and meanwhile it'll work with
> > fallocate or truncation too, I assume.
> > 
> > Would that be better instead of changing QEMU?
> 
> As I recently learned, using file-backed VMs (on real ssd/disks, not
> shmem/hugetlb) is usually undesired, because the dirtied pages will
> constantly get rewritten to disk by background writeback threads, eventually
> resulting in bad performance and SSD wear.
> 
> So while using a COW filesystem sounds cleaner in theory, it's not
> applicable in practice -- unless one disables any background writeback,
> which has different side effects because it cannot be configured on a
> per-file basis.
> 
> So for VM templating, it makes sense to capture the guest RAM and store it
> in a file, to then use a COW (MAP_PRIVATE) mapping. Using a read-only file
> makes perfect sense in that scenario IMHO.

Valid point.

> 
> [I'm curious at what point a filesystem will actually break COW. If it's
> wired up to the writenotify infrastructure, it would happen when actually
> writing to a page, not at mmap time. I know that filesystems use writenotify
> for lazy allocation of disk blocks on file holes; maybe they also do that
> for lazy allocation of disk blocks on COW.]

I don't know either, but it definitely looks more promising and reasonable
if the CoW only happens when a page is actually written, rather than when
it is mapped RW.

Thanks,

-- 
Peter Xu



