Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migratio
From: Fabiano Rosas
Subject: Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore
Date: Mon, 06 Nov 2023 10:18:03 -0300

Peter Xu <peterx@redhat.com> writes:
> On Wed, Nov 01, 2023 at 02:28:24PM +0000, Daniel P. Berrangé wrote:
>> On Wed, Nov 01, 2023 at 10:21:07AM -0400, Peter Xu wrote:
>> > On Wed, Nov 01, 2023 at 09:26:46AM +0000, Daniel P. Berrangé wrote:
>> > > On Tue, Oct 31, 2023 at 03:03:50PM -0400, Peter Xu wrote:
>> > > > On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
>> > > > > >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
>> > > > > >> +                                    ram_addr_t length)
>> > > > > >> +{
>> > > > > >> +    g_autofree unsigned long *bitmap = NULL;
>> > > > > >> +    struct FixedRamHeader header;
>> > > > > >> +    size_t bitmap_size;
>> > > > > >> +    long num_pages;
>> > > > > >> +    int ret = 0;
>> > > > > >> +
>> > > > > >> +    ret = fixed_ram_read_header(f, &header);
>> > > > > >> +    if (ret < 0) {
>> > > > > >> +        error_report("Error reading fixed-ram header");
>> > > > > >> +        return -EINVAL;
>> > > > > >> +    }
>> > > > > >> +
>> > > > > >> +    block->pages_offset = header.pages_offset;
>> > > > > >
>> > > > > > Do you think it is worth sanity checking that 'pages_offset' is
>> > > > > > aligned in some way?
>> > > > > >
>> > > > > > It is nice that we have flexibility to change the alignment in
>> > > > > > future if we find 1 MB is not optimal, so I wouldn't want to force
>> > > > > > a 1 MB alignment check here. Perhaps we could at least sanity check
>> > > > > > for alignment at TARGET_PAGE_SIZE, to detect a gross data
>> > > > > > corruption problem?
>> > > > > >
>> > > > >
>> > > > > I don't see why not. I'll add it.
>> > > >
>> > > > Is there any explanation on why that 1MB offset, and how the number is
>> > > > chosen? Thanks,
>> > >
>> > > The fixed-ram format is anticipating the use of O_DIRECT.
>> > >
>> > > With O_DIRECT both the buffers in memory, and the file handle offset
>> > > have alignment requirements. The buffer alignments are usually page
>> > > sized, and QEMU RAM blocks will trivially satisfy those.
>> > >
>> > > The file handle offset alignment varies per filesystem. While you can
>> > > query the alignment for the FS holding the file with statx(), that is
>> > > not appropriate to do. If a user saves/restores QEMU state to file, we
>> > > must assume there is a chance the user will copy the saved state to a
>> > > different filesystem.
>> > >
>> > > IOW, we want alignment to satisfy the likely worst case.
>> > >
>> > > Picking 1 MB is a nice round number that is large enough that it is
>> > > almost certainly going to satisfy any filesystem alignment. In fact
>> > > it is likely massive overkill. None the less 1 MB is also still tiny
>> >
>> > Is that calculated by something like the max of possible host (small) page
>> > sizes? I've no idea what it is for all archs; the max small page size I'm
>> > aware of is 64K, but I don't know a lot of archs.
>>
>> It wasn't anything as precise as that. It is literally just "1MB" looks
>> large enough that we don't need to spend time to investigate per arch
>> page sizes.
>
> IMHO we need that precision in the reasoning, and we need to document it,
> even if not for the exact number we prefer, which can be prone to change
> later. Otherwise that value will become pure magic within a few years or
> even less, and it'll be more of a challenge later to figure things out.
>
>>
>> Having said that I'm now having slight self-doubt wrt huge pages, though
>> I swear we investigated it last year when first discussing this feature.
>> The guest memory will of course already be suitably aligned, but I'm
>> wondering if the filesystem I/O places any offset alignment constraints
>> related to non-default page size.
>
> AFAIU direct IO is about pinning the IO buffers, playing the role of fs
> cache instead. If my understanding is correct, huge pages shouldn't be a
> problem for such pinning, because it's legal to pin part of a huge page.
>
> After the partial huge pages are pinned, they should be treated as normal
> fs buffers when doing block IO. And then the file offset should, per my
> understanding, no longer depend on the type of backend of the user
> buffer that triggers read()/write().
>
> But maybe I missed something; if so, that will need to be part of the
> documentation of the 1MB magic value, IMHO. We may want to double check
> that by doing fixed-ram migration on e.g. a 1GB hugetlb memory-backend-file
> with a 1MB file offset per-ramblock.

Does anyone have any indication that we need to relate the alignment to
the page size? All I find online points to device block size being the
limiting factor for filesystems. There's also raw_probe_alignment() in
file-posix.c, which seems to check up to 4k and recommends disabling
O_DIRECT if an alignment is not found.
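
For illustration only, here is a rough sketch of what probing the file
offset alignment could look like. It is not a copy of
raw_probe_alignment(); the function name and the 512..4096 range are
assumptions made for this example. It also assumes that a misaligned
O_DIRECT read fails with an error, which not every filesystem guarantees
(some may silently fall back to buffered I/O).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU.
 *
 * Try reads whose offset and length are both equal to a candidate
 * alignment, doubling the candidate each time. On an fd opened with
 * O_DIRECT, the first candidate that does not fail is taken as the
 * required offset alignment. Returns 0 if nothing in the range works,
 * in which case the caller might decide not to use O_DIRECT at all.
 *
 * min_align and max_align must be powers of two.
 */
static size_t probe_offset_alignment(int fd, size_t min_align,
                                     size_t max_align)
{
    void *buf = NULL;
    size_t align, found = 0;

    assert(min_align && (min_align & (min_align - 1)) == 0);
    assert(max_align >= min_align && (max_align & (max_align - 1)) == 0);

    /* Align the buffer to the largest candidate so that only the
     * file offset and length vary between attempts. */
    if (posix_memalign(&buf, max_align, max_align) != 0) {
        return 0;
    }

    for (align = min_align; align <= max_align; align <<= 1) {
        /* A short read or a read at EOF (return value 0) still counts
         * as success; only an error return rejects the candidate. */
        if (pread(fd, buf, align, (off_t)align) >= 0) {
            found = align;
            break;
        }
    }

    free(buf);
    return found;
}
```

A caller would open the migration file with O_DIRECT and call
probe_offset_alignment(fd, 512, 4096). On an fd opened without O_DIRECT
every candidate succeeds, so the smallest one is returned.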

Note that we shouldn't have any problems changing the alignment we
choose, since the fixed-ram header carries a pointer to the start of the
aligned region. We could even do some probing like the block layer does
if we wanted.
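
To make the two pieces concrete, a minimal standalone sketch follows: the
writer rounds the start of the pages region up to whatever alignment it
picked, and the reader only checks that the offset found in the header is
page aligned, so the writer's choice can change later. The SKETCH_* names
and the 4096 page size are placeholders for this example, not the names
used in the patch; in QEMU itself I believe the existing ROUND_UP() and
QEMU_IS_ALIGNED() helpers would be used instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for this sketch only. */
#define SKETCH_FIXED_RAM_ALIGN (1024 * 1024) /* the 1 MB discussed above */
#define SKETCH_PAGE_SIZE       4096          /* stand-in for TARGET_PAGE_SIZE */

/* Round 'offset' up to the next multiple of 'align' (a power of two). */
static uint64_t sketch_align_up(uint64_t offset, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Load-side sanity check: accept any pages_offset that is a multiple of
 * the page size. This catches gross corruption of the header without
 * tying the stream format to one particular writer-side alignment.
 */
static bool sketch_pages_offset_is_sane(uint64_t pages_offset,
                                        uint64_t page_size)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    return (pages_offset & (page_size - 1)) == 0;
}
```

With these, a file written with 1 MB alignment and one written with,
say, 64 KB alignment would both pass the same load-side check.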