From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
Date: Sat, 10 Aug 2019 12:12:24 +0000

09.08.2019 18:59, Eric Blake wrote:
> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote:
>> backup_cow_with_offload can transfer more than on cluster. Let
> 
> s/on/one/
> 
>> backup_cow_with_bounce_buffer behave similarly. It reduces number
>> of IO and there are no needs to copy cluster by cluster.
> 
> It reduces the number of IO requests, since there is no need to copy
> cluster by cluster.
> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>> ---
>>   block/backup.c | 29 +++++++++++++++--------------
>>   1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d482d93458..155e21d0a3 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -104,22 +104,25 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>                                                         int64_t start,
>>                                                         int64_t end,
>>                                                         bool is_write_notifier,
>> -                                                      bool *error_is_read,
>> -                                                      void **bounce_buffer)
>> +                                                      bool *error_is_read)
> 
> Why is this signature changing?
> 
>>   {
>>       int ret;
>>       BlockBackend *blk = job->common.blk;
>>       int nbytes;
>>       int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>> +    void *bounce_buffer;
>>   
>>       assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>> -    nbytes = MIN(job->cluster_size, job->len - start);
>> -    if (!*bounce_buffer) {
>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
> 
> Pre-patch, you allocate the bounce_buffer at most once (but limited to a
> cluster size), post-patch, you are now allocating and freeing a bounce
> buffer every iteration through.  There may be fewer calls because you
> are allocating something bigger, but still, isn't it a good goal to try
> and allocate the bounce buffer as few times as possible and reuse it,
> rather than getting a fresh one each iteration?

Yes, it's a "degradation" introduced by this patch; I was afraid of this
question. However, I doubt that it needs to be optimized:

1. I'm going to run several copy requests in coroutines to parallelize the
copying loop and improve performance (the series for qcow2 is here:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06654.html), so we'll
need several buffers for the parallel copy requests, and it's much simpler to
allocate a buffer when it's needed and free it afterwards than to do the kind
of allocated-memory sharing that mirror does.

2. It's really a question about the memory allocator: is it fast and optimized
enough that we don't need to care, or is it bad enough that we should work
around it the way mirror does? In my opinion (backed by a bit of benchmarking),
memalign is fast enough not to care. I answered a similar question in more
detail and with some numbers here:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html

So I'd prefer not to worry about it and keep the code simpler. But if you
disagree, I can keep a shared buffer here, at least until parallel requests
are introduced. Then it would be something like
qemu_try_blockalign(MIN(bytes, 64M))..
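
To make that idea concrete, here is a rough, self-contained sketch of the
capped, allocate-per-call pattern. It is not QEMU code: the 64 MiB cap, the
4 KiB alignment, and the POSIX pread/pwrite stand-ins for blk_co_pread /
blk_co_pwrite (and posix_memalign/free standing in for blk_blockalign /
qemu_vfree) are all assumptions for illustration only.

/*
 * Illustrative sketch only (not QEMU code): copy [start, end) through a
 * bounce buffer that is allocated per call and capped in size, roughly
 * mirroring the qemu_try_blockalign(MIN(bytes, 64M)) idea above.
 */
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BOUNCE_BUFFER (64 * 1024 * 1024)  /* assumed cap, not from the patch */
#define BUF_ALIGNMENT     4096                /* stand-in for the block layer alignment */

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Copy one chunk through a freshly allocated, freshly freed bounce buffer. */
static int copy_chunk(int src_fd, int dst_fd, off_t start, size_t nbytes)
{
    void *bounce_buffer;
    int ret = 0;

    if (posix_memalign(&bounce_buffer, BUF_ALIGNMENT, nbytes)) {
        return -ENOMEM;                       /* like a failed qemu_try_blockalign() */
    }

    if (pread(src_fd, bounce_buffer, nbytes, start) != (ssize_t)nbytes) {
        ret = -EIO;
    } else if (pwrite(dst_fd, bounce_buffer, nbytes, start) != (ssize_t)nbytes) {
        ret = -EIO;
    }

    free(bounce_buffer);                      /* freed on every call, as in the patch */
    return ret;
}

/* Walk the region; no single allocation ever exceeds MAX_BOUNCE_BUFFER. */
static int copy_range(int src_fd, int dst_fd, off_t start, off_t end)
{
    while (start < end) {
        size_t nbytes = MIN((size_t)(end - start), (size_t)MAX_BOUNCE_BUFFER);
        int ret = copy_chunk(src_fd, dst_fd, start, nbytes);

        if (ret < 0) {
            return ret;
        }
        start += nbytes;
    }
    return 0;
}
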

> 
>> +
>> +    nbytes = MIN(end - start, job->len - start);
> 
> What is the largest this will be? Will it exhaust memory on a 32-bit or
> otherwise memory-constrained system, where you should have some maximum
> cap for how large the bounce buffer will be while still getting better
> performance than one cluster at a time and not slowing things down by
> over-sizing malloc?  64M, maybe?
> 

Hmm, good point. I thought it was safe to allocate a buffer for the full
request: if such a buffer has already been allocated for the guest request
itself, it shouldn't hurt to allocate another one of the same size. But I
forgot about write-zeroes and discard operations, which may be large without
any guest-side allocation, while here we need to allocate anyway.

Hmm2, but we also have parallel guest writes (and discards), and therefore
parallel copy-before-write operations. So the total allocation is not bounded
anyway, and that should be fixed somehow too..
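
For what it's worth, one way to bound the total could be a shared budget that
in-flight copy-before-write requests acquire before allocating their bounce
buffers and release after freeing them. The sketch below uses pthreads as a
stand-in for QEMU's coroutine primitives, and the 128 MiB limit is purely an
assumed number, not anything from this series.

/*
 * Illustrative sketch only: cap the total bytes held in bounce buffers
 * across parallel copy-before-write requests.  pthreads stand in for
 * QEMU's coroutine locking here; the limit value is an assumption.
 */
#include <pthread.h>
#include <stddef.h>

#define TOTAL_BOUNCE_LIMIT (128 * 1024 * 1024)

static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mem_cond = PTHREAD_COND_INITIALIZER;
static size_t mem_in_flight;

/*
 * Wait until `bytes` fits under the global budget, then reserve it.
 * Assumes bytes <= TOTAL_BOUNCE_LIMIT (guaranteed by a per-request cap).
 */
static void bounce_mem_acquire(size_t bytes)
{
    pthread_mutex_lock(&mem_lock);
    while (mem_in_flight + bytes > TOTAL_BOUNCE_LIMIT) {
        pthread_cond_wait(&mem_cond, &mem_lock);
    }
    mem_in_flight += bytes;
    pthread_mutex_unlock(&mem_lock);
}

/* Return the reservation and wake any waiting requests. */
static void bounce_mem_release(size_t bytes)
{
    pthread_mutex_lock(&mem_lock);
    mem_in_flight -= bytes;
    pthread_cond_broadcast(&mem_cond);
    pthread_mutex_unlock(&mem_lock);
}

Each copy request would call bounce_mem_acquire(nbytes) before allocating its
bounce buffer and bounce_mem_release(nbytes) after freeing it.
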

-- 
Best regards,
Vladimir
