On 29.02.24 at 12:48, Vladimir Sementsov-Ogievskiy wrote:
On 29.02.24 13:11, Fiona Ebner wrote:
The iotest creates a new target image for each incremental sync, which
only records the diff relative to the previous mirror; those diff
images are later rebased onto each other to get the full picture.
Thus, it can be that a previous mirror job (not just the background
process or a previous write) already copied a cluster, and in
particular, copied it to a different target!
Ah, I understand.
For simplicity, let's consider the case where source "cluster size" = "job
cluster size" = "bitmap granularity" = "target cluster size".
Which types of clusters should we consider when we want to handle a guest
write?
1. Clusters that should be copied by the background process
These are the dirty clusters from the user-given bitmap or, if we do a
full-disk mirror, all clusters not yet copied by the background process.
For such clusters we simply ignore the unaligned write. We could even
ignore the aligned write too: that disturbs the guest less with delays.
Since do_sync_target_write() currently doesn't ignore aligned writes, I
wouldn't change it. Of course they can count towards the "done_bitmap"
you propose below.
2. Clusters already copied by the background process during this mirror
job and not dirtied by the guest since then.
For such clusters we are safe to do an unaligned write, as the target
cluster must be allocated.
Right.
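
[Editor's note: for illustration, here is a minimal, self-contained sketch
of the classification discussed in cases 1 and 2 above. It is not QEMU
code; job_dirty, job_done (modeling the "done_bitmap" idea from this
thread) and ClusterClass are hypothetical names used only to show the
decision: a cluster still pending background copy lets the active write be
skipped, while a cluster already copied during this job allows a safe
unaligned write to the target.]

#include <stdbool.h>
#include <stdio.h>

#define NB_CLUSTERS 8

typedef enum {
    CLUSTER_PENDING_BACKGROUND, /* case 1: background copy will handle it */
    CLUSTER_COPIED_THIS_JOB,    /* case 2: unaligned target write is safe */
    CLUSTER_NOT_IN_BITMAP,      /* case 3: neither dirty nor copied yet */
} ClusterClass;

/*
 * job_dirty: clusters the background process still has to copy.
 * job_done:  clusters already copied by *this* mirror job
 *            (the "done_bitmap" idea from the thread).
 */
static bool job_dirty[NB_CLUSTERS];
static bool job_done[NB_CLUSTERS];

static ClusterClass classify(unsigned cluster)
{
    if (job_dirty[cluster]) {
        return CLUSTER_PENDING_BACKGROUND;
    }
    if (job_done[cluster]) {
        return CLUSTER_COPIED_THIS_JOB;
    }
    return CLUSTER_NOT_IN_BITMAP;
}

int main(void)
{
    job_dirty[0] = true; /* case 1: still pending background copy */
    job_done[1] = true;  /* case 2: already copied during this job */
    /* cluster 2 is neither dirty nor done: case 3 */

    for (unsigned c = 0; c < 3; c++) {
        printf("cluster %u -> case %d\n", c, classify(c) + 1);
    }
    return 0;
}
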
3. Clusters not initially marked in the dirty bitmap.
What should we do with them? We can't do an unaligned write. I see two
variants:
- do an additional read from the source to fill the whole cluster, which
seems a bit too heavy
Yes, I'd rather only do that as a last resort.
- just mark the cluster as dirty for the background job. So we behave
like in "background" mode. But why not? The maximum number of such
"hacks" is limited by the number of "clear" clusters at the start of the
mirror job, which means that we don't seriously affect convergence.
Mirror is guaranteed to converge anyway. And the whole point of
"write-blocking" mode is to have guaranteed convergence. What do you think?
It could lead to a lot of flips between job->actively_synced == true and
== false. AFAIU, currently, we only switch back from true to false when
an error happens. While I don't see a concrete issue with it, at least
it might be unexpected to users, so it had better be documented.
I'll try going with this approach, thanks!
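
[Editor's note: to make the proposed case-3 handling concrete, here is a
minimal, self-contained sketch. It is not QEMU code; ToyMirrorJob,
handle_partial_guest_write and the two bitmaps are hypothetical names.
Under the assumption that a partial guest write to a cluster that is
neither dirty nor already copied is simply re-marked as dirty for the
background job, the sketch also shows the actively_synced flip back to
false discussed above.]

#include <stdbool.h>
#include <stdio.h>

#define NB_CLUSTERS 8

typedef struct {
    bool dirty[NB_CLUSTERS]; /* still to be copied by the background process */
    bool done[NB_CLUSTERS];  /* already copied during this job */
    bool actively_synced;    /* models job->actively_synced */
} ToyMirrorJob;

/* Guest write that only partially covers @cluster. */
static void handle_partial_guest_write(ToyMirrorJob *job, unsigned cluster)
{
    if (job->dirty[cluster]) {
        /* Case 1: the background copy will pick it up anyway. */
        return;
    }
    if (job->done[cluster]) {
        /* Case 2: target cluster is allocated, unaligned write is safe. */
        printf("cluster %u: unaligned write to target\n", cluster);
        return;
    }
    /*
     * Case 3: defer to the background job.  Note that this flips
     * actively_synced back to false until the cluster is copied,
     * which is the behavior suggested above for documenting.
     */
    job->dirty[cluster] = true;
    job->actively_synced = false;
    printf("cluster %u: marked dirty, actively_synced -> false\n", cluster);
}

int main(void)
{
    ToyMirrorJob job = { .actively_synced = true };
    job.done[1] = true;

    handle_partial_guest_write(&job, 1); /* case 2 */
    handle_partial_guest_write(&job, 2); /* case 3 */
    printf("actively_synced = %d\n", job.actively_synced);
    return 0;
}

[The sketch only models the control flow: deferring such clusters to the
background process bounds the extra work by the number of initially clear
clusters, so the guaranteed convergence of write-blocking mode is
preserved, as argued in the thread.]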