On Tue, Apr 16, 2024 at 7:54 AM Stefan Fritsch <
sf@sfritsch.de> wrote:
adding John Snow to CC because he investigated this in 2020.
On Fri, 12 Apr 2024, Eric Blake wrote:
> On Fri, Apr 12, 2024 at 10:06:17AM +0200, Stefan Fritsch wrote:
> > Commit 99868af3d0 changed the hardcoded constant BDRV_SECTOR_SIZE to a
> > dynamic field 'align' but introduced a bug. qemu_iovec_discard_back()
> > is now passed the wanted iov length instead of the actually required
> > amount that should be removed from the end of the iov.
> >
> > The bug can likely only be hit in uncommon configurations, e.g. with
> > icount enabled or when reading from disk directly to device memory.
> >
> > Fixes: 99868af3d0a75cf6 ("dma-helpers: explicitly pass alignment into DMA helpers")
> > Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
> > ---
> > system/dma-helpers.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
>
> Wow, that bug has been latent for a while (2016, v2.8.0). As such, I
> don't think it's worth holding up 9.0 for, but it is definitely worth
> cc'ing qemu-stable (done now).
we had some more internal discussions and did some more googling, and it
turns out that this is actually part of a known issue:
https://lists.nongnu.org/archive/html/qemu-block/2020-07/msg01866.html
https://gitlab.com/qemu-project/qemu/-/issues/259
In the above mail to qemu-block, John Snow listed 4 problems in the code
but my patch only fixes the first one. Considering that the code may also
write data to the wrong place (problem 2), I wonder if an assert in the
same place would be better until the underlying issues have been fixed?
Whew, it's been such a long time since I've looked at this, I barely remember. I think my email now knows more about the problem than *I* currently do. I'm glad I wrote it ...
I do not currently know what happens if you fix the first issue but not the other four; I don't know if it gets passably "less broken" or if it causes other bigger issues... and I don't have the bandwidth to test that out currently, my apologies.
If there's a renewed interest in fixing this, I can try to start ramping back up on it, but I have a few other projects to finish first, but possibly I could discuss this at KVM Forum to help delineate the work into manageable chunks.
~js