qemu-discuss

Re: Which qemu change corresponds to RedHat bug 1655408


From: Jakob Bohm
Subject: Re: Which qemu change corresponds to RedHat bug 1655408
Date: Tue, 13 Oct 2020 03:01:55 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.1

On 2020-10-12 13:47, Max Reitz wrote:
On 09.10.20 14:55, Jakob Bohm wrote:
On 2020-10-09 10:48, Max Reitz wrote:
On 08.10.20 18:49, Jakob Bohm wrote:
(Top posting because previous reply did so):

If the bug was closed as "can't reproduce", why was a very similar bug
listed as fixed in RHSA-2019:2553-01?

Hi,

Which very similar bug do you mean?  I can only guess that perhaps you
mean 1603104 or 1551486.

Bug 1603104 was about qemu not ignoring errors when releasing file locks
fails (we should ignore errors then, because they're not fatal, and we
often cannot return errors, so they ended up as aborts).  (To give more
context, this error generally appeared only when the storage the image
was on somehow disappeared while qemu was running, e.g. when the
connection to an NFS server was lost.)

Bug 1551486 entailed a bit of a rewrite of the whole locking code, which
may have resulted in bug 1655408 no longer appearing for our QE team.
But it was a different bug: it wasn't about any error, just about the
fact that qemu used more FDs than necessary.

(Although I see 1655408 was reported for RHEL 8, whereas 1603104 and
1551486 (as part of RHSA-2019:2553) were reported for RHEL 7.  The
corresponding RHEL 8 bug for those two is 1694148.)

Either way, both of those bugs are fixed in 5.0.


Bug 1655408, in contrast, reports an error at startup: locking itself
failed.  I couldn't reproduce it, and I still can't; neither with the
image mounted concurrently, nor with an RO NFS mount.

(For example:

exports:
[...]/test-nfs-ro  127.0.0.1(ro,sync,no_subtree_check,fsid=0,insecure,crossmnt)

$ for i in $(seq 100); do \
      echo -e '\033[1m---\033[0m'; \
      x86_64-softmmu/qemu-system-x86_64 \
        -drive \
          if=none,id=drv0,readonly=on,file=/mnt/tmp/arch.iso,format=raw \
        -device ide-cd,drive=drv0 \
        -enable-kvm -m 2048 -display none &; \
      pid=$!; \
      sleep 1; \
      kill $pid; \
    done

(Where x86_64-softmmu/qemu-system-x86_64 is upstream 5.0.1.)

All I see is something like:

---
qemu-system-x86_64: terminating on signal 15 from pid 7278 (/bin/zsh)
[2] 34103
[3]  - 34095 terminated  x86_64-softmmu/qemu-system-x86_64 -drive
-device ide-cd,drive=drv0  -m 2048

So no file locking errors.)


The error I got was specifically "Failed to lock byte 100" and the VM not
starting.  The ISO file was on a R/W NFSv3 share, but was itself R/O for
the user that root was mapped to by the Linux NFS server via /etc/exports
options; specifically, the ISO file was mode 0444 in a 0755 directory,
and the exports line was (simplified)

/share1  xxxx:xxxx:xxxx:xxxx/64(ro,sync,mp,subtree_check,anonuid=1000,anongid=1000)

where xxxx:xxxx:xxxx:xxxx/64 is the numeric IPv6 prefix of the LAN

The NFS kernel server ran on Debian Stretch, kernel 4.19.0-0.bpo.8-amd64
#1 SMP Debian 4.19.98-1~bpo9+1 (2020-03-09) x86_64 GNU/Linux

NFS client mount options were:

rw,nosuid,nodev,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=600,retrans=6,sec=sys,mountaddr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx,mountvers=3,mountport=45327,mountproto=udp6,local_lock=none,addr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx

I’ve tried using these settings, but still can’t reproduce the bug.
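
One way to rule out option drift between our two setups, assuming stock
nfs-utils and util-linux on the client: dump the options the client
actually negotiated, rather than what was requested at mount time, and
compare, e.g.:

  nfsstat -m                                     # effective options per NFS mount
  findmnt -t nfs,nfs4 -o TARGET,SOURCE,OPTIONS   # same information via util-linux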

Nothing else uses the image when you try to attach it to qemu, right?
(Your last email noted something about a loop mount, but I’m not sure
whether that just referred to the RH Bugzilla entry.)

(local_lock=none means that all locks are relayed to the server, correct?)

Max


Nothing else was supposed to be accessing that ISO at the time, but the
ISO has at various times been accessed by different virtualization
systems for different virtual machines, so maybe something from much
earlier never released its own locks (virtualization hosts tend to
accumulate a lot of uptime).
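
If it happens again, the stale-lock theory should be easy to check; a
rough sketch, assuming Linux with util-linux and rpcbind tools on both
ends (the image name below is only illustrative):

  # On the NFS client: list byte-range locks currently held on the image
  lslocks | grep my-image.iso        # or inspect /proc/locks directly

  # On the NFS server: NFSv3 byte-range locks are taken by lockd on behalf
  # of clients, so they show up in the server's /proc/locks as well
  rpcinfo -p | grep -E 'nlockmgr|status'   # confirm lockd/statd are registered
  cat /proc/locks

(And as far as I understand local_lock=none, yes: both flock and POSIX
locks are relayed to the server rather than handled locally on the
client.)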

Coordinating locking of shared disk images between multiple qemu
instances should ideally emulate what happens when a SCSI disk is shared
over a SAN (Fibre Channel, iSCSI, a shared parallel SCSI bus, etc.): if a
VM issues SCSI lock management commands, they should behave as they would
on real hardware.
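
For what it's worth, the byte locks behind "Failed to lock byte 100" are
qemu's own host-side image locking, not guest-visible SCSI reservations;
as far as I know, passthrough of real PERSISTENT RESERVE commands is a
separate mechanism (qemu-pr-helper with scsi-block/scsi-generic devices).
When sharing is coordinated elsewhere, the host-side locking can be
relaxed per image; a sketch only, assuming the upstream 4.x/5.x -blockdev
syntax (node names made up):

  qemu-system-x86_64 \
    -blockdev driver=file,node-name=proto0,filename=/mnt/tmp/arch.iso,read-only=on,locking=off \
    -blockdev driver=raw,node-name=fmt0,file=proto0,read-only=on \
    -device ide-cd,drive=fmt0 \
    -enable-kvm -m 2048 -display none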

My reference to loop mounts was about the (common) scenario where someone
tries to access a raw image file both through qemu and with OS methods,
the loop block driver being the traditional POSIX mechanism used when the
qemu NBD server is not involved.
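
To make the two access paths concrete, a rough sketch (device and
mount-point names are illustrative; the NBD route also understands
non-raw formats such as qcow2):

  # Traditional POSIX route: loop block driver, raw images only
  mount -o loop,ro /path/to/image.iso /mnt/inspect

  # qemu route: export the image through the NBD server instead
  modprobe nbd
  qemu-nbd --read-only --connect=/dev/nbd0 /path/to/image.iso
  mount -o ro /dev/nbd0 /mnt/inspect
  # ... inspect the contents ...
  umount /mnt/inspect
  qemu-nbd --disconnect /dev/nbd0

Note that only the qemu-nbd route participates in qemu's image locking; a
plain loop mount takes no fcntl locks, so a running qemu cannot detect it.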

My large batch job is still running...

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


