From: Christian Böhme
Subject: [Qemu-discuss] block device write caching, notifications, and QCOW2 issues
Date: Mon, 24 Oct 2016 14:06:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.8.0
Hello all,
We are using Qemu as the VMM in a KVM/Linux setup with
file=/var/⟨some regular file name⟩,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
as arguments to the only -drive option in the invocation,
i.e., the VM is constructed with a single block device for
persistent storage. The guest OS in question is a run-of-the-mill
Ubuntu GNU/Linux.
On the Ubuntu GNU/Linux host, we have
$ cat /proc/mounts | grep /var
/dev/sda5 /var ext4 rw,nodev,relatime,data=ordered 0 0
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
# hdparm -I /dev/sda | grep -i cache
cache/buffer size = unknown
* Write cache
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
$ uname -r
3.13.0-86-generic
$ dpkg-query -W -f '${Package}: ${Version}\n' qemu-system-x86
qemu-system-x86: 1.5.0+dfsg-3ubuntu5.4~cloud0
while in the guest, we have
$ cat /proc/mounts | grep 'data=ordered'
/dev/vda1 / ext4 rw,relatime,data=ordered 0 0
$ cat /sys/block/vda/queue/scheduler
none
$ lspci | grep -i -e sata -e scsi
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
$ cat /sys/block/vda/device/features
0010101101110000000000000000110000000000000000000000000000000000
$ cat /sys/block/vda/cache_type
write back
$ uname -r
3.13.0-95-generic
So far, so good.
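The features bitmap printed in the guest can be decoded to see which virtio-blk features were negotiated. The Python sketch below assumes that character i of the sysfs string is feature bit i (leftmost is bit 0), which is how the Linux virtio core prints this attribute, and uses bit numbers copied from the Linux headers linux/virtio_blk.h and linux/virtio_ring.h; the table is a partial subset and decode_features is a hypothetical helper. Bit 9 (VIRTIO_BLK_F_FLUSH, named VIRTIO_BLK_F_WCE in some header versions) being set would be consistent with the "write back" cache_type above, i.e., the guest is expected to send explicit flush requests.

```python
from typing import List

# Partial table of feature bits (assumed from linux/virtio_blk.h and
# linux/virtio_ring.h); bits not listed are reported as "unknown".
VIRTIO_BLK_FEATURES = {
    0: "VIRTIO_BLK_F_BARRIER",
    1: "VIRTIO_BLK_F_SIZE_MAX",
    2: "VIRTIO_BLK_F_SEG_MAX",
    4: "VIRTIO_BLK_F_GEOMETRY",
    5: "VIRTIO_BLK_F_RO",
    6: "VIRTIO_BLK_F_BLK_SIZE",
    7: "VIRTIO_BLK_F_SCSI",
    9: "VIRTIO_BLK_F_FLUSH",
    10: "VIRTIO_BLK_F_TOPOLOGY",
    11: "VIRTIO_BLK_F_CONFIG_WCE",
    28: "VIRTIO_RING_F_INDIRECT_DESC",
    29: "VIRTIO_RING_F_EVENT_IDX",
}


def decode_features(bitmap: str) -> List[str]:
    """Return the names of all bits set in a sysfs 'features' string.

    Assumes character i of the string is feature bit i (leftmost = bit 0).
    """
    names = []
    for bit, flag in enumerate(bitmap.strip()):
        if flag == "1":
            names.append(VIRTIO_BLK_FEATURES.get(bit, f"unknown bit {bit}"))
    return names


if __name__ == "__main__":
    # The string reported by /sys/block/vda/device/features in the guest.
    guest_features = (
        "0010101101110000000000000000110000000000000000000000000000000000"
    )
    for name in decode_features(guest_features):
        print(name)
```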
With the above setup, we have seen inconsistencies in the guest's
filesystem more than once when the VM was restarted after the host
abruptly lost power. This, of course, is to be expected with write
caches enabled and when the guest fails to f(data)sync(2) freshly written
data before the host goes down.
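For reference, a guest application that wants freshly written data to survive a host power loss has to request a flush explicitly. A minimal Python sketch, assuming a POSIX guest; write_durably is a hypothetical helper name, and whether the flush actually reaches stable storage depends on the whole stack underneath it (the guest's virtio-blk cache mode, Qemu's cache= setting, the host filesystem, and the disk's own write cache).

```python
import os

# fdatasync(2) is not exposed on every platform; fall back to fsync(2).
_sync_data = getattr(os, "fdatasync", os.fsync)


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to persist it (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file data (and the metadata needed to read it back).
        _sync_data(fd)
    finally:
        os.close(fd)

    # A newly created file also needs its directory entry persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```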
However, we have also seen regular files whose data, according to the
filesystem, was modified weeks before the outage, but that nevertheless had
garbled contents after the restart. Such a situation is rather unexpected,
since a well-exercised journaling filesystem is unlikely to take /that/ long
to commit its changes to persistent storage. The "new" contents, it seems,
are not completely random, but look more like the result of a block
address permutation behind the filesystem's back, as they contain fragments
that can also be found in other regular files of the same filesystem. That
is, the filesystem still believes it addresses the same blocks it has been
addressing for weeks, while those addresses now point to different
blocks.
Has anyone else experienced such behaviour? Could the block driver
stacking employed in Qemu be the culprit, or just the Qemu QCOW2 layer?
There seems to be a bit too much going on when it maps block addresses
to offsets in the regular image file, and this widens the window
within which "nothing may happen" on the host.
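To make the address mapping concrete: QCOW2 translates a guest offset into a host file offset through a two-level table (L1 table → L2 table → data cluster), as described in the QCOW2 format specification shipped with QEMU (docs/specs/qcow2.txt). The Python sketch below is a simplified, hypothetical model of that lookup on in-memory tables; it ignores backing files, compressed clusters, and metadata caching, and guest_to_host_offset is not a QEMU function. It illustrates how the symptom described above could look: if an L2 entry pointed at the wrong host cluster, the guest filesystem would keep using the same guest offset but silently read another cluster's data.

```python
from typing import Dict, List, Optional

# Bit layout per the QCOW2 spec: bits 9-55 of an L1 or L2 entry hold a host
# offset; bit 62 marks a compressed cluster, bit 63 the "copied" flag, and
# bit 0 of an L2 entry marks a zero cluster (version 3 images).
OFFSET_MASK = 0x00FFFFFFFFFFFE00
QCOW_OFLAG_COPIED = 1 << 63
QCOW_OFLAG_COMPRESSED = 1 << 62
QCOW_OFLAG_ZERO = 1


def guest_to_host_offset(guest_offset: int, cluster_bits: int,
                         l1_table: List[int],
                         l2_tables: Dict[int, List[int]]) -> Optional[int]:
    """Return the host file offset backing guest_offset, or None if no
    data cluster is allocated for it (unallocated or zero cluster)."""
    cluster_size = 1 << cluster_bits
    l2_bits = cluster_bits - 3      # an L2 table is one cluster of 8-byte entries
    l2_size = 1 << l2_bits

    offset_in_cluster = guest_offset & (cluster_size - 1)
    l2_index = (guest_offset >> cluster_bits) & (l2_size - 1)
    l1_index = guest_offset >> (cluster_bits + l2_bits)

    if l1_index >= len(l1_table):
        return None
    l2_table_offset = l1_table[l1_index] & OFFSET_MASK
    if l2_table_offset == 0:
        return None                 # no L2 table allocated yet

    l2_entry = l2_tables[l2_table_offset][l2_index]
    if l2_entry & QCOW_OFLAG_COMPRESSED:
        raise NotImplementedError("compressed clusters are not modelled")
    if l2_entry & QCOW_OFLAG_ZERO:
        return None                 # cluster reads as zeroes
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return None                 # unallocated (would go to a backing file)
    return cluster_offset + offset_in_cluster


if __name__ == "__main__":
    CLUSTER_BITS = 16               # 64 KiB clusters, the qcow2 default
    l2 = [0] * (1 << (CLUSTER_BITS - 3))
    l2[1] = 0x70000 | QCOW_OFLAG_COPIED   # guest cluster 1 -> host 0x70000
    l2[2] = 0x80000 | QCOW_OFLAG_COPIED   # guest cluster 2 -> host 0x80000
    l1 = [0x50000 | QCOW_OFLAG_COPIED]    # L2 table stored at host 0x50000
    tables = {0x50000: l2}

    print(hex(guest_to_host_offset(0x12345, CLUSTER_BITS, l1, tables)))  # 0x72345

    # With the two L2 entries swapped, the same guest offset silently
    # resolves to the other cluster's data.
    l2[1], l2[2] = l2[2], l2[1]
    print(hex(guest_to_host_offset(0x12345, CLUSTER_BITS, l1, tables)))  # 0x82345
```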
While reading qemu(1), I also came across the notion of "write notification"
in relation to block device write caching: setting either
cache=writethrough or cache=directsync makes Qemu generate them. Lacking
further documentation on them, I dug through the code (
$ git status
HEAD detached at v1.5.0
nothing to commit, working directory clean
), but the only thing I could discern from it was that cache=writethrough
or cache=directsync forces the Qemu block layer to issue an explicit flush
on the block driver(s) in question, via bdrv_co_flush(), immediately after
every write request. Since every request coming in from the guest's
virtio_blk device is already acknowledged via virtio_notify(), itself called
from virtio_blk_req_complete(), the question remains what "write
notifications" actually are. Anyone?
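The behaviour found in the code can be captured in a toy model. The Python sketch below is hypothetical and not QEMU code: CachedDisk, handle_write_request, and acknowledged_but_lost are invented names, "volatile" stands for any cache that is lost on power failure, and it assumes (as one possible reading of qemu(1)) that a "write notification" is simply the completion reported back to the guest. In this model, the only thing that matters is whether a flush happens before that completion is reported.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Modes for which, per the code reading above, a flush follows every write.
FLUSH_AFTER_EVERY_WRITE = {"writethrough", "directsync"}


@dataclass
class CachedDisk:
    """Toy disk: writes land in a volatile cache until flushed."""
    volatile: Dict[int, bytes] = field(default_factory=dict)
    stable: Dict[int, bytes] = field(default_factory=dict)

    def write(self, sector: int, data: bytes) -> None:
        self.volatile[sector] = data

    def flush(self) -> None:
        self.stable.update(self.volatile)
        self.volatile.clear()

    def power_loss(self) -> None:
        self.volatile.clear()


def handle_write_request(disk: CachedDisk, mode: str, sector: int,
                         data: bytes, notify: Callable[[int], None]) -> None:
    disk.write(sector, data)
    if mode in FLUSH_AFTER_EVERY_WRITE:
        disk.flush()      # stands in for the bdrv_co_flush() call
    notify(sector)        # completion reported to the guest


def acknowledged_but_lost(mode: str, sectors: List[int]) -> List[int]:
    """Acknowledge writes, cut the power, return acknowledged writes that vanished."""
    disk = CachedDisk()
    acked: List[int] = []
    for s in sectors:
        handle_write_request(disk, mode, s, b"x" * 512, acked.append)
    disk.power_loss()
    return [s for s in acked if s not in disk.stable]


if __name__ == "__main__":
    for mode in ("none", "writeback", "writethrough", "directsync"):
        print(mode, acknowledged_but_lost(mode, [0, 1, 2]))
```

In this model, acknowledged writes are lost only in the modes without a flush after every write, which is why, with cache=none, durability depends on the guest issuing its own flush requests.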
Cheers,
Christian
--
Developer Systemintegration
CLOUD&HEAT
The Cloud that heats homes worldwide
Company and billing address:
CLOUD & HEAT Technologies GmbH
Zeitenströmung
Königsbrücker Str. 96 - Halle 15
01099 Dresden, Germany
Delivery address (production):
CLOUD & HEAT Technologies GmbH
Zeitenströmung
Königsbrücker Str. 96 - Halle 16A
01099 Dresden, Germany
Tel: +49 351 479 3670-202
Fax: +49 351 479 3670-110
E-Mail: address@hidden
Web: https://www.cloudandheat.com
Visit us:
Facebook <https://www.facebook.com/CloudandHeat>
Google+ <https://plus.google.com/+Cloudandheat>
LinkedIn <https://www.linkedin.com/company/cloud-&-heat-technologies-gmbh>
Twitter <https://twitter.com/CLOUDandHEAT>
Xing <https://www.xing.com/companies/cloud%2526heattechnologiesgmbh>
Youtube <https://www.youtube.com/cloudandheat>
Commercial register: Amtsgericht Dresden
Register number: HRB 30549
VAT ID: DE281093504
Managing Director: Nicolas Röhrs
Be sustainable together with us!
Not every e-mail needs to be printed.
Note: This e-mail and/or its attachments are confidential and intended
solely for the named addressee. Forwarding or copying this e-mail is
strictly prohibited. If you have received this e-mail in error, please
notify the sender immediately and destroy the message and all attachments.
Thank you.