Re: [Qemu-discuss] Disk Corruption
From: Jakob Bohm
Subject: Re: [Qemu-discuss] Disk Corruption
Date: Wed, 1 Jun 2016 20:29:30 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
On 01/06/2016 20:13, Jacob Godin wrote:
Thanks for the tips Jakob. Please see below for details.
Please clarify a few things for the other people on this list (I
don't have a solution for your issue, but would like it to be solved
just to improve the reliability of my own qcow2 disks):
On 01/06/2016 17:47, Jacob Godin wrote:
Hi all,
Been running into an issue with qcow2 disk corruption, hoping we
can get pointed in the right direction. We're currently using
latest qemu from Trusty.
Is this Ubuntu?
What is the numeric Ubuntu version?
What are the actual qemu package versions you use ("latest" isn't
exactly precise)?
Yes, Ubuntu 14.04. qemu-img version 2.0.0 (2.0.0+dfsg-2ubuntu1.22)
That is only the version of the command-line tools such as qemu-img;
qemu itself is in a different package.
The issue started after powering a VM off and on again. On
first boot, the guest (CentOS 6) started reporting I/O issues
almost immediately and then crashed. Following that, the VM was
unable to read the disk (kept looping through BIOS boot process).
How did you "power off" the VM?
Using virsh shutdown
Did you use some qemu management tool (which one and which version)?
libvirt version 1.2.2 (1.2.2-0ubuntu13.1.17)
Did you kill the qemu process?
We made sure it was dead before taking the snap.
Not clear: Did you *kill* the qemu process or did it exit all by itself
when you shut down the guest?
And the same question back when you made the snapshot.
Did you do a "clean" shutdown of the Guest OS and wait for the Guest
OS to tell the qemu process to exit on its own?
Yes, virsh shutdown issues a safe shutdown via ACPI
Other people on this list may know more about what that libvirt version
does in this situation (besides the initial "polite" request via a qemu
command to generate the ACPI event).
(Note: A clean guest shutdown should not be required for the qcow2
metadata to survive; it only determines whether the disk image inside
is that of a clean or an unclean disk. It may still matter for how the
bug was triggered.)
The disk has a single snapshot, which we were able to get
working by following this process:
* Attempt to apply snap. Supposedly fails.
When you "attempted to apply the snapshot", which tool (and version)
did you use?
Same as above, qemu-img 2.0.0
Ok, so not libvirt's snapshot management commands then.
* Run qemu-img check + repair
* Use qemu-img convert to convert qcow2 to qcow2
Once complete, we were able to boot from the disk, however it
was at the point that the snapshot was taken. We have attempted
to do a check+repair and then convert without applying the
snapshot, but are running into the following errors:
* qemu-img check + repair:
Warning: cluster offset=0x2d3120706a0000 is after the end of the image file, can't properly check refcounts.
ERROR offset=2d312070696e00: Cluster is not properly aligned; L2 entry corrupted.
Warning: cluster offset=0x2d310a43500000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0x2d310a43510000 is after the end of the image file, can't properly check refcounts.
ERROR offset=2d310a43505500: Cluster is not properly aligned; L2 entry corrupted.
Warning: cluster offset=0x20496e74650000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0x20496e74660000 is after the end of the image file, can't properly check refcounts.
ERROR offset=20496e74656c00: Cluster is not properly aligned; L2 entry corrupted.
Warning: cluster offset=0x2f6d6d6f6e0000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0xd2070726f0000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0xd207072700000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0x336f7220730000 is after the end of the image file, can't properly check refcounts.
* qemu-img convert:
qemu-img: error while reading block status of sector 147456: Input/output error
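As a sanity check on the messages above, here is a minimal Python sketch
that redoes the arithmetic by hand. It assumes the 65536-byte cluster size
shown by qemu-img info, 512-byte sectors in the convert error, 8-byte L2
entries as in the qcow2 format, and it uses the reported "disk size: 153G"
as a rough stand-in for the real file length. The function names are made
up for illustration and are not part of qemu.

```python
# Values taken from the qemu-img output quoted in this thread.
CLUSTER_SIZE = 65536            # cluster_size reported by qemu-img info
SECTOR_SIZE = 512               # assumption: the error counts 512-byte sectors
L2_ENTRY_SIZE = 8               # qcow2 L2 table entries are 64 bits wide
APPROX_FILE_SIZE = 153 * 2**30  # assumption: "disk size: 153G" ~ file length


def is_cluster_aligned(offset, cluster_size=CLUSTER_SIZE):
    """Return True if a host offset sits on a cluster boundary."""
    return offset % cluster_size == 0


def locate_sector(sector, cluster_size=CLUSTER_SIZE):
    """Map a guest sector to (byte offset, L1 index, L2 index)."""
    byte_offset = sector * SECTOR_SIZE
    cluster_index = byte_offset // cluster_size
    entries_per_l2 = cluster_size // L2_ENTRY_SIZE
    return (byte_offset,
            cluster_index // entries_per_l2,
            cluster_index % entries_per_l2)


if __name__ == "__main__":
    # Offsets from the three "Cluster is not properly aligned" errors.
    for off in (0x2d312070696e00, 0x2d310a43505500, 0x20496e74656c00):
        alignment = "aligned" if is_cluster_aligned(off) else "misaligned"
        position = "past end of file" if off > APPROX_FILE_SIZE else "inside file"
        print(hex(off), alignment, position)
    # Sector from the qemu-img convert error.
    print(locate_sector(147456))
```

Under those assumptions, all three ERROR offsets are misaligned and lie far
beyond anything a 153G file could hold, and sector 147456 corresponds to
byte offset 75497472 (72 MiB), i.e. L1 index 0 and L2 index 1152.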
Here's the qemu-img info output from that disk:
image: disk.pre-convert
file format: qcow2
virtual size: 180G (193273528320 bytes)
disk size: 153G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/xxx
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
67        xxx                       0 2016-04-14 05:22:34   00:00:00.000
Note that the virtual size has been increased from 80G. It
previously looked like this:
image: disk.pre-convert
file format: qcow2
virtual size: 80G (85899345920 bytes)
disk size: 153G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/c45e2e81d34824861271a098bccd5585128e2c05
Snapshot list:
ID        TAG                               VM SIZE                DATE       VM CLOCK
67        e50825fbd43e455283ef847b12eaea4c        0 2016-04-14 05:22:34   00:00:00.000
We've tried using qcow2.py from src to clear the snapshot
headers, however it didn't help.
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded