Have you guys considered flock() to prevent this class of issues? If qemu-img or qemu-system refused to open a locked qcow2 image, it could eliminate the possibility of accidental corruption, and you wouldn't have to ask me whether I used the 'qemu-img snapshot' command while qemu was running. ;-)
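
To illustrate the idea, here is a minimal sketch of the flock() approach in Python. It is not how QEMU opens images; the helper name open_image_locked is made up for this example, and flock() locks are advisory, so this only protects against other tools that take the same lock.

```python
import fcntl
import os


def open_image_locked(path):
    """Open an image read-write and take an exclusive advisory lock.

    Hypothetical helper for illustration only. Raises RuntimeError if
    another open file description already holds a flock() on the file.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise RuntimeError(f"{path} is locked; refusing to open it")
    return fd
```

The lock is released when the descriptor is closed, so a crashed process does not leave a stale lock behind.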
That's awesome! In that case I would consider the changes to CPU scheduling and PCI MSI safe. If I get the corruption again, I will be sure that it is not because of the snapshot (as I will avoid taking snapshots with an external tool while qemu is running).
On Thu, 11/05 04:59, Ivan Volosyuk wrote:
> I cannot confirm that. There is a possibility that VM was still running
> when I created new snapshot.
That is definitely very dangerous and may be the reason why it's corrupted...
Fam
>
> On Thu, Nov 5, 2015 at 2:46 PM Fam Zheng <address@hidden> wrote:
>
> > On Thu, 11/05 03:15, Ivan Volosyuk wrote:
> > > That doesn't sound too reassuring. Sounds like it's time to say goodbye to my
> > > data and use a pretty dated backup ;-) I thought qcow2 snapshots would help
> > > me. It always annoys me that hexdump swaps pairs of bytes to form u16s.
> > > Here it is:
> >
> > Regarding snapshot, could you confirm that you *didn't* try to create a
> > snapshot with "qemu-img snapshot" while the VM is running?
> >
> > https://bugs.launchpad.net/qemu/+bug/1354167/comments/12
> >
> > >
> > > 0000000 4651 fb49 0000 0300 0000 0000 0000 0000
> > > 0000010 0000 0000 0000 1000 0000 0001 0000 0000
> > > 0000020 0000 0000 0000 0008 0000 0000 0300 0000
> > > 0000030 0000 0000 0100 0000 0000 0100 0000 0600
> > > 0000040 0000 2300 6f45 0000 0000 0000 0000 0200
> > > 0000050 0000 0000 0000 0000 0000 0000 0000 0000
> > > 0000060 0000 0400 0000 6800 0368 57f8 0000 9000
> > > 0000070 0000 6964 7472 2079 6962 0074 0000 0000
> > > 0000080 0000 0000 0000 0000 0000 0000 0000 0000
> > > *
> > > 00000a0 0100 6f63 7272 7075 2074 6962 0074 0000
> > > 00000b0 0000 0000 0000 0000 0000 0000 0000 0000
> > > *
> > > 00000d0 0001 616c 797a 7220 6665 6f63 6e75 7374
> > > 00000e0 0000 0000 0000 0000 0000 0000 0000 0000
> > > *
> > > 0010000 0000 0000 0200 0000 0000 0000 0380 0000
> > > 0010010 0000 0100 1c00 0000 0000 0100 1e80 0000
> > > 0010020 0000 0200 0a00 0000 0000 0200 0d80 0000
> > > 0010030 0000 0300 1a00 0000 0000 0300 0380 0000
> > >
> > >
> > > On Thu, Nov 5, 2015 at 2:08 PM Fam Zheng <address@hidden> wrote:
> > >
> > > > [Cc'ing qcow2 developers]
> > > >
> > > > On Thu, 11/05 02:05, Ivan Volosyuk wrote:
> > > > > > The image has some personal data and is pretty large: 1T (140G allocated). I
> > > > > > recompiled qemu-img and ran it through gdb:
> > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > get_refcount_ro4 (refcount_array=0x7fffc0edc010, index=246458459629569)
> > > > > > at block/qcow2-refcount.c:179
> > > > > > 179 return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
> > > > > > (gdb) bt
> > > > > > #0 get_refcount_ro4 (refcount_array=0x7fffc0edc010, index=246458459629569)
> > > > > > at block/qcow2-refcount.c:179
> > > > > #1 0x0000555555595851 in inc_refcounts
> > > > > (address@hidden,
> > > > > address@hidden,
> > > > > address@hidden, address@hidden
> > > > =2684354560,
> > > > > res=0x7fffffffd730, res=0x7fffffffd730, bs=0x555555c76320)
> > > > > at block/qcow2-refcount.c:1329
> > > > > #2 0x0000555555595a61 in check_refcounts_l1 (address@hidden
> > > > =0x555555c76320,
> > > > > address@hidden, address@hidden
> > > > > =0x7fffffffd690,
> > > > > address@hidden,
> > > > > l1_table_offset=-2294842463426117632, l1_size=335544320,
> > > > address@hidden
> > > > > =0)
> > > >
> > > > To avoid this crash, we should probably validate l1_table_offset against
> > > > refcount_table_size in check_refcounts_l1.
> > > >
> > > > Regarding the image, apparently the l1 table offset doesn't make sense here;
> > > > the header may be corrupted. Can you hexdump the first 512 bytes?
> > > >
> > > > > at block/qcow2-refcount.c:1487
> > > > > #3 0x0000555555595fcd in calculate_refcounts (address@hidden
> > > > =0x555555c76320,
> > > > > address@hidden, address@hidden(unknown: 0),
> > > > > address@hidden,
> > > > > address@hidden,
> > > > > address@hidden)
> > > > > at block/qcow2-refcount.c:1811
> > > > > #4 0x000055555559893e in qcow2_check_refcounts (address@hidden
> > > > =0x555555c76320,
> > > > > address@hidden, address@hidden(unknown: 0))
> > > > > at block/qcow2-refcount.c:2199
> > > > > #5 0x0000555555592d15 in qcow2_check (bs=0x555555c76320,
> > > > > result=0x7fffffffd730, fix=(unknown: 0)) at block/qcow2.c:336
> > > > > > #6 0x0000555555568b2b in collect_image_check (bs=0x555555c76320,
> > > > > > check=0x555555ca8e40,
> > > > > > filename=0x7fffffffdc72 "/home/ivan/../vm-images/win81a.qcow2.broken",
> > > > > > fix=<optimized out>, fmt=<optimized out>) at qemu-img.c:444
> > > > > > #7 0x000055555556a53e in img_check (argc=<optimized out>, argv=<optimized out>)
> > > > > > at qemu-img.c:570
> > > > > > #8 0x000055555556559c in main (argc=3, argv=0x7fffffffd938) at qemu-img.c:3087
> > > > > --
> > > > > Regards,
> > > > > Ivan
> > > > >
> > > > > On Thu, Nov 5, 2015 at 12:29 PM Fam Zheng <address@hidden> wrote:
> > > > >
> > > > > > On Thu, 11/05 01:09, Ivan Volosyuk wrote:
> > > > > > > > Yesterday, I did a few tweaks for my system which uses VGA passthrough.
> > > > > > > > - I moved virtual CPUs to dedicated CPUs (isolcpus=4-7 kernel boot argument)
> > > > > > > > - I instructed my Windows guest to use MSI IRQ
> > > > > > > >
> > > > > > > > Sound crackles disappeared, but after a few minutes of playing Starcraft I
> > > > > > > > got the first ever reboot of Windows 8.1 in the virtual machine. When I tried
> > > > > > > > to restart the virtual machine I got the message that the qcow2 image is
> > > > > > > > corrupted. qemu-img check now crashes with a segfault on the image.
> > > > > >
> > > > > > > Is it convenient for you to provide the image? If not, can you post the
> > > > > > > backtrace of the "qemu-img check" crash?
> > > > > >
> > > > > > Fam
> > > > > >
> > > >
> >
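
For anyone reading the hexdump quoted earlier in this thread: plain hexdump prints 16-bit words in host (little-endian) order, while qcow2 header fields are big-endian, which is why the bytes look swapped. Below is a rough Python sketch that undoes the swap and decodes the first 72 header bytes. The field names follow the qcow2 specification; the helper names words_to_bytes and parse_qcow2_header are made up for this example, and it assumes the dump was taken on a little-endian host.

```python
import struct

# First 72 bytes of the header, copied from the hexdump in this thread.
HEXDUMP_WORDS = """
4651 fb49 0000 0300 0000 0000 0000 0000
0000 0000 0000 1000 0000 0001 0000 0000
0000 0000 0000 0008 0000 0000 0300 0000
0000 0000 0100 0000 0000 0100 0000 0600
0000 2300 6f45 0000
"""

FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
)


def words_to_bytes(text):
    """Turn hexdump's little-endian 16-bit words back into file byte order."""
    out = bytearray()
    for word in text.split():
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


def parse_qcow2_header(raw):
    """Decode the fixed big-endian fields at the start of a qcow2 header."""
    values = struct.unpack(">4sIQIIQIIQQIIQ", raw[:72])
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    header = parse_qcow2_header(words_to_bytes(HEXDUMP_WORDS))
    for name, value in header.items():
        print(name, value if isinstance(value, bytes) else hex(value))
```

Running hexdump -C on the image instead prints the bytes in file order, so no swapping is needed.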