Re: [Qemu-stable] [Qemu-devel] Data corruption in Qemu 2.7.1
From: Fabian Grünbichler
Subject: Re: [Qemu-stable] [Qemu-devel] Data corruption in Qemu 2.7.1
Date: Wed, 18 Jan 2017 17:19:41 +0100
User-agent: NeoMutt/20161126 (1.7.1)
On Wed, Jan 18, 2017 at 12:50:50PM +0100, Fabian Grünbichler wrote:
> On 17/01/2017 16:03, Paolo Bonzini wrote:
> > On 17/01/2017 12:22, Fabian Grünbichler wrote:
> >> 6) repeat 3-5 until md5sum does not match, kernel spews error
> >> messages, or you are convinced that everything is OK
> >>
> >> sample kernel message (for ext3):
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result:
> >> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key :
> >> Illegal Request [current]
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense:
> >> Invalid field in cdb
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a
> >> 00 0f 3a 90 00 00 07 d8 00
> >> Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error,
> >> dev sda, sector 255496192
> >
> > Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the
> > host? Or also using this systemtap script.
> >
> > The important bit would be the lines with a nonzero status, but the
> > others can be useful to see what the surroundings look like.
> >
>
> OT: systemtap is not working with your script under Debian Jessie (or
> maybe in general under Debian Jessie? not sure).
>
> after some further testing it seems like this change in Qemu exposes
> some subtle issue with our specific kernel (it works fine with the
> upstream Ubuntu 4.4 one which ours is based on). I am currently
> debugging further to narrow down potential causes - if I need further
> input from your side or if I suspect Qemu to be at fault I'll resurrect
> this thread (and include the strace output).
>
> thanks for your quick reaction anyhow!
>
okay, so this looks like either a bug in Qemu or the upstream kernel.
disabling THP on the hypervisor host with
# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
allows reproducing the bug very reliably. shutting the VM down, re-enabling
THP (with 'always'), and trying again makes it go away.
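
when repeating this test it helps to record which THP mode was actually active for each run. the sysfs file lists all modes and marks the active one with square brackets (e.g. "always madvise [never]"). a minimal sketch for reading it; the helper names parse_thp_mode and current_thp_mode are made up for illustration:

```python
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text: str) -> str:
    """Return the active mode from text such as 'always madvise [never]'."""
    for token in text.split():
        # The kernel marks the active mode with square brackets.
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed mode found in {text!r}")


def current_thp_mode(path: str = THP_ENABLED) -> str:
    """Read the sysfs file on the host and return the active THP mode."""
    with open(path) as f:
        return parse_thp_mode(f.read())
```

calling current_thp_mode() on the hypervisor host before starting the VM should return 'never' for the failing case and 'always' for the working one.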
Qemu was compiled with:
../configure --with-confsuffix=/kvm --target-list=x86_64-softmmu \
    --disable-xen --enable-gnutls --enable-sdl --enable-uuid \
    --enable-linux-aio --enable-libiscsi --disable-smartcard \
    --audio-drv-list=alsa --enable-spice --enable-usb-redir --enable-libusb \
    --disable-gtk --enable-xfsctl --enable-numa --disable-strip \
    --enable-jemalloc --disable-libnfs --disable-fdt
attached is an strace of qemu master on a mainline 4.9 kernel, running on
Debian Jessie - I will try to test it with Fedora or CentOS tomorrow.
journal in the VM says the following:
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
strace (with some random grep-ing):
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
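
the two failed SG_IO calls on the host can be matched to the guest errors by decoding the CDB bytes. a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a, big-endian 32-bit LBA in bytes 2-5, big-endian 16-bit transfer length in bytes 7-8) and a 512-byte logical block size; the helper name decode_write10 is made up for illustration:

```python
def decode_write10(cdb: bytes, block_size: int = 512) -> dict:
    """Decode LBA and transfer length from a 10-byte WRITE(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks to write
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# CDBs copied from the strace / guest journal output above.
first = decode_write10(bytes.fromhex("2a 00 0d d6 51 48 00 08 00 00"))
second = decode_write10(bytes.fromhex("2a 00 0d d6 59 48 00 08 00 00"))

print(first)   # lba 232149320, 2048 blocks, 1048576 bytes
print(second)  # lba 232151368, 2048 blocks, 1048576 bytes
```

the decoded LBAs equal the sectors in the guest's blk_update_request messages, and 2048 blocks of 512 bytes equal the dxfer_len of 1048576 in the strace, so both failures are adjacent 1 MiB writes that the host rejected with EINVAL.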
Attachment: host-strace.gz (application/gzip)