From: Sage Weil
Subject: Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]
Date: Tue, 13 Aug 2013 14:26:07 -0700 (PDT)
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
On Mon, 5 Aug 2013, Mike Dawson wrote:
> Josh,
>
> Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock.
>
> - At about 2013-08-05 19:46 or 19:47, we hit the issue and traffic went to 0
> - At about 2013-08-05 19:53:51, we ran 'virsh screenshot'
>
>
> Environment is:
>
> - Ceph 0.61.7 (client is co-mingled with three OSDs)
> - rbd cache = true and cache=writeback
> - qemu 1.4.0 (1.4.0+dfsg-1expubuntu4)
> - Ubuntu Raring with 3.8.0-25-generic
>
> This issue is reproducible in my environment, and I'm willing to run any wip
> branch you need. What else can I provide to help?
This looks like a different issue from Oliver's. I see one anomaly in the
log, where an RBD I/O completion is triggered a second time for no
apparent reason. I opened a separate bug

  http://tracker.ceph.com/issues/5955

and pushed a wip-5955 branch that will hopefully shed some light on the
weird behavior I saw. Can you reproduce with this branch and the
following debug settings?

  debug objectcacher = 20
  debug ms = 1
  debug rbd = 20
  debug finisher = 20
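For reference, one way to apply these is in ceph.conf on the qemu host.
A minimal sketch follows; the [client] section and the log file path are
assumptions to be adjusted for how the librbd client is actually
configured:

  # /etc/ceph/ceph.conf (sketch; section name and log path are assumptions)
  [client]
      debug objectcacher = 20
      debug ms = 1
      debug rbd = 20
      debug finisher = 20
      # write client logs somewhere the qemu process can create files
      log file = /var/log/ceph/client.$name.$pid.log

In this sketch the qemu process needs to be restarted so the librbd
client picks up the new settings from the config file.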
Thanks!
sage
>
> Thanks,
> Mike Dawson
>
>
> On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote:
> > On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
> > > On 02.08.2013 at 23:47, Mike Dawson <address@hidden> wrote:
> > > > We can "un-wedge" the guest by opening a NoVNC session or running a
> > > > 'virsh screenshot' command. After that, the guest resumes and runs as
> > > > expected. At that point we can examine the guest. Each time we'll see:
> >
> > If 'virsh screenshot' works, then this confirms that QEMU itself is still
> > responding. Its main loop cannot be blocked, since it was able to
> > process the screendump command.
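For example, a minimal check would be something like this (guest01 is a
placeholder domain name):

  # if this returns and writes a PNG, the QEMU main loop is alive
  virsh screenshot guest01 /tmp/guest01-screen.png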
> >
> > This supports Josh's theory that a callback is not being invoked. The
> > virtio-blk I/O request would be left in a pending state.
> >
> > Now here is where the behavior varies between configurations:
> >
> > On a Windows guest with 1 vCPU, you may see that the guest no
> > longer responds to ping.
> >
> > On a Linux guest with multiple vCPUs, you may see the hung task message
> > from the guest kernel because the other vCPUs are still making progress.
> > Only the vCPU that issued the I/O request and whose task is in
> > UNINTERRUPTIBLE state would really be stuck.
> >
> > Basically, the symptoms depend not just on how QEMU is behaving but also
> > on the guest kernel and how many vCPUs you have configured.
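Inside the Linux guest, the stuck task can be spotted with standard
tools; a quick sketch (nothing Ceph-specific is assumed):

  # kernel hung-task warnings mention tasks "blocked for more than" N seconds
  dmesg | grep "blocked for more than"

  # list tasks in uninterruptible (D) sleep; the wedged I/O shows up here
  ps -eo pid,state,wchan:30,comm | awk '$2 == "D"'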
> >
> > I think this can explain how both problems you are observing, Oliver and
> > Mike, are a result of the same bug. At least I hope they are :).
> >
> > Stefan
> >
> _______________________________________________
> ceph-users mailing list
> address@hidden
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>