Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_i
From: Nicholas Piggin
Subject: Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()
Date: Fri, 29 Mar 2019 19:13:55 +1000
User-agent: astroid/0.14.0 (https://github.com/astroidmail/astroid)
Suraj Jitindar Singh's on March 29, 2019 3:20 pm:
> On Wed, 2019-03-27 at 17:51 +0100, Cédric Le Goater wrote:
>> On 3/27/19 5:37 PM, Cédric Le Goater wrote:
>> > On 3/27/19 1:36 PM, Sebastian Andrzej Siewior wrote:
>> > > With qemu-system-ppc64le -machine pseries -smp 4 I get:
>> > >
>> > > > # chrt 1 hackbench
>> > > > Running in process mode with 10 groups using 40 file
>> > > > descriptors each (== 400 tasks)
>> > > > Each sender will pass 100 messages of 100 bytes
>> > > > Oops: Exception in kernel mode, sig: 4 [#1]
>> > > > LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA pSeries
>> > > > Modules linked in:
>> > > > CPU: 0 PID: 629 Comm: hackbench Not tainted 5.1.0-rc2 #71
>> > > > NIP: c000000000046978 LR: c000000000046a38 CTR:
>> > > > c0000000000b0150
>> > > > REGS: c0000001fffeb8e0 TRAP: 0700 Not tainted (5.1.0-rc2)
>> > > > MSR: 8000000000089033 <SF,EE,ME,IR,DR,RI,LE> CR:
>> > > > 42000874 XER: 00000000
>> > > > CFAR: c000000000046a34 IRQMASK: 1
>> > > > GPR00: c0000000000b0170 c0000001fffebb70 c000000000a6ba00
>> > > > 0000000028000000
>> > >
>> > > …
>> > > > NIP [c000000000046978] doorbell_core_ipi+0x28/0x30
>> > > > LR [c000000000046a38] doorbell_try_core_ipi+0xb8/0xf0
>> > > > Call Trace:
>> > > > [c0000001fffebb70] [c0000001fffebba0] 0xc0000001fffebba0
>> > > > (unreliable)
>> > > > [c0000001fffebba0] [c0000000000b0170]
>> > > > smp_pseries_cause_ipi+0x20/0x70
>> > > > [c0000001fffebbd0] [c00000000004b02c]
>> > > > arch_send_call_function_single_ipi+0x8c/0xa0
>> > > > [c0000001fffebbf0] [c0000000001de600]
>> > > > irq_work_queue_on+0xe0/0x130
>> > > > [c0000001fffebc30] [c0000000001340c8]
>> > > > rto_push_irq_work_func+0xc8/0x120
>> > >
>> > > …
>> > > > Instruction dump:
>> > > > 60000000 60000000 3c4c00a2 384250b0 3d220009 392949c8 81290000
>> > > > 3929ffff
>> > > > 7d231838 7c0004ac 5463017e 64632800 <7c00191c> 4e800020
>> > > > 3c4c00a2 38425080
>> > > > ---[ end trace eb842b544538cbdf ]---
This is unusual, and it is crashing the powerpc code because the rt
scheduler is telling irq_work_queue_on() to queue work on the current CPU.
Is that allowed? There are no warnings in there, but it must
be a rarely tested path. Would it be better to ban it?
Steven, is this irq_work_queue_on() to self by design?
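To make the question concrete, here is a toy user-space model of the two options: banning a request that targets the calling CPU, or handling it locally without an IPI. This is not kernel code; model_queue_on(), enum queue_action and the allow_self flag are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for a request to queue work on target_cpu. */
enum queue_action {
    QUEUE_LOCAL,    /* target is the caller: queue locally, no IPI needed */
    QUEUE_SEND_IPI, /* target is another CPU: queue and send an IPI */
    QUEUE_REJECT    /* self-targeting is banned by policy */
};

/*
 * Toy policy model, NOT the kernel's irq_work_queue_on().
 * allow_self selects between "handle it locally" and "ban it".
 */
enum queue_action model_queue_on(int this_cpu, int target_cpu, bool allow_self)
{
    assert(this_cpu >= 0 && target_cpu >= 0);

    if (target_cpu != this_cpu)
        return QUEUE_SEND_IPI;

    return allow_self ? QUEUE_LOCAL : QUEUE_REJECT;
}
```

Either way, in this model the arch IPI code is never asked to signal the CPU it is running on.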
>> > >
>> > > and I was wondering whether this is a qemu bug or the kernel is
>> > > using an
>> > > opcode it should rather not. If I skip doorbell_try_core_ipi() in
>> > > smp_pseries_cause_ipi() then there is no crash. The comment says
>> > > "POWER9
>> > > should not use this handler" so…
>> >
>> > I would say Linux is using a msgsndp instruction which is not
>> > implemented
>> > in QEMU TCG. But why have we started using dbells in Linux ?
>
> Yeah the kernel must have used msgsndp which isn't implemented for TCG
> yet. We use doorbells in linux but only for threads which are on the
> same core.
> And when I try to construct a situation with more than 1 thread per
> core (e.g. -smp 4,threads=4), I get "TCG cannot support more than 1
> thread/core on a pseries machine".
>
> So I wonder why the guest thinks it can use msgsndp...
An IPI to self, evidently. Under TCG, QEMU really should either implement
the instruction or remove the DBELL feature.
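A toy model of why a self-IPI can still take the doorbell path with one thread per core. It assumes the "same core" test counts a CPU as its own sibling, and that CPUs are numbered so that siblings share cpu / threads_per_core. Both are assumptions for this sketch, and the function names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy numbering: the core of a CPU is cpu / threads_per_core. */
static int model_core_of(int cpu, int threads_per_core)
{
    assert(threads_per_core > 0);
    return cpu / threads_per_core;
}

/*
 * True if target_cpu shares a core with this_cpu. A CPU always shares
 * a core with itself, so target_cpu == this_cpu passes the check even
 * when threads_per_core is 1.
 */
bool model_doorbell_eligible(int this_cpu, int target_cpu, int threads_per_core)
{
    return model_core_of(this_cpu, threads_per_core) ==
           model_core_of(target_cpu, threads_per_core);
}
```

With threads_per_core = 1, the only target that passes is the sending CPU itself, which would match the "IPI to self" case above.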
Thanks,
Nick