qemu-ppc

Re: RHEL 8.1 Oops on qemu-ppc-for-4.2


From: David Gibson
Subject: Re: RHEL 8.1 Oops on qemu-ppc-for-4.2
Date: Tue, 17 Dec 2019 11:38:05 +1100

On Tue, Dec 10, 2019 at 12:14:40PM -0600, Paul Clarke wrote:
> I'm using RHEL 8.1 with TCG on a very recent qemu-ppc-for-4.2. HEAD:
> --
> commit 9b4efa2ede5db24377405a21b218066b90fe2f0e
> Date:   Mon Dec 9 16:06:51 2019 +0000

>     Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-12-09' 
> into staging
> --

> It generally runs well for some time, but then I get an Oops at random (or 
> possibly when idle for some time):
> --
> [11316.307334] Oops: Exception in kernel mode, sig: 4 [#1]
> [11316.316415] LE SMP NR_CPUS=2048 NUMA pSeries
> [11316.319519] Modules linked in: nf_tables_set nft_fib_inet nft_fib_ipv4 
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject
> +nft_ct nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
> nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4
> +nf_nat nft_chain_route_ipv4 nf_conntrack ip6_tables ip_tables nft_compat 
> ip_set nf_tables nfnetlink xts sg vmx_crypto virtio_balloon virtio_console 
> isofs
> +xfs libcrc32c sr_mod cdrom bochs_drm drm_kms_helper syscopyarea sysfillrect 
> sysimgblt fb_sys_fops ttm drm virtio_net drm_panel_orientation_quirks 
> virtio_blk
> +net_failover failover virtio_scsi
> [11316.331431] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 4.18.0-147.el8.ppc64le #1
> [11316.332660] NIP:  c0000000000515ec LR: c0000000000fa778 CTR: 
> c0000000000fa750
> [11316.333655] REGS: c00000000166f900 TRAP: 0700   Not tainted  
> (4.18.0-147.el8.ppc64le)
> [11316.334536] MSR:  800000000288b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 
> 44000822  XER: 00000000
> [11316.335802] CFAR: c0000000000515a0 IRQMASK: 1
> [11316.335802] GPR00: c0000000000fa778 c00000000166fb80 c000000001672900 
> 0000000028000000
> [11316.335802] GPR04: c0000000016a1fe8 0000000000000000 c0000000ffffc000 
> 0000000000000001
> [11316.335802] GPR08: c000000001a90000 0000000000000000 0000000000000001 
> c00000017fdc3100
> [11316.335802] GPR12: c0000000000fa750 c000000001a90000 0000000002f66f60 
> c00000017fdd8980
> [11316.335802] GPR16: 00000000030324b0 c0000000015ae199 0000000000000000 
> 0000000000000004
> [11316.335802] GPR20: 0000000000000000 c00000000160b9a8 0000000000000001 
> c00000017fdd8a00
> [11316.335802] GPR24: c00000000166fd90 c000000000d45188 c0000000016a2350 
> 0000000000000002
> [11316.335802] GPR28: c000000001839100 c0000000016a24b4 0000000000000000 
> 0000000000000000
> [11316.342836] NIP [c0000000000515ec] doorbell_try_core_ipi+0x9c/0xb0
> [11316.343482] LR [c0000000000fa778] smp_pseries_cause_ipi+0x28/0x70
> [11316.344093] Call Trace:
> [11316.344511] [c00000000166fb80] [c00000000166fc20] init_stack+0x3c20/0x4000 
> (unreliable)
> [11316.345193] [c00000000166fbb0] [c000000000056d3c] 
> smp_send_reschedule+0xac/0xc0
> [11316.345750] [c00000000166fbd0] [c00000000019a5b4] kick_ilb+0x124/0x150
> [11316.346247] [c00000000166fc20] [c0000000001ada04] 
> pick_next_task_fair+0x7f4/0x840
> [11316.346822] [c00000000166fd30] [c000000000d4429c] __schedule+0x15c/0xb20
> [11316.347376] [c00000000166fe00] [c000000000d45188] schedule_idle+0x38/0x70
> [11316.347866] [c00000000166fe20] [c000000000199144] do_idle+0x274/0x480
> [11316.348323] [c00000000166fea0] [c00000000019958c] 
> cpu_startup_entry+0x3c/0x40
> [11316.348813] [c00000000166fed0] [c0000000000103d8] rest_init+0xe0/0xf8
> [11316.349281] [c00000000166ff00] [c0000000010b4228] start_kernel+0x638/0x658
> [11316.349786] [c00000000166ff90] [c00000000000ad7c] 
> start_here_common+0x1c/0x520
> [11316.350340] Instruction dump:
> [11316.350755] 3929dac0 78681f24 38e00001 e8c60000 81290000 7d06402a 3929ffff 
> 7d231838
> [11316.351331] 98e81025 7c0004ac 5463017e 64632800 <7c00191c> 7d435378 
> 4e800020 60000000
> [11316.352386] ---[ end trace af78a85a10eb57cc ]---
> [11316.361536]
> [11317.362388] Kernel panic - not syncing: Fatal exception
> --

> Coincidentally, this was up for about pi (3.14) hours.  Before that, I had 
> pulled and rebuilt because I was seeing Oopses, so I don't think the Oops is
> +related to a recent change.

> Let me know how I can help figure out what's going on.

So we're getting an illegal instruction trap on 0x7c00191c, which is
a msgsndp instruction.  We haven't yet implemented msgsndp in TCG (I
have patches in my queue to review).
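
The decoding of the faulting word (the one in angle brackets in the
instruction dump) can be checked by hand.  What follows is only a rough
sketch, assuming the X-form field layout from the Power ISA (primary
opcode in the top 6 bits, a 10-bit extended opcode just above the lowest
bit) and the extended opcode values listed in the table.  The helper
name decode_xform and the table MSG_XO are made up for illustration and
are not part of QEMU or the kernel.

```python
# Sketch: split a 32-bit Power ISA X-form instruction word into fields.
# decode_xform and MSG_XO are hypothetical names used for illustration.

def decode_xform(word):
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode (top 6 bits)
        "rt": (word >> 21) & 0x1F,      # RT field
        "ra": (word >> 16) & 0x1F,      # RA field
        "rb": (word >> 11) & 0x1F,      # RB field
        "xo": (word >> 1) & 0x3FF,      # extended opcode (10 bits)
    }

# Doorbell-related extended opcodes under primary opcode 31
# (values assumed from the Power ISA).
MSG_XO = {142: "msgsndp", 174: "msgclrp", 206: "msgsnd", 238: "msgclr"}

fields = decode_xform(0x7C00191C)
name = MSG_XO.get(fields["xo"], "unknown")
print(name, "r%d" % fields["rb"])  # prints "msgsndp r3"
```

With primary opcode 31, extended opcode 142 and RB = 3, the word reads
as msgsndp r3, which fits the NIP being inside doorbell_try_core_ipi.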

It's kind of weird that it happens so rarely.  It suggests either that
we're using msgsndp very rarely (in which case it's not clear that
it's worth the bother) or that something is already supposed to be
disabling its use, but we've missed some edge case.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson


