Re: [Qemu-discuss] Kernel panic in VMs with large amounts of memory (>1TB)
From: Burkhard Linke
Subject: Re: [Qemu-discuss] Kernel panic in VMs with large amounts of memory (>1TB)
Date: Wed, 6 Dec 2017 16:39:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
Hi,
On 11/30/2017 11:31 AM, Alberto Garcia wrote:
On Thu, Nov 30, 2017 at 09:43:04AM +0100, Burkhard Linke wrote:
VMs are running fine with less or equal 1 TB RAM. More RAM results
in a kernel panic during VM boot:
You need to patch QEMU:
https://git.centos.org/blob/rpms!!qemu-kvm.git/34b32196890e2c41b0aee042e600ba422f29db17/SOURCES!kvm-seabios-paravirt-allow-more-than-1TB-in-x86-guest.patch
https://git.centos.org/blob/rpms!!qemu-kvm.git/34b32196890e2c41b0aee042e600ba422f29db17/SOURCES!kvm-fix-guest-physical-bits-to-match-host-to-go-beyond-1.patch
And SeaBIOS:
https://git.centos.org/blob/rpms!!seabios.git/62d8d852f4675e4ab4bc3dd339050d26d397c251/SOURCES!0002-allow-1TB-of-RAM.patch
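For anyone following along, the usual workflow for applying such SOURCES patches to an unpacked source tree can be sketched as below. The directory layout and the toy patch are placeholders standing in for a real QEMU/SeaBIOS checkout and the linked patch files; substitute the tree and patches you actually build.

```shell
# Sketch of the patch-apply workflow for downloaded SOURCES patches.
# The src/ tree and fix.patch below are toy stand-ins, not the real files.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for an unpacked source tree (e.g. a qemu or seabios checkout):
mkdir -p src
printf 'old line\n' > src/file.c

# Stand-in for one of the downloaded patch files:
cat > fix.patch <<'EOF'
--- a/src/file.c
+++ b/src/file.c
@@ -1 +1 @@
-old line
+new line
EOF

# Dry-run first to check it applies cleanly, then apply with -p1,
# which strips the leading a/ and b/ path components:
patch -p1 --dry-run < fix.patch
patch -p1 < fix.patch
cat src/file.c   # → new line
```

After the patches apply cleanly, rebuild QEMU (and SeaBIOS) from the patched trees as usual for your distribution.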
Thanks for the patches. I've applied them, and instances are now able to
start, but under load (memtester with more than 1 TB, plus a kernel build
with 30 parallel processes) both the VM and the hypervisor freeze:
2017-12-06T15:29:59.988085+00:00 dl580-r2-1 kernel: [ 1531.823823] NMI
watchdog: BUG: soft lockup - CPU#58 stuck for 22s! [qemu-system-x86:22992]
2017-12-06T15:29:59.988099+00:00 dl580-r2-1 kernel: [ 1531.823826]
Modules linked in: vhost_net vhost macvtap macvlan ebtable_filter
ebtables vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch
nf_nat_ipv6 nf_nat_ipv4 nf_nat 8021q garp mrp bridge stp llc bonding
ip6table_filter ip6_tables xt_CT iptable_raw xt_comment xt_multiport
xt_conntrack iptable_filter ip_tables x_tables xfs intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate
intel_rapl_perf hpilo lpc_ich ioatdma dca ipmi_si ipmi_devintf shpchp
ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm irqbypass ib_iser
rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi nf_conntrack_proto_gre nf_conntrack_ipv6
2017-12-06T15:29:59.988102+00:00 dl580-r2-1 kernel: [ 1531.823864]
nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid1 raid0 multipath linear i2c_algo_bit ttm
drm_kms_helper bnx2x syscopyarea sysfillrect sysimgblt fb_sys_fops drm
ptp hpsa pps_core mdio scsi_transport_sas libcrc32c wmi fjes scsi_dh_emc
scsi_dh_rdac scsi_dh_alua dm_multipath
2017-12-06T15:29:59.988104+00:00 dl580-r2-1 kernel: [ 1531.823888] CPU:
58 PID: 22992 Comm: qemu-system-x86 Tainted: G L
4.10.0-40-generic #44~16.04.1-Ubuntu
2017-12-06T15:29:59.988105+00:00 dl580-r2-1 kernel: [ 1531.823890]
Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17
09/12/2016
2017-12-06T15:29:59.988107+00:00 dl580-r2-1 kernel: [ 1531.823891] task:
ffff948c05f50000 task.stack: ffffac622683c000
2017-12-06T15:29:59.988108+00:00 dl580-r2-1 kernel: [ 1531.823895] RIP:
0010:native_queued_spin_lock_slowpath+0x118/0x1a0
2017-12-06T15:29:59.988109+00:00 dl580-r2-1 kernel: [ 1531.823898] RSP:
0018:ffffac622683fcb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
2017-12-06T15:29:59.988111+00:00 dl580-r2-1 kernel: [ 1531.823900] RAX:
0000000000000000 RBX: 00000002520d9d00 RCX: ffff930c7fa19f00
2017-12-06T15:29:59.988112+00:00 dl580-r2-1 kernel: [ 1531.823901] RDX:
ffff93cc7fad9f00 RSI: 0000000001300101 RDI: ffff948c0b388000
2017-12-06T15:29:59.988114+00:00 dl580-r2-1 kernel: [ 1531.823903] RBP:
ffffac622683fcb0 R08: 0000000000ec0000 R09: 0000000000000000
2017-12-06T15:29:59.988149+00:00 dl580-r2-1 kernel: [ 1531.823904] R10:
00000000ffffffff R11: 0000000000000000 R12: ffff930c0a7b8000
2017-12-06T15:29:59.988151+00:00 dl580-r2-1 kernel: [ 1531.823905] R13:
ffffac622683fcd0 R14: 0000000000000001 R15: ffff930c0213d500
2017-12-06T15:29:59.988152+00:00 dl580-r2-1 kernel: [ 1531.823906] FS:
00007f8fff7fe700(0000) GS:ffff930c7fa00000(0000) knlGS:0000000000000000
2017-12-06T15:29:59.988154+00:00 dl580-r2-1 kernel: [ 1531.823907] CS:
0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-12-06T15:29:59.988155+00:00 dl580-r2-1 kernel: [ 1531.823909] CR2:
00007fcc46332ab0 CR3: 0000023d1851f000 CR4: 00000000003426e0
2017-12-06T15:29:59.988157+00:00 dl580-r2-1 kernel: [ 1531.823909] Call
Trace:
2017-12-06T15:29:59.988158+00:00 dl580-r2-1 kernel: [ 1531.823913]
_raw_spin_lock+0x20/0x30
2017-12-06T15:29:59.988160+00:00 dl580-r2-1 kernel: [ 1531.823933]
mmu_free_roots+0x11c/0x170 [kvm]
2017-12-06T15:29:59.988161+00:00 dl580-r2-1 kernel: [ 1531.823949]
kvm_mmu_unload+0x12/0x40 [kvm]
2017-12-06T15:29:59.988162+00:00 dl580-r2-1 kernel: [ 1531.823965]
vcpu_enter_guest+0x42a/0x11b0 [kvm]
2017-12-06T15:29:59.988164+00:00 dl580-r2-1 kernel: [ 1531.823971] ?
vmx_sync_pir_to_irr+0x29/0x30 [kvm_intel]
2017-12-06T15:29:59.988165+00:00 dl580-r2-1 kernel: [ 1531.823989] ?
kvm_apic_has_interrupt+0x98/0xc0 [kvm]
2017-12-06T15:29:59.988167+00:00 dl580-r2-1 kernel: [ 1531.824006]
kvm_arch_vcpu_ioctl_run+0xc8/0x3e0 [kvm]
2017-12-06T15:29:59.988168+00:00 dl580-r2-1 kernel: [ 1531.824021]
kvm_vcpu_ioctl+0x33a/0x600 [kvm]
2017-12-06T15:29:59.988170+00:00 dl580-r2-1 kernel: [ 1531.824023] ?
do_futex+0x1fb/0x540
2017-12-06T15:29:59.988171+00:00 dl580-r2-1 kernel: [ 1531.824026]
do_vfs_ioctl+0xa1/0x5f0
2017-12-06T15:29:59.988173+00:00 dl580-r2-1 kernel: [ 1531.824043] ?
kvm_on_user_return+0x66/0xa0 [kvm]
2017-12-06T15:29:59.988174+00:00 dl580-r2-1 kernel: [ 1531.824046]
SyS_ioctl+0x79/0x90
2017-12-06T15:29:59.988176+00:00 dl580-r2-1 kernel: [ 1531.824050]
entry_SYSCALL_64_fastpath+0x1e/0xad
2017-12-06T15:29:59.988177+00:00 dl580-r2-1 kernel: [ 1531.824051] RIP:
0033:0x7f916c767f07
2017-12-06T15:29:59.988178+00:00 dl580-r2-1 kernel: [ 1531.824052] RSP:
002b:00007f8fff7fd938 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2017-12-06T15:29:59.988180+00:00 dl580-r2-1 kernel: [ 1531.824054] RAX:
ffffffffffffffda RBX: 00007f914c034001 RCX: 00007f916c767f07
2017-12-06T15:29:59.988181+00:00 dl580-r2-1 kernel: [ 1531.824055] RDX:
0000000000000000 RSI: 000000000000ae80 RDI: 000000000000005f
2017-12-06T15:29:59.988183+00:00 dl580-r2-1 kernel: [ 1531.824056] RBP:
0000000000000001 R08: 000055a6393346b0 R09: 00000000000000ff
2017-12-06T15:29:59.988184+00:00 dl580-r2-1 kernel: [ 1531.824057] R10:
0000000000000001 R11: 0000000000000246 R12: 0000000000000000
2017-12-06T15:29:59.988186+00:00 dl580-r2-1 kernel: [ 1531.824058] R13:
000055a63931f2c0 R14: 00007f914c033000 R15: 000055a63ba27ee0
2017-12-06T15:29:59.988187+00:00 dl580-r2-1 kernel: [ 1531.824059] Code:
12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 00 9f 01 00 48 03 14 c5
e0 83 14 9e 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7
4c 8b 09 4d 85 c9 74 08 41 0f 0d 09 eb 02 f3 90 8b
The hypervisor was running kernel linux-image-4.10.0-40-generic in this
test; the stock Xenial 4.4.x kernels show similar behavior with a similar
trace. The VM in question uses the current Xenial cloud image.
Any hints on this, too?
Best regards,
Burkhard Linke
--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810