Re: [Qemu-ppc] qemu-system-ppc64 hanging occasionally in disk writes
From: Alexander Graf
Subject: Re: [Qemu-ppc] qemu-system-ppc64 hanging occasionally in disk writes
Date: Thu, 14 Jun 2012 23:34:37 +0200
On 14.06.2012, at 19:13, "Richard W.M. Jones" <address@hidden> wrote:
> On Thu, Jun 14, 2012 at 05:58:04PM +0200, Alexander Graf wrote:
>> [CC'ing qemu-ppc]
>>
>> On 06/14/2012 05:52 PM, Richard W.M. Jones wrote:
>>> I found last week that qemu-system-ppc64 (from git) hangs occasionally
>>> under load, and I have a reproducer for it now. Unfortunately the
>>> reproducer really takes a long time to run -- usually I can get a hang
>>> in under 12 hours.
>>>
>>> Here is the reproducer case:
>>>
>>> https://lists.fedoraproject.org/pipermail/ppc/2012-June/001698.html
>>>
>>> Notes:
>>>
>>> (1) Verified by one other person besides me. Happens on both
>>> ppc64 and x86-64 hosts.
>>>
>>> (2) Happens with both Fedora guest kernel 3.3.4-5.fc17.ppc64 and kernel
>>> 3.5.0 that I compiled myself. The test case above contains 3.3.4-5.
>>>
>>> (3) Seems to be a problem in qemu, not the guest. I think this
>>> because I tried to capture a backtrace of the hang using remote
>>> gdb, but gdb just hung when trying to connect to qemu (gdb connects
>>> fine before the bug happens).
>>>
>>> (4) Judging by guest messages, appears to be happening when writing
>>> to the disk.
>>
>> Can you please try to see if you can reproduce this using vscsi /
>> vio instead of virtio? I couldn't quite see why vio would be any
>> more stable than virtio though ...
>
> I just tried virtio-scsi, but only the first disk shows up. I added
> two disks. See below for detailed logs. This works fine on x86-64.
> Should I file a separate bug for this?
>
>> Also, could you please try and see if it works reliably using KVM?
>> Maybe we're just encountering some TCG breakage here.
>
> I will try this, but as discussed on IRC last week there's some
> problem with the Fedora host kernel where /dev/kvm doesn't show up,
> even though the kernel is supposedly compiled with KVM PR enabled. So
> I need to fix that first.
>
> Rich.
>
> virtio scsi on ppc64
> --------------------
>
> qemu command line:
>
> /home/rjones/d/qemu/ppc64-softmmu/qemu-system-ppc64 \
> -global virtio-blk-pci.scsi=off \
> -nodefconfig \
> -nodefaults \
> -nographic \
> -device virtio-scsi-pci,id=scsi \
> -drive file=test1.img,cache=off,format=raw,id=hd0,if=none \
> -device scsi-hd,drive=hd0 \

Don't you have to specify bus= too?

Alex

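A sketch of what specifying the bus could look like for the two disks. It assumes that the virtio-scsi-pci controller created with id=scsi exposes its bus as scsi.0, and the scsi-id values 0 and 1 are illustrative choices, not taken from the report (they keep the appliance disk second, matching root=/dev/sdb):

```shell
# Sketch only: attach both scsi-hd devices explicitly to the bus of the
# virtio-scsi-pci controller (id=scsi, so its bus is assumed to be scsi.0).
# The scsi-id values 0 and 1 are illustrative assumptions.
/home/rjones/d/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -M pseries \
    -nodefaults \
    -nographic \
    -device virtio-scsi-pci,id=scsi \
    -drive file=test1.img,cache=off,format=raw,id=hd0,if=none \
    -device scsi-hd,bus=scsi.0,scsi-id=0,drive=hd0 \
    -drive file=/home/rjones/d/libguestfs/.guestfs-1000/root.26645,snapshot=on,id=appliance,if=none,cache=unsafe \
    -device scsi-hd,bus=scsi.0,scsi-id=1,drive=appliance
```

The kernel, initrd, serial and virtio-serial options would stay as in the original command line; only the disk attachment changes.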
> -drive file=/home/rjones/d/libguestfs/.guestfs-1000/root.26645,snapshot=on,id=appliance,if=none,cache=unsafe \
> -device scsi-hd,drive=appliance \
> -M pseries \
> -enable-kvm \
> -machine accel=kvm:tcg \
> -m 500 \
> -no-reboot \
> -device virtio-serial \
> -serial stdio \
> -chardev socket,path=/home/rjones/d/libguestfs/libguestfscoRCTO/guestfsd.sock,id=channel0 \
> -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
> -kernel /home/rjones/d/libguestfs/.guestfs-1000/kernel.26645 \
> -initrd /home/rjones/d/libguestfs/.guestfs-1000/initrd.26645 \
> -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen '
>
> guest kernel output:
>
> Welcome to Open Firmware
>
> Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
> This program and the accompanying materials are made available
> under the terms of the BSD License available at
> http://www.opensource.org/licenses/bsd-license.php
>
> Booting from memory...
> OF stdout device is: /vdevice/address@hidden
> Preparing to boot Linux version 3.3.4-5.fc17.ppc64 (address@hidden) (gcc version 4.7.0 20120504 (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
> Detected machine type: 0000000000000101
> Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
> Calling ibm,client-architecture-support... not implemented
> couldn't open /packages/elf-loader
> command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen
> memory layout at init:
> memory_limit : 0000000000000000 (16 MB aligned)
> alloc_bottom : 0000000001a50000
> alloc_top : 000000001f400000
> alloc_top_hi : 000000001f400000
> rmo_top : 000000001f400000
> ram_top : 000000001f400000
> instantiating rtas at 0x000000001cff0000... done
> Querying for OPAL presence... not there.
> boot cpu hw idx 0
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000001c60000 -> 0x0000000001c605e0
> Device tree struct 0x0000000001c70000 -> 0x0000000001c80000
> Calling quiesce...
> returning from prom_init
> [ 0.000000] Phyp-dump not supported on this hardware
> [ 0.000000] Using pSeries machine description
> [ 0.000000] Using 1TB segments
> [ 0.000000] Found initrd at 0xc000000001a50000:0xc000000001b7c400
> [ 0.000000] bootconsole [udbg0] enabled
> [ 0.000000] CPU maps initialized for 1 thread per core
> [ 0.000000] Starting Linux PPC64 #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] ppc64_pft_size = 0x18
> [ 0.000000] physicalMemorySize = 0x1f400000
> [ 0.000000] htab_hash_mask = 0x1ffff
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.3.4-5.fc17.ppc64 (address@hidden) (gcc version 4.7.0 20120504 (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
>
> CF000012
> Setup Arch[ 0.000000] [boot]0012 Setup Arch
> [ 0.000000] PCI host bridge /address@hidden,0 ranges:
> [ 0.000000] IO 0x0000010080000000..0x000001008000ffff -> 0x0000000000000000
> [ 0.000000] MEM 0x00000100a0000000..0x00000100bfffffff -> 0x0000000080000000
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000000 -> 0x00001f40
> [ 0.000000] Normal empty
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] Early memory PFN ranges
> [ 0.000000] 0: 0x00000000 -> 0x00001f40
>
> CF000015
> Setup Done[ 0.000000] [boot]0015 Setup Done
> [ 0.000000] PERCPU: Embedded 2 pages/cpu @c000000001d00000 s84608 r0 d46464 u1048576
> [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 7993
> [ 0.000000] Policy zone: DMA
> [ 0.000000] Kernel command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen
> [ 0.000000] Disabling memory control group subsystem
> [ 0.000000] PID hash table entries: 2048 (order: -2, 16384 bytes)
> [ 0.000000] freeing bootmem node 0
> [ 0.000000] Memory: 486336k/512000k available (17920k kernel code, 25664k reserved, 1856k data, 2952k bss, 6656k init)
> [ 0.000000] SLUB: Genslabs=19, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=256
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:512 nr_irqs:512 16
> [ 0.000000] clocksource: timebase mult[1f40000] shift[24] registered
> [ 0.000000] Console: colour dummy device 80x25
> [ 0.000000] Phyp-dump not supported on this hardware
> [ 0.000000] Using pSeries machine description
> [ 0.000000] Using 1TB segments
> [ 0.000000] Found initrd at 0xc000000001a50000:0xc000000001b7c400
> [ 0.000000] bootconsole [udbg0] enabled
> [ 0.000000] CPU maps initialized for 1 thread per core
> [ 0.000000] Starting Linux PPC64 #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] ppc64_pft_size = 0x18
> [ 0.000000] physicalMemorySize = 0x1f400000
> [ 0.000000] htab_hash_mask = 0x1ffff
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.3.4-5.fc17.ppc64 (address@hidden) (gcc version 4.7.0 20120504 (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] [boot]0012 Setup Arch
> [ 0.000000] PCI host bridge /address@hidden,0 ranges:
> [ 0.000000] IO 0x0000010080000000..0x000001008000ffff -> 0x0000000000000000
> [ 0.000000] MEM 0x00000100a0000000..0x00000100bfffffff -> 0x0000000080000000
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000000 -> 0x00001f40
> [ 0.000000] Normal empty
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] Early memory PFN ranges
> [ 0.000000] 0: 0x00000000 -> 0x00001f40
> [ 0.000000] [boot]0015 Setup Done
> [ 0.000000] PERCPU: Embedded 2 pages/cpu @c000000001d00000 s84608 r0 d46464 u1048576
> [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 7993
> [ 0.000000] Policy zone: DMA
> [ 0.000000] Kernel command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen
> [ 0.000000] Disabling memory control group subsystem
> [ 0.000000] PID hash table entries: 2048 (order: -2, 16384 bytes)
> [ 0.000000] freeing bootmem node 0
> [ 0.000000] Memory: 486336k/512000k available (17920k kernel code, 25664k reserved, 1856k data, 2952k bss, 6656k init)
> [ 0.000000] SLUB: Genslabs=19, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=256
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:512 nr_irqs:512 16
> [ 0.000000] clocksource: timebase mult[1f40000] shift[24] registered
> [ 0.000000] Console: colour dummy device 80x25
> [ 0.000000] console [hvc0] enabled
> [ 0.000000] console [hvc0] enabled
> [ 0.041700] pid_max: default: 32768 minimum: 301
> [ 0.041700] pid_max: default: 32768 minimum: 301
> [ 0.048107] Security Framework initialized
> [ 0.048107] Security Framework initialized
> [ 0.067154] SELinux: Disabled at boot.
> [ 0.067154] SELinux: Disabled at boot.
> [ 0.084262] Dentry cache hash table entries: 65536 (order: 3, 524288 bytes)
> [ 0.084262] Dentry cache hash table entries: 65536 (order: 3, 524288 bytes)
> [ 0.099618] Inode-cache hash table entries: 32768 (order: 2, 262144 bytes)
> [ 0.099618] Inode-cache hash table entries: 32768 (order: 2, 262144 bytes)
> [ 0.107083] Mount-cache hash table entries: 4096
> [ 0.107083] Mount-cache hash table entries: 4096
> [ 0.155933] Initializing cgroup subsys cpuacct
> [ 0.155933] Initializing cgroup subsys cpuacct
> [ 0.156562] Initializing cgroup subsys memory
> [ 0.156562] Initializing cgroup subsys memory
> [ 0.161423] Initializing cgroup subsys devices
> [ 0.161423] Initializing cgroup subsys devices
> [ 0.162250] Initializing cgroup subsys freezer
> [ 0.162250] Initializing cgroup subsys freezer
> [ 0.162992] Initializing cgroup subsys net_cls
> [ 0.162992] Initializing cgroup subsys net_cls
> [ 0.163913] Initializing cgroup subsys blkio
> [ 0.163913] Initializing cgroup subsys blkio
> [ 0.164843] Initializing cgroup subsys perf_event
> [ 0.164843] Initializing cgroup subsys perf_event
> [ 0.169308] ftrace: allocating 21118 entries in 8 pages
> [ 0.169308] ftrace: allocating 21118 entries in 8 pages
> [ 0.439808] POWER7 performance monitor hardware support registered
> [ 0.439808] POWER7 performance monitor hardware support registered
> [ 0.476013] Brought up 1 CPUs
> [ 0.476013] Brought up 1 CPUs
> [ 0.481103] Enabling Asymmetric SMT scheduling
> [ 0.481103] Enabling Asymmetric SMT scheduling
> [ 0.552049] devtmpfs: initialized
> [ 0.552049] devtmpfs: initialized
> [ 0.673170] atomic64 test passed
> [ 0.673170] atomic64 test passed
> [ 0.680501] NET: Registered protocol family 16
> [ 0.680501] NET: Registered protocol family 16
> [ 0.686950] IBM eBus Device Driver
> [ 0.686950] IBM eBus Device Driver
> [ 0.713306] nvram: No room to create ibm,rtas-log partition, deleting any obsolete OS partitions...
> [ 0.713306] nvram: No room to create ibm,rtas-log partition, deleting any obsolete OS partitions...
> [ 0.714363] nvram: Failed to find or create ibm,rtas-log partition, err -28
> [ 0.714363] nvram: Failed to find or create ibm,rtas-log partition, err -28
> [ 0.715042] nvram: No room to create lnx,oops-log partition, deleting any obsolete OS partitions...
> [ 0.715042] nvram: No room to create lnx,oops-log partition, deleting any obsolete OS partitions...
> [ 0.715559] nvram: Failed to find or create lnx,oops-log partition, err -28
> [ 0.715559] nvram: Failed to find or create lnx,oops-log partition, err -28
>
> Linux ppc64
> #1 SMP Mon May 1[ 0.720031] CPU Hotplug not supported by firmware - disabling.
> [ 0.720031] CPU Hotplug not supported by firmware - disabling.
> [ 0.740887] PCI: Probing PCI hardware
> [ 0.740887] PCI: Probing PCI hardware
> [ 0.749913] PCI host bridge to bus 0000:00
> [ 0.749913] PCI host bridge to bus 0000:00
> [ 0.751921] pci_bus 0000:00: root bus resource [io 0x10000-0x1ffff]
> [ 0.751921] pci_bus 0000:00: root bus resource [io 0x10000-0x1ffff]
> [ 0.752932] pci_bus 0000:00: root bus resource [mem 0x100a0000000-0x100bfffffff]
> [ 0.752932] pci_bus 0000:00: root bus resource [mem 0x100a0000000-0x100bfffffff]
> [ 0.765676] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci dev=0000:00:00.0 dn=/address@hidden,0/address@hidden
> [ 0.765676] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci dev=0000:00:00.0 dn=/address@hidden,0/address@hidden
> [ 0.773227] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci dev=0000:00:01.0 dn=/address@hidden,0/address@hidden
> [ 0.773227] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci dev=0000:00:01.0 dn=/address@hidden,0/address@hidden
> [ 0.787177] opal: Node not found
> [ 0.787177] opal: Node not found
> [ 0.831635] bio: create slab <bio-0> at 0
> [ 0.831635] bio: create slab <bio-0> at 0
> [ 0.854552] vgaarb: loaded
> [ 0.854552] vgaarb: loaded
> [ 0.861796] SCSI subsystem initialized
> [ 0.861796] SCSI subsystem initialized
> [ 0.873008] usbcore: registered new interface driver usbfs
> [ 0.873008] usbcore: registered new interface driver usbfs
> [ 0.874925] usbcore: registered new interface driver hub
> [ 0.874925] usbcore: registered new interface driver hub
> [ 0.877584] usbcore: registered new device driver usb
> [ 0.877584] usbcore: registered new device driver usb
> [ 0.915016] NetLabel: Initializing
> [ 0.915016] NetLabel: Initializing
> [ 0.915419] NetLabel: domain hash size = 128
> [ 0.915419] NetLabel: domain hash size = 128
> [ 0.915688] NetLabel: protocols = UNLABELED CIPSOv4
> [ 0.915688] NetLabel: protocols = UNLABELED CIPSOv4
> [ 0.921383] NetLabel: unlabeled traffic allowed by default
> [ 0.921383] NetLabel: unlabeled traffic allowed by default
> [ 0.923702] Switching to clocksource timebase
> [ 0.923702] Switching to clocksource timebase
> [ 1.354987] NET: Registered protocol family 2
> [ 1.354987] NET: Registered protocol family 2
> [ 1.366159] IP route cache hash table entries: 8192 (order: 0, 65536 bytes)
> [ 1.366159] IP route cache hash table entries: 8192 (order: 0, 65536 bytes)
> [ 1.385317] TCP established hash table entries: 16384 (order: 2, 262144 bytes)
> [