From: Quentin Hartman
Subject: Re: [Qemu-discuss] Disk Performance
Date: Fri, 30 May 2014 08:42:30 -0600

Hello all,

let me briefly share my own experience with virtual disk performance.

In short: I stopped using image files of any kind and switched to LVM
volumes. The reason is that your guests will choke whenever you copy files
to the physical host, whether over the network or locally on the host itself.
I tried everything to prevent that, but in the end I gave up. My host had
256 GB RAM and 32 Xeon processors, and it was still impossible to copy a
10 GByte file to the host without freezing the guests. I find this pretty
ridiculous, but I do not blame qemu. The cause seems to be related to the
kernel VFS layer.

If you want working and performant qemu guests, back the virtual disks with
block devices rather than image files.
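
A minimal sketch of what a block-device-backed guest could look like, assuming
a volume group called vg0 and a logical volume called vm1 (both names are
placeholders, not taken from this thread). The qemu flags mirror the test
invocation quoted further down. The function only prints the commands, so
they can be reviewed before anything is run as root:

```shell
#!/bin/sh
# Print (do not run) the commands for backing a guest with an LVM volume.
# The volume group, volume name and size are hypothetical placeholders.
lvm_guest_cmds() {
    vg=$1; lv=$2; size=$3
    # Create the logical volume that will serve as the guest's disk.
    printf 'lvcreate -L %s -n %s %s\n' "$size" "$lv" "$vg"
    # Hand the block device to qemu directly; cache=none bypasses the
    # host page cache, and aio=native requires that setting.
    printf 'qemu-system-x86_64 -enable-kvm -cpu host -m 2G'
    printf ' -drive if=none,id=blk0,format=raw,cache=none,aio=native,file=/dev/%s/%s' "$vg" "$lv"
    printf ' -device virtio-blk-pci,drive=blk0\n'
}

lvm_guest_cmds vg0 vm1 20G
```

Whether this avoids the stalls on a given host still has to be measured; the
sketch only shows the shape of the setup.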
Regards,
Stephan
On Fri, 30 May 2014 13:13:13 +1000
Blair Bethwaite <address@hidden> wrote:
> Quentin,
>
> I doubt you'll get much useful help until you do a reasonable benchmark.
> The dd you mention in the original post is just testing your guest's page
> cache, i.e. it is not directio. I'd suggest grabbing a newish fio and using
> the genfio tool to help quickly setup some more varied benchmarks.
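
To illustrate the point about the page cache, here is a hedged sketch of a
write test that includes the flush to disk in the timing. It assumes GNU dd
(for conv=fdatasync); the file name and sizes are arbitrary. The commented fio
line is only a rough equivalent and assumes fio with the libaio engine is
installed:

```shell
#!/bin/sh
# Write N MiB and include the final flush in the timing, so the result
# reflects storage throughput rather than the guest's page cache.
sync_write_test() {
    target=$1; mib=$2
    # conv=fdatasync makes dd flush the data before it reports its numbers.
    LC_ALL=C dd if=/dev/zero of="$target" bs=1M count="$mib" conv=fdatasync 2>&1
    rm -f "$target"
}

sync_write_test ./deleteme.bin 64

# Variant that bypasses the page cache entirely (fails on filesystems
# without O_DIRECT support, such as tmpfs):
#   dd if=/dev/zero of=deleteme.bin bs=1M count=64 oflag=direct
# Roughly equivalent fio job:
#   fio --name=seqwrite --filename=deleteme.bin --rw=write --bs=1M \
#       --size=1G --direct=1 --ioengine=libaio --iodepth=16
```

Comparing the plain dd figure with the flushed or direct one gives a quick
sense of how much of the original number came from caching.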
>
> Regarding your Ceph config, you might like to elaborate on the cluster
> setup and provide some detail as to the librbd options you're giving to
> Qemu, plus what Ceph baseline you are comparing against.
>
>
> On 30 May 2014 02:38, Quentin Hartman <address@hidden> wrote:
>
> > I don't know what I changed, but this morning instances running on local
> > storage are returning essentially bare-metal performance. The only
> > difference I see versus my test scenario is that they are actually using
> > qcow images instead of RAW. So, hurray! Assuming it stays performant, the
> > local storage problem is fixed. Now I just need to get ceph-backed
> > instances working right.
> >
> > Would this be the appropriate venue for that discussion, or would a ceph
> > list be a better venue? I believe the problem lies with qemu's interaction
> > with librbd since my direct tests of the ceph cluster indicate good
> > performance.
> >
> >
> > On Thu, May 29, 2014 at 10:14 AM, Quentin Hartman <address@hidden>
> > wrote:
> >
> >> I found this page: http://www.linux-kvm.org/page/Tuning_Kernel and all
> >> of the recommended kernel options are enabled or built as modules which are
> >> loaded.
> >>
> >>
> >> On Thu, May 29, 2014 at 10:07 AM, Quentin Hartman <address@hidden>
> >> wrote:
> >>
> >>> It looks like that particular feature is already enabled:
> >>>
> >>> address@hidden:~# dmesg | grep -e DMAR -e IOMMU
> >>> [ 0.000000] ACPI: DMAR 00000000bf77e0c0 000100 (v01 AMI OEMDMAR
> >>> 00000001 MSFT 00000097)
> >>> [ 0.105190] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap
> >>> c90780106f0462 ecap f020f6
> >>>
> >>>
> >>>
> >>> On Thu, May 29, 2014 at 10:04 AM, Quentin Hartman <address@hidden>
> >>> wrote:
> >>>
> >>>> I do not. I did not know those were a thing. My next steps were to
> >>>> experiment with different BIOS settings and kernel parameters, so this is a
> >>>> very timely suggestion. Thanks for the reply. I would love to hear other
> >>>> suggestions for kernel parameters that may be relevant.
> >>>>
> >>>> QH
> >>>>
> >>>>
> >>>> On Thu, May 29, 2014 at 9:58 AM, laurence.schuler <
> >>>> address@hidden> wrote:
> >>>>
> >>>>> On 05/28/2014 07:56 PM, Quentin Hartman wrote:
> >>>>>
> >>>>> Big picture, I'm working on getting an openstack deployment going
> >>>>> using ceph-backed volumes, but I'm running into really poor disk
> >>>>> performance, so I'm in the process of simplifying things to isolate exactly
> >>>>> where the problem lies.
> >>>>>
> >>>>> The machines I'm using are HP Proliant DL160 G6 machines with 72GB
> >>>>> of RAM. All the hardware virtualization features are turned on. Host OS is
> >>>>> Ubuntu 14.04, using deadline IO scheduler. I've run a variety of benchmarks
> >>>>> to make sure the disks are working right, and they seem to be. Everything
> >>>>> indicates bare metal write speeds to a single disk in the ~100MB/s
> >>>>> ballpark. Some tests report as high as 120MB/s.
> >>>>>
> >>>>> To try to isolate the problem I've done some testing with a very
> >>>>> simple [1] qemu invocation on one of the host machines. Inside that VM, I
> >>>>> get about 50MB/s write throughput. I've tested with both qemu 2.0 and 1.7
> >>>>> and gotten similar results. For quick testing I'm using a simple dd command
> >>>>> [2] to get a sense of where things lie. This has consistently produced
> >>>>> results near what more intensive synthetic benchmarks (iozone and dbench)
> >>>>> produced. I understand that I should be expecting closer to 80% of bare
> >>>>> metal performance. It seems that this would be the first place to focus, to
> >>>>> understand why things aren't going well.
> >>>>>
> >>>>> When running on a ceph-backed volume, I get closer to 15MB/s using
> >>>>> the same tests, and have as much as 50% iowait. Typical operations that
> >>>>> take seconds on bare metal take tens of seconds, or minutes in a VM. This
> >>>>> problem actually drove me to look at things with strace, and I'm finding
> >>>>> streams of FSYNC and PSELECT6 timeouts while the processes are running.
> >>>>> More direct tests of ceph performance are able to saturate the nic, pushing
> >>>>> about 90MB/s. I have ganglia installed on the host machines, and when I am
> >>>>> running tests from within a vm, the network throughput seems to be getting
> >>>>> artificially capped. Rather than the more "spiky" graph produced by the
> >>>>> direct ceph tests, I get a perfectly flat horizontal line at 10 or 20MB/s.
> >>>>>
> >>>>> Any and all suggestions would be appreciated, especially if someone
> >>>>> has a similar deployment that I could compare notes with.
> >>>>>
> >>>>> QH
> >>>>>
> >>>>> 1 - My testing qemu invocation: qemu-system-x86_64 -cpu host -m 2G
> >>>>> -display vnc=0.0.0.0:1 -enable-kvm -vga std -rtc base=utc -drive
> >>>>> if=none,id=blk0,cache=none,aio=native,file=/root/cirros.raw -device
> >>>>> virtio-blk-pci,drive=blk0,id=blk0
> >>>>>
> >>>>> 2 - simple dd performance test: time dd if=/dev/zero of=deleteme.bin
> >>>>> bs=20M count=256
> >>>>>
> >>>>> Hi Quentin,
> >>>>> Do you have the passthrough options on the host kernel command line?
> >>>>> I think it's intel_iommu=on
> >>>>>
> >>>>> --larry
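
A small sketch for checking whether that option is actually present on the
running kernel's command line, as opposed to the IOMMU merely being detected
in dmesg. The helper name is made up for illustration; it reads /proc/cmdline
by default and accepts another file for testing:

```shell
#!/bin/sh
# Return 0 if the given parameter appears as a whole token on the kernel
# command line. The second argument is optional and defaults to /proc/cmdline.
has_kernel_param() {
    param=$1; cmdline_file=${2:-/proc/cmdline}
    # Put each space-separated token on its own line, then look for an
    # exact, literal match of the full line.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_kernel_param intel_iommu=on; then
    echo "intel_iommu=on is set"
else
    echo "intel_iommu=on is not set"
fi
```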
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
> --
> Cheers,
> ~Blairo