Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

From: Zhanghaoyu (A)
Subject: Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Date: Mon, 5 Aug 2013 09:09:56 +0000
>> >> >> >> hi all,
>> >> >> >>
>> >> >> >> I met a problem similar to the ones below while performing live migration
>> >> >> >> or save-restore tests on the kvm platform (qemu:1.4.0, host:suse11sp2,
>> >> >> >> guest:suse11sp2), running a tele-communication software suite in the
>> >> >> >> guest:
>> >> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >> >> >>
>> >> >> >> After live migration or virsh restore [savefile], one process's CPU
>> >> >> >> utilization went up by about 30%, which resulted in throughput
>> >> >> >> degradation of this process.
>> >> >> >>
>> >> >> >> If EPT is disabled, this problem is gone.
>> >> >> >>
>> >> >> >> I suspect that the kvm hypervisor is involved in this problem.
>> >> >> >> Based on that suspicion, I want to find two adjacent versions of
>> >> >> >> kvm-kmod, one that triggers this problem and one that does not
>> >> >> >> (e.g. 2.6.39, 3.0-rc1), and then either analyze the differences
>> >> >> >> between these two versions, or apply the patches between them by
>> >> >> >> bisection, to finally find the key patches.
>> >> >> >>
>> >> >> >> Any better ideas?
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Zhang Haoyu
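(For reference, checking and toggling EPT on an Intel host looks roughly like this; a sketch only, and the module reload assumes no guests are running:)

cat /sys/module/kvm_intel/parameters/ept      # Y means EPT is enabled
rmmod kvm_intel
modprobe kvm_intel ept=0                      # reload with EPT disabled, then repeat the test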
>> >> >> >
>> >> >> >I've attempted to duplicate this on a number of machines that are as
>> >> >> >similar to yours as I am able to get my hands on, and so far have not
>> >> >> >been able to see any performance degradation. And from what I've read
>> >> >> >in the above links, huge pages do not seem to be part of the problem.
>> >> >> >
>> >> >> >So, if you are in a position to bisect the kernel changes, that would
>> >> >> >probably be the best avenue to pursue in my opinion.
>> >> >> >
>> >> >> >Bruce
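For illustration, the bisection suggested here looks roughly like the following sketch (the good/bad tags are only the example versions mentioned above, not verified points):

git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git && cd kvm
git bisect start
git bisect bad  v3.0-rc1      # placeholder: a version that shows the regression
git bisect good v2.6.39       # placeholder: a version that does not
# build, install and boot the kernel git checks out, rerun the migration test, then mark it:
git bisect good               # or "git bisect bad", until git reports the first bad commit
git bisect log                # shows the decisions made so far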
>> >> >>
>> >> >> By git-bisecting the kvm kernel tree (downloaded from
>> >> >> https://git.kernel.org/pub/scm/virt/kvm/kvm.git), I found the first bad
>> >> >> commit that triggers this problem: 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> >> >> ("KVM: propagate fault r/w information to gup(), allow read-only memory").
>> >> >>
>> >> >> And,
>> >> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> >> >> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>> >> >>
>> >> >> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and came to the conclusion
>> >> >> that all of the differences between
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> >> >> are contributed by none other than
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 itself, so this commit is the
>> >> >> culprit which directly or indirectly causes the degradation.
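The same conclusion can be cross-checked more directly (a small sketch, using the same commit hash):

git rev-list --count 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
# prints 1: the range contains only that single commit, so any behaviour change
# between the two builds must come from it alone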
>> >> >>
>> >> >> Does the map_writable flag passed to the mmu_set_spte() function have an
>> >> >> effect on the PTE's PAT flag, or does it increase the VM exits induced by
>> >> >> the guest trying to write read-only memory?
>> >> >>
>> >> >> Thanks,
>> >> >> Zhang Haoyu
>> >> >>
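One way to check the second half of that question, i.e. whether VM exits actually increase after the restore, is to count them with the kvm tracepoints (a measurement sketch, assuming perf is available on the host):

# run once before virsh save and once after virsh restore, with the same guest
# workload, and compare the kvm:kvm_exit counts
perf stat -e 'kvm:kvm_exit' -a sleep 10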
>> >> >
>> >> >There should be no read-only memory maps backing guest RAM.
>> >> >
>> >> >Can you confirm map_writable = false is being passed to __direct_map?
>> >> >(this should not happen, for guest RAM).
>> >> >And if it is false, please capture the associated GFN.
>> >> >
>> >> I added the below check and printk at the start of __direct_map() at the
>> >> first bad commit version,
>> >> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.000000000 +0800
>> >> +++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.000000000 +0800
>> >> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>> >>  	int pt_write = 0;
>> >>  	gfn_t pseudo_gfn;
>> >>  
>> >> +	if (!map_writable)
>> >> +		printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
>> >> +
>> >>  	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>> >>  		if (iterator.level == level) {
>> >>  			unsigned pte_access = ACC_ALL;
>> >>
>> >> I virsh-saved the VM and then virsh-restored it; so many GFNs were printed
>> >> that you could absolutely describe it as flooding.
>> >>
>> >The flooding you see happens during the migrate-to-file stage because of dirty
>> >page tracking. If you clear dmesg after virsh-save you should not see any
>> >flooding after virsh-restore. I just checked with the latest tree, and I do not.
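In shell terms, that check is roughly the following (a sketch; the save-file path is a placeholder, and ATS1 is the domain name from the command line quoted below):

virsh save ATS1 /path/to/savefile
dmesg -c > /dev/null                 # clear the kernel ring buffer once the save has completed
virsh restore /path/to/savefile
dmesg | grep -c __direct_map         # count the lines printed by the debug patch above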
>>
>> I performed the verification again.
>> I virsh-saved the VM; during the saving stage I ran 'dmesg', and no GFN was
>> printed. Maybe the switch from the running stage to the paused stage takes such
>> a short time that no guest write happens during this switching period.
>> After the saving operation completed, I ran 'dmesg -c' to clear the buffer all
>> the same, then I virsh-restored the VM; so many GFNs were printed by running
>> 'dmesg', and I also ran 'tail -f /var/log/messages' during the restoring stage,
>> where so many GFNs flooded in dynamically too.
>> I'm sure that the flooding happens during the virsh-restore stage, not the
>> migration stage.
>>
>Interesting, is this with the upstream kernel? For me the situation is
>exactly the opposite. What is your command line?
>
I made the verification on the first bad commit
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, not on the upstream kernel.
When I tried to build the upstream kernel, I ran into a problem: I compiled and
installed the upstream (commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on the
sles11sp2 environment via the commands below,
cp /boot/config-3.0.13-0.27-default ./.config
yes "" | make oldconfig
make && make modules_install && make install
then I rebooted the host and selected the upstream kernel, but during the boot
stage the following problem happened:
Could not find /dev/disk/by-id/scsi-3600508e000000000864407c5b8f7ad01-part3
I'm trying to resolve it.
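One common cause of that boot failure is that the driver for the root disk ends up as a module that is missing from the new kernel's initrd; a hedged way to check on a SLES host (the config option shown and the mkinitrd behaviour are assumptions, not taken from this thread):

grep BLK_DEV_SD .config      # SCSI disk support: should be =y, or =m and included in the initrd
mkinitrd                     # on SLES, regenerates the initrds for the kernels installed in /boot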
The QEMU command line (from /var/log/libvirt/qemu/[domain name].log) is:
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none
/usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32
-enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2
-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4
-netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5
-netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device
virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6
-netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device
virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7
-netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device
virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
-watchdog-action poweroff -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
Thanks,
Zhang Haoyu