RE: Optimized clocksource with AMD AVIC enabled for Windows guest
From: Kechen Lu
Subject: RE: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Wed, 17 Feb 2021 20:41:58 +0000
Hi Vitaly and Paolo,
Sorry for the delayed response. I finally got access to a machine with AVIC
and was able to test the patch and reconfirm the results with some benchmarks
and tests today :)
In summary, this patch works well and resolves the high rate of port I/O
vmexits caused by the clocksource. With AVIC=1 && stimer/synic=1:
1. The CPU-intensive workload CPU-Z shows a 15% improvement in the
single-thread score (382.1 => 441.7).
2. The disk-I/O-intensive workload PassMark Disk Test shows a 4% improvement
(12706 => 13265).
3. The vmexit pattern from a 30s recording, taken while running the CPU
workload Geekbench in the guest, shows a dramatic 90.7% decrease in port I/O
vmexits, and the HLT and NPF vmexits drop as well, once we get the stimer
benefit on top of AVIC. Details are below:
AVIC=1 && stimer/synic=0 && vapic=0:
VM-EXIT               Samples  Samples%   Time%  Min Time     Max Time   Avg time
io                     344654    68.29%   1.10%    0.67us    2132.72us     7.01us ( +- 0.19% )
hlt                    114046    22.60%  98.85%    0.42us   16666.32us  1903.26us ( +- 0.66% )
avic_incomplete_ipi     19679     3.90%   0.03%    0.38us      22.67us     3.66us ( +- 0.71% )
npf                      8186     1.62%   0.01%    0.37us     235.76us     1.46us ( +- 4.20% )
........
AVIC=1 && stimer/synic=1 && vapic=0:
VM-EXIT               Samples  Samples%   Time%  Min Time     Max Time   Avg time
io                      31995    38.61%   0.10%    2.79us      65.83us     6.70us ( +- 0.35% )
hlt                     22915    27.65%  99.88%    0.42us   15959.14us  9535.38us ( +- 0.50% )
avic_incomplete_ipi      8271     9.98%   0.01%    0.39us      79.03us     3.58us ( +- 1.23% )
npf                      1232     1.49%   0.00%    0.36us     100.25us     2.58us ( +- 6.98% )
..........
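For reference, the 90.7% figure for port I/O vmexits can be reproduced from the
sample counts in the two tables. The following is only a small sketch of that
arithmetic; the dictionary and function names are made up for illustration, and
the counts are copied from the rows shown above (the elided rows are not
included).

```python
# Sample counts copied from the two tables above (30s recordings).
# "before" is stimer/synic=0, "after" is stimer/synic=1; both have AVIC=1.
before = {"io": 344654, "hlt": 114046, "avic_incomplete_ipi": 19679, "npf": 8186}
after = {"io": 31995, "hlt": 22915, "avic_incomplete_ipi": 8271, "npf": 1232}


def reduction_pct(old: int, new: int) -> float:
    """Return the relative decrease from old to new, in percent."""
    return (old - new) / old * 100.0


# Relative decrease in sample count for each exit reason.
reductions = {reason: reduction_pct(before[reason], after[reason]) for reason in before}

for reason, pct in reductions.items():
    print(f"{reason:20s} {before[reason]:>7d} -> {after[reason]:>6d}  ({pct:.1f}% fewer)")
```

The "io" row comes out at 90.7%, matching the number quoted in item 3.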
While testing, I also found that hv-vapic should be disabled as well to make
AVIC fully functional; otherwise there are many vmexits due to MSR writes,
which seem to come from increased accesses to HV_X64_MSR_EOI and
HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with PV EOI/ICR
accesses. So far I think the AVIC=1 && hv-vapic=0 && stimer/synic=1
combination gives us the best performance. However, AVIC=1 && hv-vapic=0 &&
stimer/synic=1 is really unstable and sometimes leads to boot failures. Are
instabilities with APICv/AVIC a known bug/issue upstream? A reproducible
kernel warning is attached at the bottom.
In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 works stably now and still
gives a large reduction in vmexits. Thanks a lot for all your help, folks. I
hope the kernel patch and the QEMU patch that exposes the bit can get upstream
soon, along with fixes for the instabilities.
Best Regards,
Kechen
---------------------------------------------------------------------------------------
[ 7962.437584] ------------[ cut here ]------------
[ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
[ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink
bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy
crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper
snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec
snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi
snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm
snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92
parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid
hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas
i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
[ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P W OE 5.8.0-41-generic #46
[ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89
e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9
04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
[ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
[ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
[ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
[ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
[ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
[ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
[ 7962.437640] FS: 0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
[ 7962.437640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
[ 7962.437641] Call Trace:
[ 7962.437646] handle_exit+0x134/0x420 [kvm_amd]
[ 7962.437661] ? kvm_set_cr8+0x22/0x40 [kvm]
[ 7962.437674] vcpu_enter_guest+0x862/0xd90 [kvm]
[ 7962.437687] vcpu_run+0x76/0x240 [kvm]
[ 7962.437699] kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
[ 7962.437711] kvm_vcpu_ioctl+0x247/0x600 [kvm]
[ 7962.437714] ksys_ioctl+0x8e/0xc0
[ 7962.437715] __x64_sys_ioctl+0x1a/0x20
[ 7962.437717] do_syscall_64+0x49/0xc0
[ 7962.437719] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7962.437720] RIP: 0033:0x7f4c09b1131b
[ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff
85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
[ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
[ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
[ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
[ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
[ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
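As a reading aid for the warning above, the icr=0x4000000:0x82f value can be
split into its fields. This is only a sketch under two assumptions that the
trace itself does not confirm: that the two numbers are the high and low 32
bits of the ICR, in that order, and that the guest APIC is in xAPIC mode, so
the standard xAPIC ICR layout applies. The function and table names are
illustrative.

```python
# Standard xAPIC ICR field encodings (delivery mode and destination shorthand).
DELIVERY_MODES = {0: "fixed", 1: "lowest priority", 2: "SMI",
                  4: "NMI", 5: "INIT", 6: "start-up"}
SHORTHANDS = {0: "none", 1: "self", 2: "all including self", 3: "all excluding self"}


def decode_xapic_icr(icr_high: int, icr_low: int) -> dict:
    """Decode the two 32-bit halves of an xAPIC ICR (assumed layout)."""
    return {
        "vector": icr_low & 0xFF,                                   # bits 0-7
        "delivery_mode": DELIVERY_MODES.get((icr_low >> 8) & 0x7, "reserved"),
        "destination_mode": "logical" if (icr_low >> 11) & 1 else "physical",
        "shorthand": SHORTHANDS[(icr_low >> 18) & 0x3],             # bits 18-19
        "destination": (icr_high >> 24) & 0xFF,                     # bits 24-31 of the high half
    }


# Values taken from the "Invalid IPI target" line of the warning.
decoded = decode_xapic_icr(0x4000000, 0x82F)
for field, value in decoded.items():
    print(f"{field}: {value:#x}" if isinstance(value, int) else f"{field}: {value}")
```

Under these assumptions the IPI is a fixed-mode interrupt with vector 0x2f,
sent in logical destination mode with no shorthand to destination 0x04.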