From: Kechen Lu
Subject: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Wed, 3 Feb 2021 05:40:24 +0000

Hi KVM & AMD folks,

We are trying to enable AVIC for a Windows guest on an AMD host machine running an upstream 5.8+ kernel. From our experiments and vmexit metrics, AVIC brings us a huge benefit: interrupt vmexits decrease by more than 80%, and vintr and write_cr8 vmexits are avoided entirely. But it seems that for a Windows guest we have to give up the Hyper-V PV synthetic timer (the hv-stimer feature). So, to get the best of both worlds, is there a more optimized clocksource for Windows guests that can coexist with AVIC (as stimer currently cannot work together with AVIC)?

Some detailed performance analysis follows.

In the KVM kernel function kvm_hv_activate_synic (https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891), enabling SynIC prevents APICv (for AMD it’s AVIC) from being used, whereas SynIC is the prerequisite of stimer. In our actual experiments, without the Hyper-V stimer, there are a lot of port IO
vmexits, which potentially bring down the performance of CPU-bound workloads such as Geekbench, with single-core performance regressing by around 10%. Below are the vmexit results when we enable AVIC with hypervclock and rtc as the clocksources, without stimer+synic.

------------------------------------------------------------
Analyze events for all VMs, all VCPUs:

   VM-EXIT   Samples  Samples%    Time%    Min Time     Max Time    Avg time

        io    575088    43.42%    1.96%      0.68us     100.62us      7.47us ( +-  0.13% )
       msr    434530    32.81%    0.29%      0.41us     350.50us      1.45us ( +-  0.30% )
       hlt    308635    23.30%   97.75%      0.43us    3791.74us    693.91us ( +-  0.12% )
 interrupt      4796     0.36%    0.00%      0.33us    1606.17us      1.89us ( +- 18.69% )
 write_cr4       752     0.06%    0.00%      0.53us      34.80us      1.42us ( +-  3.97% )
  read_cr4       376     0.03%    0.00%      0.40us       1.32us      0.62us ( +-  1.22% )
       npf        85     0.01%    0.00%      1.68us      57.95us      8.33us ( +- 12.54% )
     pause        71     0.01%    0.00%      0.36us       1.44us      0.62us ( +-  3.45% )
     cpuid        50     0.00%    0.00%      0.33us       1.11us      0.45us ( +-  5.94% )
 hypercall        10     0.00%    0.00%      0.81us       1.42us      1.12us ( +-  5.87% )
       nmi         1     0.00%    0.00%      0.67us       0.67us      0.67us ( +-  0.00% )

Total Samples:1324394, Total events handled time:219105470.74us.
------------------------------------------------------------

This shows a dramatically high number of IO vmexits, and we can further see which IO ports the Windows guest accessed.

------------------------------------------------------------
Analyze events for all VMs, all VCPUs:

  IO Port Access   Samples  Samples%    Time%    Min Time     Max Time    Avg time

       0x70:POUT    287544    50.00%   13.10%      0.40us      23.48us      0.53us ( +-  0.06% )
        0x71:PIN    226154    39.33%    7.60%      0.31us      22.91us      0.39us ( +-  0.08% )
       0x71:POUT     61390    10.67%   79.31%     12.92us      69.99us     14.95us ( +-  0.09% )

Total Samples:575088, Total events handled time:1156983.53us.
------------------------------------------------------------

Ports 0070-0071 are the rtc0 ports, which means the guest's RTC accesses carry a very high overhead.

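As a rough cross-check of the IO port report above, the sketch below multiplies each row's sample count by its average time to estimate where the I/O exit time goes. The row values are copied from the report; the names IO_ROWS, TOTAL_IO_TIME_US and time_shares are made up for this illustration and are not part of perf. On a PC-style RTC, port 0x70 selects the CMOS register index and port 0x71 carries the data, so the 0x70 write count being equal to the sum of the 0x71 reads and writes is consistent with an index-then-data access pattern.

```python
# (port, direction, samples, avg_time_us) copied from the IO port report.
IO_ROWS = [
    ("0x70", "POUT", 287544, 0.53),
    ("0x71", "PIN", 226154, 0.39),
    ("0x71", "POUT", 61390, 14.95),
]
# "Total events handled time" from the same report, in microseconds.
TOTAL_IO_TIME_US = 1156983.53


def time_shares(rows, total_us):
    """Estimate each row's share of total exit time as samples * avg / total."""
    return {
        f"{port}:{direction}": samples * avg_us / total_us
        for port, direction, samples, avg_us in rows
    }


shares = time_shares(IO_ROWS, TOTAL_IO_TIME_US)
for key, share in shares.items():
    print(f"{key:10s} {share:7.2%}")

# Every data access on 0x71 appears to be preceded by one index write on 0x70.
index_writes = IO_ROWS[0][2]
data_accesses = IO_ROWS[1][2] + IO_ROWS[2][2]
print("index writes == data accesses:", index_writes == data_accesses)
```

The estimates differ slightly from the report's Time% column because the averages are rounded to two decimals, but they agree that writes to the RTC data port dominate the I/O exit time even though they are the least frequent access.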
With stimer + synic on and AVIC disabled, the vmexit metrics look much better for IO and MSR exits, as below.

------------------------------------------------------------
Analyze events for all VMs, all VCPUs:

   VM-EXIT   Samples  Samples%    Time%    Min Time     Max Time    Avg time

       hlt    166815    38.30%   99.66%      0.44us    1556.67us    809.48us ( +-  0.11% )
 interrupt    146218    33.57%    0.13%      0.30us    1362.10us      1.19us ( +-  1.50% )
       msr    105267    24.17%    0.20%      0.37us      87.47us      2.51us ( +-  0.31% )
     vintr      9285     2.13%    0.01%      0.50us       1.92us      0.78us ( +-  0.16% )
 write_cr8      7537     1.73%    0.00%      0.31us      49.14us      0.66us ( +-  1.08% )
     cpuid       174     0.04%    0.00%      0.31us       1.39us      0.46us ( +-  3.21% )
       npf       143     0.03%    0.00%      1.49us     237.66us     21.04us ( +- 12.04% )
 write_cr4        32     0.01%    0.00%      0.93us       5.78us      2.10us ( +- 11.38% )
     pause        22     0.01%    0.00%      0.45us       1.33us      0.84us ( +-  5.46% )
  read_cr4        16     0.00%    0.00%      0.47us       0.68us      0.60us ( +-  2.19% )
       nmi        11     0.00%    0.00%      0.35us       0.70us      0.54us ( +-  5.06% )
 write_dr7         2     0.00%    0.00%      0.43us       0.45us      0.44us ( +-  2.27% )
 hypercall         1     0.00%    0.00%      0.97us       0.97us      0.97us ( +-  0.00% )

Total Samples:435523, Total events handled time:135488497.29us.
------------------------------------------------------------

Given the above observations, we would like to know whether there’s a way to enable AVIC while also having the most optimized clocksource for Windows guests.

We would really appreciate your help and look forward to your response.

Best Regards,
Kechen
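
For reference, the two vmexit reports can be compared side by side with a small sketch like the one below. The counts are copied from the reports; the dictionary and function names are hypothetical. It assumes an exit reason that is missing from a report occurred zero times, and it compares raw counts only, so it is indicative at best if the two runs did not cover the same interval and workload.

```python
# Exit counts copied from the two vmexit reports (main exit reasons only).
# Assumption: a reason missing from a report is treated as zero exits.
STIMER_NO_AVIC = {"hlt": 166815, "interrupt": 146218, "msr": 105267,
                  "vintr": 9285, "write_cr8": 7537}
AVIC_NO_STIMER = {"io": 575088, "msr": 434530, "hlt": 308635,
                  "interrupt": 4796}


def relative_change(before, after):
    """Fractional change from before to after; None when before is zero."""
    if before == 0:
        return None
    return (after - before) / before


def compare(before, after):
    """Map each exit reason to (count before, count after, relative change)."""
    reasons = sorted(set(before) | set(after))
    return {
        r: (before.get(r, 0), after.get(r, 0),
            relative_change(before.get(r, 0), after.get(r, 0)))
        for r in reasons
    }


changes = compare(STIMER_NO_AVIC, AVIC_NO_STIMER)
for reason, (old, new, change) in changes.items():
    label = "n/a" if change is None else f"{change:+.1%}"
    print(f"{reason:10s} {old:8d} -> {new:8d}  {label}")
```

With these numbers the interrupt exits fall by well over 80% and the vintr and write_cr8 exits disappear, while msr exits grow roughly fourfold and io exits appear where the other report showed none.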