Optimized clocksource with AMD AVIC enabled for Windows guest
From: Kechen Lu
Subject: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Wed, 3 Feb 2021 06:40:15 +0000
[resent for the previous non-plain text format]
Hi KVM & AMD folks,
We are trying to enable AVIC for a Windows guest on an AMD host running upstream
kernel 5.8+. From our experiments and vmexit metrics, AVIC brings a large
benefit: interrupt vmexits drop by more than 80%, and vintr and write_cr8
vmexits are avoided entirely. However, for a Windows guest it seems we have to
give up the Hyper-V PV synthetic timer (the hv-stimer feature). To get the best
of both worlds, is there a more optimized clocksource for Windows guests that
can coexist with AVIC, given that stimer currently cannot work together with
AVIC?
Some detailed performance analysis follows.
According to the KVM kernel function kvm_hv_activate_synic in
https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891,
enabling SynIC inhibits APICv (AVIC on AMD), and SynIC is a prerequisite for
stimer. In our experiments without Hyper-V stimer, there are many port IO
vmexits, which can hurt CPU-bound workloads: for example, Geekbench single-core
performance regresses by around 10%. Below are the vmexit results with AVIC
enabled and hypervclock and rtc as clocksources, without stimer+synic.
------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
VM-EXIT    Samples  Samples%    Time%  Min Time   Max Time  Avg time
io          575088    43.42%    1.96%    0.68us   100.62us    7.47us ( +-  0.13% )
msr         434530    32.81%    0.29%    0.41us   350.50us    1.45us ( +-  0.30% )
hlt         308635    23.30%   97.75%    0.43us  3791.74us  693.91us ( +-  0.12% )
interrupt     4796     0.36%    0.00%    0.33us  1606.17us    1.89us ( +- 18.69% )
write_cr4      752     0.06%    0.00%    0.53us    34.80us    1.42us ( +-  3.97% )
read_cr4       376     0.03%    0.00%    0.40us     1.32us    0.62us ( +-  1.22% )
npf             85     0.01%    0.00%    1.68us    57.95us    8.33us ( +- 12.54% )
pause           71     0.01%    0.00%    0.36us     1.44us    0.62us ( +-  3.45% )
cpuid           50     0.00%    0.00%    0.33us     1.11us    0.45us ( +-  5.94% )
hypercall       10     0.00%    0.00%    0.81us     1.42us    1.12us ( +-  5.87% )
nmi              1     0.00%    0.00%    0.67us     0.67us    0.67us ( +-  0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
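As a rough cross-check of the report above, the following Python sketch parses rows in this format and estimates the time spent per exit type as samples times average time. The regular expression, the function name parse_rows, and the three-row REPORT sample are illustrative assumptions, not part of any tool; the numbers are copied from the report.

```python
import re

# One report row: name, samples, samples%, time%, min, max, avg.
# The trailing "( +- x% )" error column is ignored by this sketch.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<samples>\d+)\s+[\d.]+%\s+[\d.]+%"
    r"\s+[\d.]+us\s+[\d.]+us\s+(?P<avg>[\d.]+)us"
)


def parse_rows(text):
    """Return {exit_name: {"samples": int, "avg_us": float}} for matching rows."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = {
                "samples": int(m["samples"]),
                "avg_us": float(m["avg"]),
            }
    return rows


# Top three rows of the report above (AVIC on, no stimer+synic).
REPORT = """\
io   575088 43.42%  1.96% 0.68us  100.62us   7.47us ( +- 0.13% )
msr  434530 32.81%  0.29% 0.41us  350.50us   1.45us ( +- 0.30% )
hlt  308635 23.30% 97.75% 0.43us 3791.74us 693.91us ( +- 0.12% )
"""

rows = parse_rows(REPORT)

# Estimated time per exit type, excluding hlt (idle time, not overhead).
busy_time_us = {
    name: r["samples"] * r["avg_us"]
    for name, r in rows.items()
    if name != "hlt"
}

for name, t in sorted(busy_time_us.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ~{t / 1e6:.2f} s of exit handling")
```

Excluding hlt is a design choice of this sketch: hlt exits mostly measure idle time, so the remaining rows give a clearer picture of where exit handling overhead goes.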
The vmexit report shows a dramatically high number of IO vmexits. The per-port
breakdown shows which IO ports the Windows guest accessed.
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
IO Port Access   Samples  Samples%    Time%  Min Time   Max Time  Avg time
0x70:POUT         287544    50.00%   13.10%    0.40us    23.48us    0.53us ( +-  0.06% )
0x71:PIN          226154    39.33%    7.60%    0.31us    22.91us    0.39us ( +-  0.08% )
0x71:POUT          61390    10.67%   79.31%   12.92us    69.99us   14.95us ( +-  0.09% )
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
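The per-port numbers can be checked with a little arithmetic. The Python sketch below, using only values copied from the IO port report, estimates the time per port as samples times average time and compares the number of writes to port 0x70 with the number of accesses to port 0x71. The variable and function names are illustrative.

```python
# (port, direction) -> (samples, average exit time in microseconds),
# copied from the IO port report above.
PORT_STATS = {
    ("0x70", "POUT"): (287544, 0.53),
    ("0x71", "PIN"): (226154, 0.39),
    ("0x71", "POUT"): (61390, 14.95),
}


def estimated_time_us(stats):
    """Approximate total handling time per port as samples * average."""
    return {key: samples * avg for key, (samples, avg) in stats.items()}


times = estimated_time_us(PORT_STATS)
total = sum(times.values())
shares = {key: t / total for key, t in times.items()}
slowest = max(times, key=times.get)

# Each write to 0x70 appears to be paired with one access to 0x71.
index_writes = PORT_STATS[("0x70", "POUT")][0]
data_accesses = PORT_STATS[("0x71", "PIN")][0] + PORT_STATS[("0x71", "POUT")][0]

for (port, direction), share in shares.items():
    print(f"{port}:{direction} ~{share:.1%} of IO exit time")
print("0x70 writes == 0x71 accesses:", index_writes == data_accesses)
```

The estimate is approximate because the averages in the report are rounded, but it agrees with the reported Time% column: writes to 0x71 are only about a tenth of the samples yet account for most of the IO exit time.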
Ports 0x70-0x71 are the rtc0 ports, which means there is heavy guest RTC
access overhead. With stimer + synic on and AVIC disabled, the vmexit metrics
look much better for IO and MSR exits, as shown below.
-----------------------------------------
Analyze events for all VMs, all VCPUs:
VM-EXIT    Samples  Samples%    Time%  Min Time   Max Time  Avg time
hlt         166815    38.30%   99.66%    0.44us  1556.67us  809.48us ( +-  0.11% )
interrupt   146218    33.57%    0.13%    0.30us  1362.10us    1.19us ( +-  1.50% )
msr         105267    24.17%    0.20%    0.37us    87.47us    2.51us ( +-  0.31% )
vintr         9285     2.13%    0.01%    0.50us     1.92us    0.78us ( +-  0.16% )
write_cr8     7537     1.73%    0.00%    0.31us    49.14us    0.66us ( +-  1.08% )
cpuid          174     0.04%    0.00%    0.31us     1.39us    0.46us ( +-  3.21% )
npf            143     0.03%    0.00%    1.49us   237.66us   21.04us ( +- 12.04% )
write_cr4       32     0.01%    0.00%    0.93us     5.78us    2.10us ( +- 11.38% )
pause           22     0.01%    0.00%    0.45us     1.33us    0.84us ( +-  5.46% )
read_cr4        16     0.00%    0.00%    0.47us     0.68us    0.60us ( +-  2.19% )
nmi             11     0.00%    0.00%    0.35us     0.70us    0.54us ( +-  5.06% )
write_dr7        2     0.00%    0.00%    0.43us     0.45us    0.44us ( +-  2.27% )
hypercall        1     0.00%    0.00%    0.97us     0.97us    0.97us ( +-  0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
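To put the two vmexit reports side by side, here is a small Python sketch that compares sample counts for the exit types discussed in this mail. Only those rows are included, the dictionary names are illustrative, and the comparison uses raw counts: the duration of each run is not stated here, so the ratios are indicative only.

```python
# Sample counts copied from the two vmexit reports (selected rows only).
AVIC_NO_STIMER = {"io": 575088, "msr": 434530, "hlt": 308635, "interrupt": 4796}
STIMER_NO_AVIC = {
    "hlt": 166815,
    "interrupt": 146218,
    "msr": 105267,
    "vintr": 9285,
    "write_cr8": 7537,
}


def relative_change(before, after):
    """Fractional change going from `before` to `after` (negative = fewer)."""
    return (after - before) / before


# Going from "stimer, no AVIC" to "AVIC, no stimer".
interrupt_change = relative_change(
    STIMER_NO_AVIC["interrupt"], AVIC_NO_STIMER["interrupt"]
)
msr_change = relative_change(STIMER_NO_AVIC["msr"], AVIC_NO_STIMER["msr"])

# Exit types present in only one of the two selections.
gone_with_avic = sorted(set(STIMER_NO_AVIC) - set(AVIC_NO_STIMER))
new_without_stimer = sorted(set(AVIC_NO_STIMER) - set(STIMER_NO_AVIC))

print(f"interrupt exits: {interrupt_change:+.1%}")
print(f"msr exits:       {msr_change:+.1%}")
print("gone with AVIC:", gone_with_avic)
print("new without stimer:", new_without_stimer)
```

This makes the trade-off explicit: AVIC removes almost all interrupt, vintr and write_cr8 exits, while dropping stimer+synic adds the io exits and several times more msr exits.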
Based on the above observations, we would like to know whether there is a way
to enable AVIC while still having the most optimized clocksource for a Windows
guest.
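The constraint behind this question can be written down as a toy model. The Python sketch below is not kernel code; the class and member names (InhibitReason, ToyVm, enable_stimer) are hypothetical and only mirror the dependency described in this mail: stimer needs SynIC, and activating SynIC inhibits APICv/AVIC.

```python
from enum import Flag, auto


class InhibitReason(Flag):
    """Hypothetical set of reasons that keep APICv/AVIC switched off."""
    NONE = 0
    HYPERV_SYNIC = auto()


class ToyVm:
    """Toy model of the stimer -> SynIC -> no AVIC dependency chain."""

    def __init__(self):
        self.inhibits = InhibitReason.NONE
        self.synic_active = False
        self.stimer_enabled = False

    def activate_synic(self):
        # As described for kvm_hv_activate_synic: SynIC inhibits APICv.
        self.synic_active = True
        self.inhibits |= InhibitReason.HYPERV_SYNIC

    def enable_stimer(self):
        # stimer requires SynIC, so enabling it activates SynIC first.
        self.activate_synic()
        self.stimer_enabled = True

    @property
    def avic_active(self):
        # AVIC stays active only while no inhibit reason is set.
        return self.inhibits == InhibitReason.NONE


vm = ToyVm()
print("AVIC before stimer:", vm.avic_active)
vm.enable_stimer()
print("AVIC after stimer: ", vm.avic_active)
```

In this model there is no state with both stimer_enabled and avic_active set, which is exactly the combination being asked for.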
We would really appreciate your help and look forward to your response.
Best Regards,
Kechen