Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/dis

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/dis

From:	Igor Mammedov
Subject:	Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement
Date:	Mon, 1 Mar 2021 16:32:12 +0100
On Wed, 24 Feb 2021 18:00:43 +0100
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Tue, 23 Feb 2021 19:08:42 +0100
> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >  
> >> Igor Mammedov <imammedo@redhat.com> writes:
> >>   
> >> > On Tue, 23 Feb 2021 16:46:50 +0100
> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> >    
> >> >> Igor Mammedov <imammedo@redhat.com> writes:
> >> >>     
> >> >> > On Mon, 22 Feb 2021 11:20:34 +0100
> >> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> >> >      
> >> >> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >> >> >>       
> >> >> >> > Igor Mammedov <imammedo@redhat.com> writes:
> >> >> >> >        
> >> >> >> >>> 
> >> >> >> >>> We need to distinguish because that would be sane.
> >> >> >> >>> 
> >> >> >> >>> Enlightened VMCS is an extension to VMX, it can't be used without
> >> >> >> >>> it. Genuine Hyper-V doesn't have a knob for enabling and 
> >> >> >> >>> disabling it,        
> >> >> >> >> ...        
> >> >> >> >>> That bein said, if
> >> >> >> >>> guest CPU lacks VMX it is counter-productive to expose EVMCS. 
> >> >> >> >>> However,
> >> >> >> >>> there is a problem with explicit enablement: what should
> >> >> >> >>> 
> >> >> >> >>> 'hv-passthrough,hv-evmcs' option do? Just silently drop EVMCS? 
> >> >> >> >>> Doesn't
> >> >> >> >>> sound sane to me.        
> >> >> >> >> based on above I'd error out is user asks for unsupported option
> >> >> >> >> i.e. no VMX -> no hv-evmcs - if explicitly asked -> error out     
> >> >> >> >>    
> >> >> >> >
> >> >> >> > That's what I keep telling you but you don't seem to listen. 
> >> >> >> > 'Scratch
> >> >> >> > CPU' can't possibly help with this use-case because when you parse 
> >> >> >> >
> >> >> >> > 'hv-passthrough,hv-evmcs,vmx=off' you
> >> >> >> >
> >> >> >> > 1) "hv-passthrough" -> set EVMCS bit to '1' as it is supported by 
> >> >> >> > the
> >> >> >> > host.
> >> >> >> >
> >> >> >> > 2) 'hv-evmcs' -> keep EVMCS bit '1'
> >> >> >> >
> >> >> >> > 3) 'vmx=off' -> you have no idea where EVMCS bit came from.
> >> >> >> >
> >> >> >> > We have to remember which options were aquired from the host and 
> >> >> >> > which
> >> >> >> > were set explicitly by the user.         
> >> >> >> 
> >> >> >> Igor,
> >> >> >> 
> >> >> >> could you please comment on the above? In case my line of thought is
> >> >> >> correct, and it is impossible to distinguish between e.g.
> >> >> >> 
> >> >> >> 'hv-passthrough,hv-evmcs,-vmx'
> >> >> >> and
> >> >> >> 'hv-passthrough,-vmx'
> >> >> >> 
> >> >> >> without a custom parser (written just exactly the way I did in this
> >> >> >> version, for example) regardless of when 'hv-passthrough' is
> >> >> >> expanded. E.g. we have the exact same problem with
> >> >> >> 'hv-default,hv-evmcs,-vmx'. I that case I see no point in discussing 
> >> >> >>      
> >> >> >
> >> >> > right, if we need to distinguish between explicit and implicit 
> >> >> > hv-evmcs set by
> >> >> > hv-passthrough custom parser probably the way to go.
> >> >> >
> >> >> > However do we need actually need to do it?      
> >> >> 
> >> >> I think we really need that. See below ...
> >> >>     
> >> >> > I'd treat 'hv-passthrough,-vmx' the same way as 
> >> >> > 'hv-passthrough,hv-evmcs,-vmx'
> >> >> > and it applies not only hv-evmcs but other features hv-passthrough 
> >> >> > might set
> >> >> > (i.e. if whatever was [un]set by hv-passthrough in combination with 
> >> >> > other
> >> >> > features results in invalid config, QEMU shall error out instead of 
> >> >> > magically
> >> >> > altering host provided hv-passthrough value).
> >> >> >
> >> >> > something like:
> >> >> >   'hv-passthrough,-vmx' when hv-passthrough makes hv-evmcs bit set
> >> >> > should result in
> >> >> >   error_setg(errp,"'vmx' feature can't be disabled when hv-evmcs is 
> >> >> > enabled,"
> >> >> >                  " either enable 'vmx' or disable 'hv-evmcs' along 
> >> >> > with disabling 'vmx'"
> >> >> >
> >> >> > making host's features set, *magically* mutable, depending on other 
> >> >> > user provided features
> >> >> > is a bit confusing. One would never know what hv-passthrough actually 
> >> >> > means, and if
> >> >> > enabling/disabling 'random' feature changes it.
> >> >> >
> >> >> > It's cleaner to do just what user asked (whether implicitly or 
> >> >> > explicitly) and error out
> >> >> > in case it ends up in nonsense configuration.
> >> >> >      
> >> >> 
> >> >> I don't seem to agree this is a sane behavior, especially if you replace
> >> >> 'hv-passthrough' with 'hv-default' above. Removing 'vmx' from CPU for
> >> >> Windows guests is common if you'd want to avoid nested configuration:
> >> >> even without any Hyper-V guests created, Windows itself is a Hyper-V
> >> >> partition.
> >> >> 
> >> >> So a sane user will do:
> >> >> 
> >> >> '-cpu host,hv-default,vmx=off' 
> >> >> 
> >> >> and on Intel he will get an error, and on AMD he won't. 
> >> >> 
> >> >> So what you're suggesting actually defeats the whole purpose of
> >> >> 'hv-default' as upper-layer tools (think libvirt) will need to know 
> >> >> that    
> >> > I'd assume it would be hard for libvirt to use 'hv-default' from 
> >> > migration
> >> > point of view. It's semi opaque (one can find out what features it sets
> >> > indirectly inspecting individual hv_foo features, and mgmt will need to
> >> > know about them). If it will mutate when other features [un]set, upper
> >> > layers might need to enumerate all these permutations to know which hosts
> >> > are compatible or compare host feature sets every time before attempting
> >> > migration.  
> >> 
> >> That's exactly the opposite of what's the goal here which is: make it
> >> possible for upper layers to not know anything about Hyper-V
> >> enlightenments besides 'hv-default'. Migration should work just fine, if
> >> the rest of guest configuration matches -- then 'hv-default' will create
> >> the exact same things (e.g. if 'vmx' was disabled on the source it has  
> >             ^^^^^
> > I'm not convinced in that yet (not with current impl. more on that at the 
> > end of reply)
> >  
> >> to be enabled on the destination, it can't be different)
> >> 
> >>   
> >> >> Intel configurations for Windows guests are somewhat different. They'll
> >> >> need to know what 'hv-evmcs' is. We're back to where we've started.    
> >> >
> >> > we were talking about hv-passthrough, and if host advertises hv-evmcs
> >> > QEMU should complain if user disabled features it depends on (
> >> > not silently fixing up configuration error).
> >> > But the same applies to hv-default.    
> >> 
> >> Let's forget about hv-passthrough completely for a while as this series
> >> is kind of unrelated to it.  
> >
> > It adds a lot for unrelated code (not just couple of lines),
> > I've played with scratch CPU idea, here is demo of it
> > https://github.com/imammedo/qemu/commit/a4b107d5368ebf72d45082bc8310a6b88a4ba6fb
> > I didn't rework caps/cpuid querying parts (just hacked around it),
> > and even without that it saves us ~200LOC (not a small part of which comes
> > with this series).  
> 
> All your savings come from throwing away custom parsers -- which are not
> needed at all if we don't distinguish between
> 
> 'hv-default,hv-evmcs' and 'hv-default'
> 
> it's just not needed, don't count these patches in. Or, if it is needed,
> please explain how your scratch CPU is making things different. I guess
> it is not so we can discuss this outside of this series.

scratch CPU helps with hostpassthrough refactoring which also brings in
dependency on custom parser.


> > I also split horrible hv_cpuid_check_and_set into separate 'set' and 
> > 'check' stages.
> > Granted it was sort-of pre-existing ugly code, some of your
> > re-factoring made it a bit better but it's still far from readable.
> >    
> 
> hv_cpuid_check_and_set() is already there, I'm not at all opposed to
> making this code even better but I don't see it as a must for this
> particular feature (hv-default).
it's not a must have for hv-default especially if you don't touch it.
(however if you touch it & co, I'd ask to clean it up first)


> >> In the previous submission I was setting 'hv-default' based on host
> >> availability of the feature only. That is: set on Intel, unset on
> >> AMD. We have to at least preserve that because it would be insane to
> >> crash on
> >> 
> >> -cpu host,hv-default 
> >> 
> >> on AMD because AMD doesn't (and never will!) support hv-evmcs, right?  
> >
> > If QEMU prevents cross arch migration i.e. it's not supported,
> > then I guess we can make hv-default different depending on AMD or Intel 
> > host.
> > If not then we might need to be conservative i.e. exclude hv-evmcs from 
> > defaults.
> >  
> 
> Forget about cross vendor. I want to tie it 1:1 to VMX feature on the
> guest CPU -- which happens to be only available on Intel. It is
> absolutely impossible to migrate VMX enabled guest to VMX-disabled
> destination, with or without evmcs.
Ok

> >> >> If we are to follow this approach let's just throw away 'hv-evmcs' from
> >> >> 'hv-default' set, it's going to be much cleaner. But again, I don't
> >> >> really believe it's the right way to go.    
> >> >
> >> > if desired behavior, on Intel host for above config, to start without 
> >> > error
> >> > then indeed defaults should not set 'hv-evmcs' if it results in invalid
> >> > feature set.    
> >> 
> >> This is problematic as it is still sane for everyone to enable it as it
> >> gives performance advantage. If we just for a second forget about custom  
> >   "  
> >     > >> So a sane user will do:
> >     > >> 
> >     > >> '-cpu host,hv-default,vmx=off'  
> >   "
> >   it's not easy picking defaults.
> >  
> >> parsers and all that -- which is just an implementation detail, why can't
> >> we tie 'hv-evmcs' bit in 'hv-default' to 'vxm' 1:1?  
> > migration wise I don't see issues wrt vmx=off turning of hv-evmcs,
> > however ...
> >
> > we were replacing user input fixups with hard errors asking
> > user to fix CLI and removing custom parsers in favor of generic ones.
> >
> > In vmx=off case we would be fixing up what 'hv-default' explicitly set.
> > Same applies to other hv-foo set by hv-default.
> >
> > ex: 'hv-default,hv-dep1=off', will turn off some dependent feature
> > for other hv feature in hv-default set and it will error out,
> > same goes on for enabling feature that has dependencies.
> > Why should we treat hv-evmcs/vmx pair any different?
> >  
> 
> VMX is not part of Hyper-V enlightenments, is it? It can also be coming
> from a CPU model:
> 
> "-cpu MyLovelyModelWithVmx,hv-default"
> 
> should not throw an error!
> 
> Again, the goal is for userspace to not know anything besides
> 'hv-default' for Hyper-V enlightenments.
> 
> 
> > Granted exiting with error is not the best UX, but at least it says to user
> > what's wrong with CLI and how to fix it. Also it lets to keep QEMU code
> > manageable and with consistent behavior.  
> 
> Enabling EVMCS only when guest CPU has VMX is a smart behavior, all
> users want that. It is very consistent with how genuine Hyper-V behaves.
it's all good,
unless VMX is absent for whatever reasons (cpumodel or vmx=off on CLI),
in this case just error out and say what's wrong instead of trying to fix
CLI up.

> >> Again, the end goal is: make it possible for upper layers to now know
> >> anything about Hyper-V enlightenments other than 'hv-default'.  
> > I'm still doubtful about feasibility of this goal when migration is 
> > considered.
> > It sure would work if hosts are identical hw/sw wise.
> > In mixed setup all features, except of hv-evmcs, that included in 
> > 'hv-default',
> > will error out in case host doesn't support it, which should prevent
> > incompatible migration so that's also fine.
> >
> > But hv-evmcs will silently go away if host doesn't support it,
> > which is issue when migration happens to/from host that supports it.  
> 
> But how can it go away? In KVM, hv-evmcs support is not conditional,
> basically, all KVMs which support netsting state migration also support
> EVMCS. 

you added that in kernel between 4.19-4.20 (8cab6507f64),
so on Intel host flag can change depending on not so old kernel version.

Unless QEMU minimum supported kernel version is 4.20,
we can't ignore that.
(downstreams will have to take care of it on its own)

> >
> > Maybe to help mgmt to figure out hosts compatibility
> >   1. it should know about hv-evmcs to query it's status
> >   2. or default value set by 'hv-default' should be exposed to mgmt
> >      so it could compare whole feature-set in one go without being
> >      aware of individual features.
> >
> > Additionally on QEMU side for such conditional features we can
> > theoretically add a subsection to migration stream when feature
> > is enabled, that way we at least can prevent 'successful' migration,
> > when destination value doesn't match. But this might already be
> > over-engineering on my part.  
> 
> I think you're trying to solve an issue which doesn't exist. In case
> we're successfully migrating a nested enabled guest, our KVM is modern
> enough and supports EVMCS (on Intel, of course). Also, nested state
> (which is part of the migration stream) has EVMCS flag, we can't migrate
> somewhere where the flag is unsupported.
that's what I was missing.
Can you point out to the code that makes sure that migration fails?
(a comment where hv-evcms is added to default set explaining why it's safe, pls)

 
> Anyway, I feel we're walking in circles. I'm ready to just drop all
> EVMCS related bits from this seies to get it merged. This will make the
> part which you hate the most ("custom parsers") go away too. We can discuss
> EVMCS to death after that and, when we finally decide that user
> convenience is actually worth something, we can add 'hv-evmcs' to the
> new 

fine with me, it would be even easier to review if it were just a patch that
would add 'hv-defaults', without non must have refactoring.
(cleanup/refactoring could be another series)

If we can guarantee that hv-evcms won't flip on/off on all supported kernels,
I'm also fine with keeping it in default set and error-ing out if vmx ends up
in off state.
However if it changes, we need to expose 'default set' to mgmt somehow,
so it will know that hosts aren't compatible (instead of finding out it hard way
in form of failed migration (assuming it fails))

> Eduardo, do you see any way forward here? 
>
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement, Igor Mammedov <=
- Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement, Vitaly Kuznetsov, 2021/03/01
Prev by Date: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing
Next by Date: [PATCH] pcnet: switch to use qemu_receive_packet() for loopback
Previous by thread: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing
Next by thread: Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement
Index(es):
- Date
- Thread