
From: Igor Mammedov
Subject: Re: [Question] Regarding containers "unattached/peripheral/anonymous" - their relation with hot(un)plug of devices
Date: Mon, 27 Jan 2020 16:03:52 +0100

On Fri, 24 Jan 2020 18:44:16 +0000
Salil Mehta <address@hidden> wrote:

> > From: Igor Mammedov [mailto:address@hidden]
> > Sent: Friday, January 24, 2020 4:07 PM
> > 
> > On Fri, 24 Jan 2020 15:02:10 +0000
> > Salil Mehta <address@hidden> wrote:
> >   
> > > > From: Igor Mammedov [mailto:address@hidden]
> > > > Sent: Friday, January 24, 2020 1:54 PM
> > > > To: Salil Mehta <address@hidden>
> > > >
> > > > On Fri, 24 Jan 2020 11:20:15 +0000
> > > > Salil Mehta <address@hidden> wrote:
> > > >  
> > > > > Hello,
> > > > > I am working on the vCPU hotplug feature for ARM64 and I am in the
> > > > > middle of understanding some aspects of QEMU's device_add/device_del
> > > > > interface.
> > > > >
> > > > > Observations:
> > > > > 1. Any object initialized by qmp_device_add() gets put into the
> > > > > /machine/unattached container. I traced the flow to a code path
> > > > > inside device_set_realized().
> > > > > 2. I could see that the reverse, qmp_device_del(), expects the
> > > > > device to be in the /machine/peripheral container.
> > > > > 3. I could see that any object initially added to the unattached
> > > > > container did not have a parent until object_property_add_child()
> > > > > was called further down that path, which effectively meant a new
> > > > > property was created, the property table was populated, and the
> > > > > child was parented.
> > > > > 4. Generally, the /machine/peripheral container was used wherever
> > > > > DEVICE(dev)->id was present and non-null.
> > > > >
> > > > > Questions:
> > > > > 1. I wanted to confirm my understanding of the purpose of having
> > > > > separate containers like unattached, peripheral and anonymous.  
> > > >  
> > > > > 2. At init time all the vcpus go under the *unattached* container.
> > > > > Now, qmp_device_del() cannot be used to unplug them. I am wondering  
> > > >
> > > > A device is put into 'unattached' when it wasn't assigned a parent.
> > > > Usually that happens when the board creates the device directly.  
> > >
> > >
> > > Sure, but if we decide that a certain number (N) of vcpus are
> > > hotpluggable, and a certain subset of N (say 'n' < 'N') should be
> > > allowed to be present, or cold-plugged, at init time, then I wonder
> > > which of the following is the correct approach:
> > >
> > > 1. Bring all of the N vcpus at boot time under the "peripheral"
> > >    container
> > >                                    OR
> > > 2. Just bring the subset 'n' of 'N' under the "peripheral" container
> > >    and the rest under the "unattached" container? And later, as and
> > >    when the rest of the vcpus are hotplugged, they would be
> > >    transferred from the "unattached" container to the "peripheral"
> > >    container?  
> > 
> > the issue with that is that to put a device into the "peripheral"
> > container, 'the user' must provide an 'id' (currently QEMU isn't able
> > to do it on its own [1]).
> > 
> > But it doesn't mean that cold-plugged CPUs can't be unplugged.
> > What current users could do is start QEMU like this (simplified version):
> > 
> >  $QEMU -smp 1,maxcpus=N -device foo-cpu-type,id=CPU00 \
> >        -device foo-cpu-type,id=CPU01 ...
> > 
> > i.e. the 1st CPU is not manageable due to the lack of an 'id' and is
> > created by board code; the rest have an 'id' and can be managed.  
> 
> 
> I understand that we can somehow assign ids from the QMP interface, but
> the above will not push vcpus into the "peripheral" container. They will
> appear in the "unattached" container, but with the specified names, and
> as far as I can see in the code, 'device_del' can only delete
> objects/devices from the 'peripheral' container?

qemu-system-x86_64 -monitor stdio \
    -smp 1,maxcpus=3 \
    -device qemu64-x86_64-cpu,id=foo,socket-id=1,core-id=0,thread-id=0 \
    -device qemu64-x86_64-cpu,socket-id=2,core-id=0,thread-id=0

(qemu) info hotpluggable-cpus 
Hotpluggable CPUs:
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  qom_path: "/machine/peripheral-anon/device[0]"
                      ^^^^^^^^^^^^^^^
  CPUInstance Properties:
    socket-id: "2"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  qom_path: "/machine/peripheral/foo"
                      ^^^^^^^^^^

In gist, if a device is created with any variant of device_add, it goes
into "peripheral[-anon]", including cold-plugged ones.

  CPUInstance Properties:
    socket-id: "1"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  CPUInstance Properties:
    socket-id: "0"
    core-id: "0"
    thread-id: "0"





> 
> Plus, having those many ids specified over a large number of vcpus
> does not look like a very practical solution. We need an interface like
> auto-generation of ids, even at boot time. I could see from the link you
> have shared that it is already being used by the ID_BLOCK subsystem. Can
> we create a new subsystem for cpus under this and do the auto-generation
> of vcpu ids as well?

The number of ids is not a problem, since it's usually handled by
management software just fine (ex: libvirt does it).

I'm not sure that auto-id support was actually merged
(I thought it wasn't).

Anyway, auto IDs are not directly related to enabling CPU hotplug for ARM;
if you feel they should be generated, you can try to propose patches.

> > Question is:
> >   why you are looking into 'what container' is used for CPUs?  
> 
> 
> The idea is to be able to use the 'device_del' interface to unplug vcpus
> in both the cold-plugged and hot-plugged cases, using the standard
> 'device_add' interface.

As long as CPU devices in QEMU are created with the help of
-device/device_add with an 'id' provided, it should work with the
current code.

 
> Plus, there is another unique requirement, specific to realizing
> vcpu hotplug on ARM64.
> 
> Summary:
> Right now the ARM architecture does not allow reinitializing the GIC
> after the VM has booted. Therefore, we are planning to pre-size the GIC
> interfaces at init time by initializing all of the possible vcpus
> and keeping them in a 'realized' but 'unavailable' state for the Guest
> VM. They shall be made available as and when vcpus are hot-plugged
> later on. Therefore, the current effort is to be able to plug and
> unplug from the QEMU QOM without destroying the existing state of
> the devices/objects representing vcpus. All of these possible vcpus
> shall be created once, at the boot time of the VM. The vcpus which
> are not available to the Guest VM can be parked. 
>
> Once the vcpus are hot-(un)plugged, only the (pre-)plug/unplug(-request)
> interfaces are used to convey this event information to the Guest
> VM.
> 
> I have tested this solution and it works, but I wanted to make sure
> that I am not doing anything which breaks any of the existing QEMU QOM
> interfaces, and that the basic idea of being able to attach to and
> detach from the QEMU QOM is okay.
> 
> Any suggestions are welcome in this regard.

From the discussion with Drew [CCed], I got that kvm/arm isn't designed
to support vCPU hotplug (and it would require heavy refactoring to
separate the GIC and vCPUs, which probably won't be welcomed by the
maintainers).

But that's only the KVM side of the equation. Assuming that we don't
touch KVM much, only the QEMU side is left.

Further, let's call:
 * vCPU - the KVM part of a CPU
 * CPU - the QEMU object which is linked to a vCPU via a file descriptor.

In QEMU we have CPU devices, which might optionally create vCPUs at
device realize time (if QEMU runs with the KVM accelerator).

So from a design point of view, I'd suggest dynamically
creating/removing CPU devices on demand using the existing
 -device/device_add/device_del interface,
like we do for other architectures.

But in the case of running with the KVM accelerator, to accommodate the
current non-dynamic ARM/KVM, I'd move vCPU creation to "kvm_init()"
time or board init time, so it would pre-create vCPUs in
KVM early, in a parked state, and put them on the 'kvm_parked_vcpus'
list, but not create CPU devices for them.

Then later, when management adds a CPU device either with
'-device' or 'device_add', the new CPU device will pick up a
pre-created parked vCPU file descriptor and re-use it.

The parked vCPU infrastructure already exists in QEMU, as we use it
for hot-unplugged CPUs for the same reason (it needs too much
refactoring on the KVM side to really remove a vCPU).

So when a CPU is hot-unplugged, we put the linked vCPU file descriptor
into kvm_parked_vcpus (see: kvm_destroy_vcpu) and completely delete the
CPU device on the QEMU side. Then, when the same CPU is hot-plugged
again, we reuse the previously parked vCPU file descriptor
(see: kvm_get_vcpu).


> NOTE: I plan to share the patches, which include both the Linux kernel
> and the QEMU changes, with the community in the near future.
> 
> 
> 
> > 1) here is what I could find on IDs topic
> >    https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00011.html
> >   
> > > > >    if all the hotpluggable devices need to go under the
> > > > > *peripheral* container while they are hotplugged, and during
> > > > > object init time as well?  
> > > >
> > > > theoretically, device_del may use a QOM path (the latter users can
> > > > get with query-hotpluggable-cpus), but I think that's mostly a
> > > > debugging feature.  
> > >
> > >
> > > Sure.
> > >
> > >  
> > > > users are supposed to specify an 'id' during -device/device_add
> > > > if they are going to manage that device afterwards (like
> > > > unplugging it). Then they can use that 'id' in other commands
> > > > (including device_del).
> > > >
> > > > So 'id'-ed devices end up in the 'peripheral' container.  
> > >
> > >
> > > Sure, but what if a hotplugged device is removed and then added
> > > again? It looks like the qmp_device_add() interface will again end
> > > up calling device_set_realized(), which eventually would put
> > > hotplugged devices under the "unattached" container?  
> > 
> > it won't, see call chain:
> > 
> >   qmp_device_add()  
> >       -> qdev_device_add()
> >           -> qdev_set_id()  
> 
> Ok, sure. I did see the qdev_set_id() interface. In fact, earlier I was
> actually trying to play with it by making it more generic, adding even
> the 'unattached' container handling to it (which is missing right now),
> and calling it inside device_set_realized() instead of the code below:
> 
>         if (!obj->parent) {
>             gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> 
>             object_property_add_child(container_get(qdev_get_machine(),
>                                                     "/unattached"),
>                                       name, obj, &error_abort);
>             unattached_parent = true;
>             g_free(name);
>         }
> 
> The idea of the above dabbling was to have a common interface for the
> 'unattached' container and call it from machvirt_init() in virt.c,
> where the possible vcpus are being created; i.e. force them to be
> located either inside the 'unattached' or the 'peripheral' container,
> akin to the example you had given.
> 
> If we look at the qdev_device_add() function, after setting dev->id
> using qdev_set_id() (which also results in parenting the object under
> the 'peripheral' container), it calls the function to 'realize' the
> device/object, which would end up hitting the code excerpt shared
> above; and now, because the object will already have a parent set, it
> won't go into the 'unattached' container.
> 
> Currently, the latter cannot be controlled for the cold-plugged vcpus.
> Therefore, before this discussion, my initial thought was to make
> qdev_set_id() universal (i.e. include handling for all types of
> containers, unattached/peripheral/anonymous, in qdev_set_id()), then
> call it from machvirt_init() just before the vcpus are 'realized', so
> that the cold-plugged cpus could get into the 'peripheral' container.
> This would help in hot-unplugging using the standard 'device_del'
> interface.

I think we understand cold-plugged vcpus differently.
From the QEMU point of view there are 2 kinds of cold-plugged CPU devices.

The ones that are currently created by the board directly, following the
pattern:

 object_new()
 set properties
 realize

these (let's call them built-in) are created in the amount specified by
-smp X and are not manageable by external applications.

The second kind of cold-plugged CPUs are the ones created on the command
line with the help of:

 -device cpu-type-foo

these are created by management applications and can be hot-removed if
management supplied an 'id'. For example, libvirt starts QEMU with 1
built-in cpu in paused mode (-S), then hotplugs N cpus via QMP and lets
the VM run (it's essentially the same as using -device on the CLI).
This way it can remove all CPUs except the 1st one, which is good enough
in almost all cases.

> Earlier, I was not sure if there was any special significance to the
> 'unattached' container - which, from this discussion, it looks like
> there is not - and we could actually choose to place all of the
> cold-plugged vcpus in the 'peripheral' container as well. Please
> correct me if I have misunderstood anything here?
> 
> Now, assuming we are allowed to push all of the vcpus into the
> 'peripheral' container, I could simply call the unmodified
> qdev_set_id() just before the 'realization' of the cold-plugged vcpus.
> And maybe we could generate the cpu ids using the auto-generation
> function mentioned in the link you shared, just like it is being done
> for the block devices.

In the ARM case, I'd consider implementing -device/device_add support
first, so it would be on par with the other architectures in QEMU that
support cpu hotplug.

Then try to implement id auto-generation on top of that, and discuss
whether making built-in CPUs hot-removable is worthwhile, as that's a
separate issue and could be done later.

Separating a feature into distinct, self-sufficient chunks usually makes
review/merging easier compared to trying to do everything at once.

> Thoughts?
> 
> 
> > > > > 3. I could not see any device being placed under the
> > > > > *anonymous* container during init time. What is the use of this
> > > > > container?  
> > > >
> > > > if I recall it right, devices created with the help of device_add,
> > > > but without an 'id', go to this container  
> > >
> > >
> > > Any examples off the top of your head where such an interface might
> > > be of use?  
> > 
> > ex:
> > one could use -device/device_add without any ids if such devices
> > aren't planned to be unplugged during runtime, or for non-unpluggable
> > devices  
> 
> Ok. Thanks!
> 
> 
> Best Regards
> Salil
> 



