From: Johannes Berg
Subject: Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
Date: Fri, 06 Sep 2019 17:32:02 +0200
User-agent: Evolution 3.30.5 (3.30.5-1.fc29)

Hi,

> Oh. Apparently qemu mailman chose this time to kick me out
> of list subscription (too many bounces or something?)
> so I didn't see it.

D'oh. Well, it's really my mistake, I should've CC'ed you.

> What worries me is the load this places on the socket.
> ATM if socket buffer is full qemu locks up, so we
> need to be careful not to send too many messages.

Right, sure. I really don't think you ever want to use this extension in
a "normal VM" use case. :-)

I think the only use for this extension would be for simulation
purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
extensions, i.e. you explicitly *want* your virtual machine to lock up /
wait for a response to the KICK command (and respectively, the device to
wait for a response to the CALL command).

Note that this is basically its sole purpose: ensuring exactly this
synchronisation! Yes, it's bad for speed, but it's needed in simulation
when time isn't "real".

Let me try to explain again, most likely my previous explanation was too
long-winded. WLOG, I'll focus on the "kick" use case, the "call" is the
same, just the other way around. I'm sure you know that the kick is
asynchronous, i.e. the VM will increment the eventfd counter, and
"eventually" it becomes readable to the device. Now the device does
something (as fast as it can, presumably) and returns the buffer to the
VM.

Now, imagine you're running in simulation time, i.e. "time travel" mode.
Briefly, this hacks the idle loop of the (UML) VM to just skip forward
when there's nothing to do, i.e. if you have a timer firing in 100ms and
get to idle, time is immediately incremented by 100ms and the timer
fires. For a single VM/device this is already implemented in UML, and
while it's already very useful that's only half the story to me.
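
To make the "skip forward" idea concrete, here's a minimal sketch in
Python. It is not how UML implements it; the TimeTravelClock class and
its method names are made up purely for illustration.

```python
import heapq


class TimeTravelClock:
    """Hypothetical simulated clock; not UML's actual implementation."""

    def __init__(self):
        self.now_ms = 0
        self._timers = []  # min-heap of (expiry_ms, name)

    def add_timer(self, delay_ms, name):
        # Arm a timer relative to the current simulated time.
        heapq.heappush(self._timers, (self.now_ms + delay_ms, name))

    def idle(self):
        """Nothing to do: jump straight to the next timer instead of sleeping."""
        if not self._timers:
            return None
        expiry_ms, name = heapq.heappop(self._timers)
        self.now_ms = max(self.now_ms, expiry_ms)
        return name


clock = TimeTravelClock()
clock.add_timer(100, "timer")
fired = clock.idle()  # simulated time jumps by 100ms, no real waiting
```

No wall-clock time passes in idle(), only the simulated counter moves.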

Once you have multiple devices and/or VMs, you basically have to keep a
"simulation calendar" where each participant (VM/device) can put an
entry, and then whenever they become idle they don't immediately move
time forward, but instead ask the calendar what's next, and the calendar
determines who runs.
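
As a rough sketch of such a calendar (again Python, and again all names
here are hypothetical, this is just one way it could look): each
participant adds an entry, and whoever manages the simulation pops the
earliest one to decide who runs next.

```python
import heapq
import itertools


class SimCalendar:
    """Hypothetical shared calendar for all simulation participants."""

    def __init__(self):
        self.now_ms = 0
        self._events = []              # min-heap of (time_ms, seq, participant)
        self._seq = itertools.count()  # tie-breaker for entries at the same time

    def add(self, participant, delay_ms):
        # A participant asks to be run delay_ms from the current simulated time.
        heapq.heappush(
            self._events, (self.now_ms + delay_ms, next(self._seq), participant)
        )

    def next(self):
        """Called when everybody is idle: advance time and pick who runs."""
        if not self._events:
            return None
        time_ms, _, participant = heapq.heappop(self._events)
        self.now_ms = time_ms
        return participant


calendar = SimCalendar()
calendar.add("vm", 100)
calendar.add("device", 30)
order = [calendar.next(), calendar.next()]
```

The point is only that time moves when the calendar says so, not when
any single participant becomes idle.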

Now, for these simulation cases, consider vhost-user again. It's
absolutely necessary that the calendar is updated all the time, and the
asynchronous nature of the kick breaks that - the device cannot put an
event on the calendar to process the kick before the VM continues running.

With this extension, the device would work in the following way. Assume
that the device is idle, and waiting for the simulation calendar to tell
it to run. Now,

 1) it has an incoming kick (message) from VM (which waits for reply)
 2) the device will now put a new event on the simulation scheduler for
    a time slot to process the message
 3) return reply to VM
 4) device goes back to sleep - this stuff was asynchronously handled
    outside of the simulation basically.
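
The steps above could be sketched roughly like this. It's a toy model
in Python, not libvhost-user code: the Calendar class, the function
names and the 300ms driver timeout are all assumptions for the sake of
the example, and the "message" is just a direct function call standing
in for the socket round trip.

```python
import heapq


class Calendar:
    """Hypothetical simulation calendar shared by VM and device."""

    def __init__(self):
        self.now_ms = 0
        self._heap = []  # min-heap of (time_ms, seq, who, what)
        self._seq = 0

    def add(self, delay_ms, who, what):
        heapq.heappush(self._heap, (self.now_ms + delay_ms, self._seq, who, what))
        self._seq += 1

    def pop(self):
        time_ms, _, who, what = heapq.heappop(self._heap)
        self.now_ms = time_ms
        return who, what


def device_on_kick_message(calendar):
    """Transport-level handler; runs outside of simulated time."""
    calendar.add(0, "device", "process kick")  # step 2: schedule the real work
    return "ack"                               # step 3: reply to the VM
    # step 4: the device keeps waiting for the calendar to run it


def vm_kick_sync(calendar):
    # Step 1: the VM sends the kick as a message and blocks for the reply.
    reply = device_on_kick_message(calendar)
    assert reply == "ack"
    # Only now does the VM continue, e.g. arming a (made-up) driver timeout.
    calendar.add(300, "vm", "driver timeout")


calendar = Calendar()
vm_kick_sync(calendar)
first = calendar.pop()  # the device's entry is already there when the VM idles
```

Because the reply only comes back after step 2, the device's calendar
entry is guaranteed to exist before the VM can go idle.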

In a sense, the code that just ran isn't considered part of the
simulated device, it's just the transport protocol and part of the
simulation environment.

At this point, the device is still waiting for its calendar event to be
triggered, but now it has a new one to process the message. Now, once
the VM goes to sleep, the scheduler will check the calendar and
presumably tell the device to run, which runs and processes the message.
This repeats for as long as the simulation runs, going both ways (or
multiple ways if there are more than 2 participants).


Now, what if you didn't have this synchronisation, i.e. we don't have
this extension or we don't have REPLY_ACK or whatnot?

In that case, after the step 1 above, the VM will immediately continue
running. Let's say it'll wait for a response from the device for a few
hundred milliseconds (of now simulated time). However, depending on the
scheduling, the device has quite likely not yet put the new event on the
simulation calendar (that happens in step 2 above). This means that the
VM's calendar event to wake it up after a few hundred milliseconds will
immediately trigger, and the simulation ends with the driver getting a
timeout from the device.
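
Here's a toy comparison of the two orderings, as a hedged Python
sketch. The simulate() function, the 300ms timeout and the event names
are invented for illustration; the "not synchronous" branch simply
models the unlucky (but likely) scheduling where the device notices the
kick only after the calendar has already been consulted.

```python
import heapq


def simulate(synchronous):
    """Return the first calendar entry that runs once the VM goes idle."""
    now_ms = 0
    calendar = []  # min-heap of (time_ms, participant, action)

    def device_sees_kick():
        # The device's transport code puts the work on the calendar.
        heapq.heappush(calendar, (now_ms, "device", "process kick"))

    # The VM kicks the device.
    if synchronous:
        # With the extension (plus REPLY_ACK) the VM waits for the reply,
        # so the device's entry exists before the VM continues.
        device_sees_kick()

    # The VM continues, arms its driver timeout and goes idle.
    heapq.heappush(calendar, (now_ms + 300, "vm", "driver timeout"))

    # Everybody is idle: the scheduler picks the earliest entry.
    first = heapq.heappop(calendar)

    if not synchronous:
        # The eventfd only becomes readable "eventually": too late here.
        device_sees_kick()
    return first


with_sync = simulate(synchronous=True)
without_sync = simulate(synchronous=False)
```

In the synchronous case the device's entry runs first; in the
asynchronous case the VM's timeout is the only entry on the calendar,
so simulated time jumps straight to it.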


So - yes, while I understand your concern, I basically think this is not
something anyone will want to use outside of such simulations. OTOH,
there are various use cases (I'm doing device simulation, others are
doing network simulation) that use such behaviour, and it might be
nice to support it in a more standard way, rather than everyone having
their own local hacks for everything, like e.g. the VMSimInt paper(**).


But again, like I said, no hard feelings if you think such simulation
has no place in upstream vhost-user.


(**) I put a copy of their qemu changes on top of 1.6.0 here:
     https://p.sipsolutions.net/af9a68ded948c07e.txt

johannes