From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH -V3 09/32] virtio-9p: Implement P9_TWRITE/ Thread model in QEMU
Date: Tue, 30 Mar 2010 13:24:03 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Thunderbird/3.0.3

On 03/30/2010 12:23 AM, Anthony Liguori wrote:
It's not sufficient. If you have a single thread that runs both live migration and timers, then timers will be backlogged behind live migration, or you'll have to yield often. This is regardless of the locking model (and of course having threads without fixing the locking is insufficient as well; live migration accesses guest memory, so it needs the big qemu lock).

But what's the solution? Sending every timer to a separate thread? We'll hit the same problem if we implement an arbitrary limit on the number of threads.
A completion that's expected to take a couple of microseconds at most can live in the iothread. A completion that's expected to take a couple of milliseconds wants its own thread. We'll have to think about anything in between.

vnc and migration can perform large amounts of work in a single completion; they're limited only by the socket send rate and our internal rate limiting, both of which are outside our control. Most device timers are O(1). virtio completions probably fall into the annoying "have to think about it" department.
What I'm skeptical of is whether converting virtio-9p or qcow2 to handle each request in a separate thread is really going to improve things.
Currently qcow2 isn't even fully asynchronous, so it can't fail to improve things.

Unless it introduces more data corruptions, which is my concern with any significant change to qcow2.
It's possible to move qcow2 to a thread without any significant change to it (simply run the current code in its own thread, protected by a mutex). Further changes would be very incremental.
The VNC server is another area where I think multithreading would be a bad idea.

If the vnc server is stuffing a few megabytes of screen into a socket, then timers will be delayed behind it, unless you litter the code with calls to bottom halves. Even worse if it does complicated compression and encryption.

Sticking the VNC server in its own thread would be fine. Trying to make the VNC server multithreaded, though, would be problematic.

Why would it be problematic? Each client gets its own threads; they don't interact at all, do they?

I don't see a need to do it though (beyond dropping it into a thread).

Basically, sticking isolated components in a single thread should be pretty reasonable.
Now you're doomed. It's easy to declare things "isolated components" one by one; pretty soon the main loop will be gone.
But if those system calls are blocking, you need a thread?
You can dispatch just the system call to a thread pool. The advantage of doing that is that you don't need to worry about locking, since the system calls are not (usually) handling shared state.
There is always implied shared state. If you're doing direct guest memory access, you need to lock memory against hotunplug, or the syscall will end up writing into freed memory. If the device can be hotunplugged, you need to make sure all threads have returned before unplugging it.
There are other ways to handle hot unplug (like reference counting) that avoid this problem.

That's just more clever locking.

Ultimately, this comes down to a question of lock granularity and thread granularity. I don't think it's a good idea to start with the assumption that we want extremely fine granularity. There's certainly very low-hanging fruit with respect to threading.
Sure. Currently the hotspots are block devices (except raw) and hpet (seen with large Windows guests). The latter includes the bus lookup and hpet itself; hpet reads can be performed locklessly if we're clever.
On a philosophical note, threads may make it easier to model complex hardware that includes a processor, for example our scsi card (and how about using tcg as a jit to boost it :)
Yeah, it's hard to argue that script evaluation shouldn't be done in a thread. But that doesn't prevent me from being very cautious about how and where we use threading :-)

Caution where threads are involved is a good thing. They are inevitable, however, IMO.
We already are using threads, so they aren't just inevitable, they're reality. I still don't think using threads would significantly simplify virtio-9p.

I meant exposing the qemu core to the threads instead of pretending they aren't there. I'm not familiar with 9p so I don't hold much of an opinion, but didn't you say you need threads in order to handle async syscalls? That may not be the deep threading we're discussing here.
btw, IIUC currently disk hotunplug will stall a guest, no? We need async aio_flush().
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.




