Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool

qemu-devel

From:	Anthony Liguori
Subject:	Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date:	Fri, 12 Dec 2008 11:25:55 -0600
User-agent:	Thunderbird 2.0.0.17 (X11/20080925)

Andrea Arcangeli wrote:
> On Fri, Dec 12, 2008 at 10:49:45AM -0600, Anthony Liguori wrote:
>> I meant, if you wanted to pass a file descriptor as a raw device. So:
>>
>> qemu -hda raw:fd=4
>>
>> Or something like that. We don't support this today.
>
> ah ok.

>> I think bouncing the iov and just using pread/pwrite may be our best bet.
>> It means memory allocation but we can cap it. Since we're using threads,
>
> It's already capped. However, currently it generates an iovec, but
> we simply have to check that iovcnt is 1; if it is, we pread into
> iov.iov_base for iov.iov_len bytes. The DMA API will take care to
> enforce iovcnt = 1 for the iovec if preadv/pwritev isn't detected at
> compile time.
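A minimal sketch of the raw-file side of the scheme Andrea describes, assuming (as he says) that the DMA layer only ever submits a single iovec when preadv/pwritev were not detected at build time. The names `raw_pread_iov` and `raw_pwrite_iov` are hypothetical and are not QEMU's block-layer API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical read path: without preadv(), the DMA layer is assumed to
 * submit only single-segment requests, so one pread() covers the request. */
static ssize_t raw_pread_iov(int fd, const struct iovec *iov, int iovcnt,
                             off_t offset)
{
    assert(iovcnt == 1);  /* guaranteed by the caller, not handled here */
    return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);
}

/* Write side, same contract. */
static ssize_t raw_pwrite_iov(int fd, const struct iovec *iov, int iovcnt,
                              off_t offset)
{
    assert(iovcnt == 1);
    return pwrite(fd, iov[0].iov_base, iov[0].iov_len, offset);
}
```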

Hrm, that's more complex than I was expecting. I was thinking the bdrv aio infrastructure would always take an iovec. Any details about the underlying host's ability to handle the iovec would be insulated.
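One way to picture the "always take an iovec" design is an emulated preadv() that hides what the host supports: a single iovec goes straight to pread(), several are bounced through one linear buffer and scattered back, and if the bounce allocation fails it falls back to one pread() per vector. This is a sketch under those assumptions; `emu_preadv` and `iov_total_len` are hypothetical names, and a write side would mirror it by gathering into the buffer before a single pwrite().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sum of the segment lengths in an iovec array. */
static size_t iov_total_len(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return total;
}

/* Hypothetical preadv() emulation: callers always pass an iovec and never
 * see whether the host supports vectored I/O. Like pread(), it may return
 * a short count at end of file. */
static ssize_t emu_preadv(int fd, const struct iovec *iov, int iovcnt,
                          off_t offset)
{
    assert(iovcnt >= 1);
    if (iovcnt == 1)  /* fast path: nothing to bounce */
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);

    size_t total = iov_total_len(iov, iovcnt);
    char *bounce = malloc(total ? total : 1);
    if (bounce == NULL) {
        /* No memory for a bounce buffer: one pread() per vector instead. */
        ssize_t done = 0;
        for (int i = 0; i < iovcnt; i++) {
            ssize_t n = pread(fd, iov[i].iov_base, iov[i].iov_len,
                              offset + done);
            if (n < 0)
                return done > 0 ? done : -1;  /* report partial progress */
            done += n;
            if ((size_t)n < iov[i].iov_len)
                break;                        /* short read: stop here */
        }
        return done;
    }

    ssize_t n = pread(fd, bounce, total, offset);
    /* Scatter whatever was read back into the caller's vectors. */
    size_t pos = 0;
    for (int i = 0; i < iovcnt && n > 0 && pos < (size_t)n; i++) {
        size_t left = (size_t)n - pos;
        size_t len = iov[i].iov_len < left ? iov[i].iov_len : left;
        memcpy(iov[i].iov_base, bounce + pos, len);
        pos += len;
    }
    free(bounce);
    return n;
}
```

The single-buffer path relies on the expectation stated later in the thread that one bounced pread() beats a pread() per vector; the per-vector loop only runs when malloc() fails.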

>> we can just force a thread to sleep until memory becomes available, so it's
>> actually pretty straightforward.
> There's no way to detect that and wait for memory; it'd SIGKILL before
> you can check... at least with the default overcommit. The way the DMA
> API works is that it doesn't send a mega-large writev, but sends it in
> pieces capped by the max buffer size, with many iovecs with iovcnt = 1.

If we artificially cap it at, say, 50MB, then you do something like:

pthread_mutex_lock(&bounce_lock);  /* also guards the 50MB accounting */
while ((buffer = try_to_bounce(offset, iov, iovcnt, &size)) == NULL &&
       errno == ENOMEM) {
    /* over the cap: sleep until bounce_free() broadcasts */
    pthread_cond_wait(&more_memory, &bounce_lock);
}
pthread_mutex_unlock(&bounce_lock);

try_to_bounce() allocates with malloc(), but if the allocation would exceed 50MB, it fails with ENOMEM. In your bounce_free() function, you do a pthread_cond_broadcast() to wake up any threads potentially waiting to allocate memory.

This lets us expose a preadv/pwritev function that actually works. The expectation is that bouncing will outperform just doing a pread/pwrite of each vector. Of course, you could get smart and, if try_to_bounce() fails, fall back to a pread/pwrite of each vector. Likewise, you can fast-path the case of a single iovec to avoid bouncing entirely.

Regards,

Anthony Liguori

>> We can use libaio on older Linux kernels to simulate preadv/pwritev. Use the
>> proper syscalls on newer kernels, on BSDs, and bounce everything else.
>
> Given that the READV/WRITEV aio commands aren't available in kernels that
> aren't very recent, and given that without O_DIRECT each iocb will become
> synchronous, we can't use libaio. Also, if we do that, the iocb logic
> would need to be largely refactored once they fix linux-aio. So I'm not
> sure it's worth it, as it can't handle 2.6.16-18 when O_DIRECT is
> disabled (when O_DIRECT is enabled we could just build an array of
> linear iocbs).
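The capped bounce allocation proposed in the reply above (a hard cap of around 50MB, threads sleeping on a condition variable, bounce_free() broadcasting) can be sketched as follows, assuming pthreads and a process-wide byte count protected by one mutex. `BOUNCE_CAP`, `bounce_lock`, `more_memory`, `bounce_get` and this `try_to_bounce` signature (no offset; it gathers the data for a write) are illustrative assumptions, not code from the thread. One deliberate difference from the loop in the mail: it stops waiting when no bounce memory is outstanding, because then no bounce_free() can ever wake it (for example, a single request larger than the cap), and the caller can fall back to per-vector pread/pwrite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Illustrative cap; the mail suggests something like 50MB. */
#define BOUNCE_CAP ((size_t)50 * 1024 * 1024)

static pthread_mutex_t bounce_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t more_memory = PTHREAD_COND_INITIALIZER;
static size_t bounce_used;  /* bytes handed out; protected by bounce_lock */

/* Gather an iovec into one malloc()ed buffer (the write side), failing with
 * ENOMEM if that would push bounce memory past BOUNCE_CAP.
 * Must be called with bounce_lock held. */
static void *try_to_bounce(const struct iovec *iov, int iovcnt, size_t *size)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total > BOUNCE_CAP - bounce_used) {
        errno = ENOMEM;
        return NULL;
    }
    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;  /* malloc() has set errno */
    size_t pos = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(buf + pos, iov[i].iov_base, iov[i].iov_len);
        pos += iov[i].iov_len;
    }
    bounce_used += total;
    *size = total;
    return buf;
}

/* Allocate a bounce buffer, sleeping while the cap is exhausted. Gives up
 * (returns NULL) when nothing is outstanding, since no wakeup can come. */
static void *bounce_get(const struct iovec *iov, int iovcnt, size_t *size)
{
    void *buf;
    pthread_mutex_lock(&bounce_lock);
    while ((buf = try_to_bounce(iov, iovcnt, size)) == NULL &&
           errno == ENOMEM && bounce_used > 0)
        pthread_cond_wait(&more_memory, &bounce_lock);
    pthread_mutex_unlock(&bounce_lock);
    return buf;
}

/* Release a buffer from bounce_get() and wake every thread waiting for room. */
static void bounce_free(void *buf, size_t size)
{
    free(buf);
    pthread_mutex_lock(&bounce_lock);
    assert(bounce_used >= size);
    bounce_used -= size;
    pthread_cond_broadcast(&more_memory);
    pthread_mutex_unlock(&bounce_lock);
}
```

Because the accounting check and the pthread_cond_wait() happen under the same mutex that bounce_free() takes before broadcasting, a waiter cannot miss a wakeup between failing the cap check and going to sleep.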
