qemu-devel

Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization


From: Christian Schoenebeck
Date: Thu, 23 Jan 2020 13:57:23 +0100

On Thursday, 23 January 2020 12:33:42 CET Greg Kurz wrote:
> On Tue, 21 Jan 2020 01:30:10 +0100
> Christian Schoenebeck <address@hidden> wrote:
> > Make top half really top half and bottom half really bottom half:
> > 
> > Each T_readdir request handling is hopping between threads (main
> > I/O thread and background I/O driver threads) several times for
> > every individual directory entry, which sums up to huge latencies
> > for handling just a single T_readdir request.
> > 
> > Instead of doing that, collect now all required directory entries
> > (including all potentially required stat buffers for each entry) in
> > one rush on a background I/O thread from fs driver, then assemble
> > the entire resulting network response message for the readdir
> > request on main I/O thread. The fs driver is still aborting the
> > directory entry retrieval loop (on the background I/O thread) as
> > soon as it would exceed the client's requested maximum R_readdir
> > response size. So we should not have any performance penalty by
> > doing this.
> > 
> > Signed-off-by: Christian Schoenebeck <address@hidden>
> > ---
> 
> Ok so this is it. Not reviewed this huge patch yet but I could at
> least give a try. The gain is impressive indeed:

Tseses, so much scepticism. :)

> [greg@bahia qemu-9p]$ (cd .mbuild-$(stg branch)/obj ; export
> QTEST_QEMU_BINARY='x86_64-softmmu/qemu-system-x86_64'; make all
> tests/qtest/qos-test && for i in {1..100}; do tests/qtest/qos-test -p
> $(tests/qtest/qos-test -l | grep readdir/basic); done) |& awk '/IMPORTANT/
> { print $10 }' | sed -e 's/s//' -e 's/^/n+=1;x+=/;$ascale=6;x/n' | bc
> .009806
> 
> instead of .055654, i.e. nearly 6 times faster ! This sounds promising :)

As mentioned in the other email, the performance improvement from this patch
is actually far more than a factor of 6, since you probably just dropped the
n-square driver hack in your benchmarks (which tainted your benchmark results):

Unoptimized readdir, with n-square correction hack:
Time client spent for waiting for reply from server: 0.082539s [MOST IMPORTANT]

Optimized readdir, with n-square correction hack:
Time 9p server spent on synth_readdir() I/O only (synth driver): 0.001576s
Time 9p server spent on entire T_readdir request: 0.002244s [IMPORTANT]
Time client spent for waiting for reply from server: 0.002566s [MOST IMPORTANT]

So in this particular test run the performance improvement is around a factor
of 32, but I have also observed factors of around 40 in earlier tests.
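
For reference, the factor can be recomputed from the timings quoted above. The
following is only a small arithmetic sketch; the variable names are made up
for illustration and nothing here is part of the patch or the test suite:

```python
# Timings in seconds, copied from the test output quoted in this email.
unoptimized_client_wait = 0.082539  # unoptimized readdir, client wait time
optimized_client_wait = 0.002566    # optimized readdir, client wait time
optimized_server_total = 0.002244   # optimized, entire T_readdir request
optimized_synth_io = 0.001576       # optimized, synth_readdir() I/O only

# Speedup as seen by the client.
speedup = unoptimized_client_wait / optimized_client_wait

# Server time spent outside the synth driver I/O in the optimized run.
server_overhead = optimized_server_total - optimized_synth_io

print(f"client-side speedup: {speedup:.1f}x")
print(f"server time outside driver I/O: {server_overhead:.6f}s")
```

Both runs compared here include the n-square correction hack, so the ratio
reflects the readdir change alone.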

> Now I need to find time to do a decent review... :-\

Sure, take your time! But as you can see, it is really worth it.

And it's not just the performance improvement. This patch also reduces program
flow complexity significantly: for example, there is just one lock and one
unlock, and each entry name allocation is freed immediately without any
potential branch in between, among other things. In other words, it adds
safety.
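
To illustrate the control flow described in the commit message, here is a
minimal Python model. It is not the QEMU C code: `Entry`, `entry_size()`,
`collect_entries()` and `handle_readdir()` are hypothetical names, and the
24-byte fixed part per entry is an assumption modeled on the 9P2000.L readdir
entry layout. It only shows the shape of the idea: one hop to a background
thread, one lock and one unlock around the whole loop, and the loop stopping
as soon as the next entry would exceed the requested maximum response size.

```python
import threading
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    offset: int


def entry_size(entry: Entry) -> int:
    # Assumed wire size: 24 fixed bytes plus the encoded name (illustrative).
    return 24 + len(entry.name.encode())


def collect_entries(directory, dir_lock, max_size):
    """Background-thread part: gather all entries in a single pass."""
    collected, used = [], 0
    with dir_lock:  # one lock and one unlock for the entire loop
        for offset, name in enumerate(directory):
            entry = Entry(name, offset)
            size = entry_size(entry)
            if used + size > max_size:
                break  # stop before exceeding the client's maximum
            collected.append(entry)
            used += size
    return collected


def handle_readdir(directory, dir_lock, max_size):
    """Main-thread part: one hop to the worker, then assemble the reply."""
    result = []
    worker = threading.Thread(
        target=lambda: result.extend(
            collect_entries(directory, dir_lock, max_size)
        )
    )
    worker.start()
    worker.join()
    # The response is assembled only after all entries have been collected.
    return [entry.name for entry in result]


names = handle_readdir(["a", "bb", "ccc"], threading.Lock(), 60)
```

With the assumed sizes (25, 26 and 27 bytes), a limit of 60 bytes admits the
first two entries only, so the worker never touches entries that cannot fit
into the response.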

Best regards,
Christian Schoenebeck




