qemu-devel

Re: [PATCH v6 4/5] 9pfs: T_readdir latency optimization


From: Greg Kurz
Subject: Re: [PATCH v6 4/5] 9pfs: T_readdir latency optimization
Date: Fri, 3 Jul 2020 17:53:15 +0200

On Thu, 02 Jul 2020 19:23:35 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 2. Juli 2020 17:35:00 CEST Greg Kurz wrote:
> > > > It isn't readdir only, most requests span over multiple v9fs_co_*()
> > > > calls...
> > > 
> > > Right, I know! And that's actually my root motivation to finally bring
> > > this
> > > patch set forward, since I am very aware that this patch set is just a
> > > small brick in the overall process of fixing similarly affected code
> > > portions. So yes, that's my plan to fix them with upcoming patch sets,
> > > too.
> > > 
> > > Having said that, could we perhaps try to make future reviews a bit
> > > more efficient in certain aspects? For instance it would help a lot if
> > > the patch set was reviewed entirely, instead of being stopped at the
> > > very first issue spotted and then suspended until the next version, as
> > > this creates large latencies in the overall review process.
> > 
> > Review of 9pfs is a best effort thing... I usually stop reviewing when I'm
> > fed up or when all the time I could reasonably invest is elapsed. Breaking
> > down the series in smaller patches is the usual improvement for that.
> 
> No need to defend, you do these things voluntarily. I am glad that you
> spend time on this project at all. I would appreciate it though if we could
> reduce the overall time for a patch set a bit, and my suggestion of trying
> to walk through an entire set before re-posting a new version might indeed
> bring an improvement without anybody having to invest more time, but rather
> less.
> 

I'll try to adjust but I think you should also try to split patches (like
you did eventually with the addition of patch 3).

> > > And likewise it would also help if reviews prioritized relevant
> > > behavioural aspects (concept, design) first, before arguing about code
> > > style details like variable names or other things invisible to users.
> > 
> > I don't remember questioning the overall concept behind these changes
> > because it looks reasonable enough (even if I would appreciate to be
> > able to verify them with a working reproducer).
> 
> What exactly do you mean here by a working reproducer?
> 

For example, some test program to be run in a real guest that shows improved
performance with this patch set.
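
For what it's worth, something as minimal as the sketch below would already
do. It just times a full enumeration of a large directory on the 9p mount,
run once before and after the series (the file names, directory size and
build command are of course just placeholders, not part of the series):

/*
 * Hypothetical guest-side reproducer: time a full enumeration of a huge
 * directory on the 9p mount, e.g. one populated beforehand with
 * "mkdir big && touch big/f{1..200000}".
 * Build with: cc -O2 -o readdir_bench readdir_bench.c
 */
#include <dirent.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    struct timespec t0, t1;
    long n = 0;
    DIR *d;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    d = opendir(path);
    if (!d) {
        perror("opendir");
        return 1;
    }
    while (readdir(d) != NULL) {
        n++;                    /* count entries, including "." and ".." */
    }
    closedir(d);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%ld entries in %.1f ms\n", n,
           (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6);
    return 0;
}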

> > Even if it is invisible to the users, coding style or how a series is
> > broken down in patches is important for developers.
> 
> Dedication to detail beyond a certain fine-grained level is a luxury, and
> for that reason it is reserved for projects which clearly can afford it. On
> (sub)projects with low activity and high latency (like this one) it easily
> leads to frustration, which is obviously counterproductive.
> 

The only way to avoid frustration when things don't go at your pace
is to take control, as you decided to do :)

> I am not saying don't care about code style, details et al., but caring a
> bit less during reviews would not hurt, so to say. A bit of (invisible)
> tolerance avoids friction and stagnancy.
> 

No, this friction and stagnancy is essentially the result of not sharing
the same agenda, and of different tastes as well.

> > > And finally: compromises. As man power on 9p is very limited, it would
> > > make
> > > sense to limit patch sets at a certain extent and agree to postpone
> > > certain
> > > non-critical issues to subsequent patches (or sets of), otherwise a patch
> > > set will grow and grow and it will take ages to get forward.
> > 
> > FWIW this patchset was more than 10 patches and now it is only 5 :)
> 
> Hey, no misleading number crunching please. ;-) Three dropped patches were 
> never intended to be merged, they were pure benchmarks.
> 

Heh :)

> > The good news is that you'll soon be able to merge things by yourself.
> > I'll help you when I can but I won't be the gating factor anymore.
> > 
> > Hurrah !
> 
> Yep, I'll do my best to pursue your work, I appreciate what you did and that 
> you still volunteer to help!
> 

And if I'm too picky on reviews, feel free to ignore my remarks ;)

> > > > > > 2) basically means moving some logic from the frontend to the
> > > > > > backend, ie. called from v9fs_co_run_in_worker(). This implies
> > > > > > that a very long request (ie. one that would span on many calls
> > > > > > to readdir()) cannot be interrupted by the client anymore, as
> > > > > > opposed to what we have now BTW.
> > > > 
> > > > ... most likely to allow clients to interrupt a long request with a
> > > > Tflush. Have you considered this aspect in your changes ?
> > > 
> > > In this particular patch set, no, as for readdir this should not be an
> > > issue in practice for anybody. After this patch set is applied, even on
> > > huge directories, the fs driver's duration for readdir would barely be
> > > measurable. In fact the server's latency would always be much higher
> > > anyway.
> > 
> > Reproducer ? Numbers ? ;)
> 
> Well, you have already seen and run the benchmark in this series yourself.
> You can easily hit several hundred ms server latency, which is very hard to
> top with any fs driver reading a directory. Ok, maybe with a tape drive you
> could, but that's pretty much it. :) With a 'normal' system the fs driver

Or some network file system on an extremely slow connection...

> would rather block for something between <1ms .. 8ms, i.e. the fs driver
> completes before it would be able to actually see a flush request.
> 
> But to make it also clear: it would not be a problem with the new design
> either. If someone really saw a necessity to abort readdir requests, that
> would be possible without huge changes.
> 

Tflush requests are handled in the top-half exclusively by design, ie.
they rely on the worker thread handling the targeted request to yield
control back to the main I/O thread. Since this doesn't happen anymore
with your patches, I'm not sure how you would be able to abort a _many readdir_
request once it's been fired into a worker thread.

A possible solution could be to break down a Treaddir into multiple
_many readdirs_ jobs, and have a knob or some logic to control the
latency ratio.
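
To make that a bit more concrete, a rough sketch (only an assumption, not
part of the patch set: the chunk size macro is made up and the return
convention of v9fs_co_readdir_many() is assumed to be "bytes consumed or
negative errno", with entries being appended):

/*
 * Sketch only: split one Treaddir into several smaller _readdir many_ jobs
 * so that control returns to the main I/O thread between chunks and a
 * pending Tflush can still cancel the remainder of the request.
 */
#define V9FS_READDIR_CHUNK (64 * 1024)  /* made-up tuning knob */

static int coroutine_fn readdir_many_chunked(V9fsPDU *pdu, V9fsFidState *fidp,
                                             struct V9fsDirEnt **entries,
                                             int32_t maxsize, bool dostat)
{
    int32_t done = 0;
    int err = 0;

    while (done < maxsize) {
        int32_t chunk = MIN(maxsize - done, V9FS_READDIR_CHUNK);

        /* we are back on the main I/O thread here, so a Tflush that
         * arrived in the meantime can be observed */
        if (v9fs_request_cancelled(pdu)) {
            return -EINTR;
        }

        err = v9fs_co_readdir_many(pdu, fidp, entries, chunk, dostat);
        if (err <= 0) {
            break;              /* error or end of directory reached */
        }
        done += err;
    }

    return err < 0 ? err : done;
}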

> > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should
> > > > > > be called with a single v9fs_co_run_in_worker() invocation, in
> > > > > > which case the locking could just be something like:
> > > > > > 
> > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp,
> > > > > >                                       struct V9fsDirEnt **entries,
> > > > > >                                       int32_t maxsize, bool dostat)
> > > > > > {
> > > > > >     int err = 0;
> > > > > > 
> > > > > >     if (v9fs_request_cancelled(pdu)) {
> > > > > >         return -EINTR;
> > > > > >     }
> > > > > > 
> > > > > >     v9fs_readdir_lock(&fidp->fs.dir);
> > > > > > 
> > > > > >     v9fs_co_run_in_worker({
> > > > > >         err = do_readdir_many(pdu, fidp, entries, maxsize, dostat);
> > > > > >     });
> > > > > > 
> > > > > >     v9fs_readdir_unlock(&fidp->fs.dir);
> > > > > >     return err;
> > > > > > }
> > > > > 
> > > > > That's exactly what should be prevented. The lock must be on the
> > > > > driver thread side, not on the main thread side. The goal is to
> > > > > reduce the time slice of individual locks. If the lock is on the
> > > > > main thread, the time slice of a lock (even on huge directories with
> > > > > thousands of entries) is multiple factors larger than if the lock is
> > > > > on the driver thread side. So that's not just a few percent or so,
> > > > > it is huge.
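
(For reference, the driver-thread-side locking Christian describes above
would roughly amount to the sketch below, i.e. the critical section only
spans the actual readdir loop instead of the whole coroutine round trip.
This assumes v9fs_readdir_lock()/unlock() wrap a plain QemuMutex again,
since a CoMutex cannot be taken from a worker thread; it is not the actual
patch.)

int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp,
                                      struct V9fsDirEnt **entries,
                                      int32_t maxsize, bool dostat)
{
    int err = 0;

    if (v9fs_request_cancelled(pdu)) {
        return -EINTR;
    }

    v9fs_co_run_in_worker({
        /* lock and unlock inside the worker thread, so no other worker
         * can interleave seekdir()/readdir() on this fid meanwhile */
        v9fs_readdir_lock(&fidp->fs.dir);
        err = do_readdir_many(pdu, fidp, entries, maxsize, dostat);
        v9fs_readdir_unlock(&fidp->fs.dir);
    });

    return err;
}
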
> > > > 
> > > > Sorry I don't get it...  In a contention-less situation, which is the
> > > > case we really care for, qemu_co_mutex_lock() has no overhead.
> > > 
> > > Yes, it only kicks in if there is concurrency.
> > 
> > And we don't care to preserve performance for silly clients, do we ?
> 
> We don't necessarily need to preserve performance for them, that's right. But 
> it is already handled appropriately for them, so destroying it (i.e. slowing 
> them down) only makes sense if there is a good reason for that.
> 

Ending up with a mix of two different kinds of mutexes for 9p2000.L and .u is
a good enough reason for me.

> > > > > > This would technically leave the locking in the main I/O thread,
> > > > > > but it isn't conceptually different from locking at the beginning
> > > > > > of do_readdir_lock() and unlocking before returning. My concern
> > > > > > is that I don't know how coroutine mutexes behave with multiple
> > > > > > worker threads...
> > > > > 
> > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock
> > > > > with the old readdir implementation. That deadlock could not happen
> > > > > with the new readdir implementation, which suggests that it probably
> > > > > makes sense to revert this change (i.e. CoMutex -> Mutex) for this
> > > > > new implementation.
> > > > 
> > > > No we can't, because it is also used with 9p2000.u, which you said you
> > > > don't want to fix.
> > > 
> > > Yes and no, I did not say I don't want to fix readdir for 9p2000.u at all.
> > > What I said was I want to prioritize and concentrate on the most relevant
> > > issues first. 9p2000.L is the most commonly used protocol variant, so I
> > > would like to fix the most severe (e.g. performance) issues for 9p2000.L
> > > first before spending time on 9p2000.u which is AFAICS barely used in
> > > comparison, which I personally don't use at all, and which I am hence not
> > > testing to the same degree and cannot serve with the same quality as
> > > 9p2000.L right now.
> > > 
> > > Plus if there are really users caring for 9p2000.u, I can gladly assist
> > > them to address these issues for 9p2000.u as well, provided that they
> > > help at least with testing the required 9p2000.u changes.
> > 
> > I would personally do the opposite... again ;)
> > 
> > Like you say we essentially care for 9p2000.L and we only do limited
> > support for 9p2000.u. If we have a change that we really want for
> > 9p2000.L but it has an impact on 9p2000.u because of shared code,
> > it is fine to do the changes anyway, including changes to the 9p2000.u
> > specific code. Very basic testing of 9p2000.u (mount/ls or find/umount)
> > or maybe running pjd-fstest is enough IMHO. If we break someone's setup
> > and he complains, then we can ask him to test a fix before we merge it.
> 
> Well, so you want to handle the relevant 9p2000.u readdir optimizations by 
> yourself, and you would send it as a follow-up patch set (after this set is
> through), including test cases?
> 

Ah, that wasn't my point. I was just saying that any valuable change for
9p2000.L prevails and if you have to change some common code (eg.
locking) that could impact the 9p2000.u experience, you can do it
anyway, even if you only do smoke testing with 9p2000.u.

> > > Back to the actual topic: so what do we do about the mutex then? CoMutex
> > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it
> > > would just be a transitional measure.
> > 
> > That would ruin my day...
> > 
> > More seriously, the recent fix for a deadlock condition that was present
> > for years proves that nobody seems to be using silly clients with QEMU.
> > So I think we should just dump the lock and add a one-time warning in
> > the top level handlers when we detect a duplicate readdir request on
> > the same fid. This should be a very simple patch (I can take care of
> > it right away).
> 
> Looks like we have a consensus! Then I wait for your patch removing the lock, 
> and for your feedback whether or not you see anything else in this patch set?
> 

I was ready to do that but now I'm reading your other mail... I'll
continue the discussion there.
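
(Roughly, dropping the lock and warning once about duplicate readdir
requests could look like the sketch below; the "in_readdir" flag on
V9fsFidState is made up for illustration and would be cleared again once
the reply has been sent.)

#include "qemu/error-report.h"

/* Sketch only, not an actual patch: detect concurrent readdir on the same
 * fid in the top-level Treaddir handler and warn once instead of
 * serializing the requests with a mutex. */
static void v9fs_warn_duplicate_readdir(V9fsFidState *fidp)
{
    if (fidp->in_readdir) {
        warn_report_once("9p: client sent concurrent Treaddir requests on "
                         "the same fid, results may be inconsistent");
    }
    fidp->in_readdir = true;
}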

> > > Best regards,
> > > Christian Schoenebeck
> > 
> > Cheers,
> > 
> > --
> > Greg
> 
> Best regards,
> Christian Schoenebeck
> 
> 



