Message-ID: <20191121142136.GB18443@pauld.bos.csb>
Date: Thu, 21 Nov 2019 09:21:36 -0500
From: Phil Auld <pauld@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Dave Chinner <david@...morbit.com>, Ming Lei <ming.lei@...hat.com>,
linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Jeff Moyer <jmoyer@...hat.com>,
Dave Chinner <dchinner@...hat.com>,
Eric Sandeen <sandeen@...hat.com>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: single aio thread is migrated crazily by scheduler
On Thu, Nov 21, 2019 at 02:29:37PM +0100 Peter Zijlstra wrote:
> On Wed, Nov 20, 2019 at 05:03:13PM -0500, Phil Auld wrote:
> > On Wed, Nov 20, 2019 at 08:16:36PM +0100 Peter Zijlstra wrote:
> > > On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
>
> > > > Yes, that's precisely the problem - work is queued, by default, on a
> > > > specific CPU and it will wait for a kworker that is pinned to that
> > >
> > > I'm thinking the problem is that it doesn't wait. If it went and waited
> > > for it, active balance wouldn't be needed; that only works on active
> > > tasks.
> >
> > Since this is AIO I wonder if it should queue_work on a nearby cpu by
> > default instead of unbound.
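
One possible shape of that, purely as a sketch (the helper name is made
up, and same-package via topology_core_cpumask() may well be the wrong
notion of "nearby"):

#include <linux/smp.h>
#include <linux/topology.h>
#include <linux/workqueue.h>

/* Queue on some other CPU in the same package as the submitter,
 * rather than on the submitting CPU itself. */
static void queue_on_nearby_cpu(struct work_struct *work)
{
	int this_cpu = get_cpu();
	int cpu = cpumask_any_but(topology_core_cpumask(this_cpu),
				  this_cpu);

	if (cpu >= nr_cpu_ids)		/* no other CPU in the package */
		cpu = this_cpu;

	queue_work_on(cpu, system_wq, work);
	put_cpu();
}
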
>
> The thing seems to be that 'unbound' is in fact 'bound'. Maybe we should
> fix that. If the load-balancer were allowed to move the kworker around
> when it didn't get time to run, that would probably be a better
> solution.
>
Yeah, I'm not convinced this is actually a scheduler issue.
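
For my own clarity, the two submission paths being talked about, as a
minimal sketch (the names are made up, this is not the iomap/dio code):

#include <linux/module.h>
#include <linux/workqueue.h>

static void example_fn(struct work_struct *work)
{
	/* Runs in a kworker.  On a per-cpu workqueue that kworker is
	 * concurrency-managed and stays on the CPU the work was
	 * queued on. */
}

static DECLARE_WORK(percpu_work, example_fn);
static DECLARE_WORK(unbound_work, example_fn);
static struct workqueue_struct *unbound_wq;

static int __init example_init(void)
{
	/* WQ_UNBOUND workqueue: its workers are not pinned to a single
	 * CPU, so the scheduler has more freedom to place them. */
	unbound_wq = alloc_workqueue("example_unbound", WQ_UNBOUND, 0);
	if (!unbound_wq)
		return -ENOMEM;
	queue_work(unbound_wq, &unbound_work);

	/* Default path: queue_work() is queue_work_on(WORK_CPU_UNBOUND, ...),
	 * which for a per-cpu workqueue like system_wq means "the local
	 * CPU", so the work always lands next to the submitter. */
	queue_work(system_wq, &percpu_work);
	return 0;
}

static void __exit example_exit(void)
{
	flush_work(&percpu_work);
	destroy_workqueue(unbound_wq);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

The per-cpu path is the "queued on a specific CPU" behaviour Dave
describes above; the WQ_UNBOUND one is what I'd naively have expected
"unbound" to mean.
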
> Picking another 'bound' cpu at random might create the same sort of
> problems in more complicated scenarios.
>
> TJ, ISTR there used to be actually unbound kworkers; what happened to
> those? Or am I misremembering things?
>
> > > Lastly,
> > > one other thing to try is -next. Vincent reworked the load-balancer
> > > quite a bit.
> > >
> >
> > I've tried it with the lb patch series. I get basically the same results.
> > With the high granularity settings I get 3700 migrations for the 30
> > second run at 4k. Of those about 3200 are active balance on stock 5.4-rc7.
> > With the lb patches it's 3500 and 3000, a slight drop.
>
> Thanks for testing that. I didn't expect miracles, but it is good to
> verify.
>
> > Using the default granularity settings I see 50 total and 22 active balance
> > migrations for stock, and 250 and 25 with the lb patches. So a few more
> > total migrations with the lb patches, but about the same number of active
> > balances.
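
For anyone who wants to poke at this themselves, a per-task migration
count can be read from /proc/<pid>/sched (the se.nr_migrations field,
needs CONFIG_SCHED_DEBUG). A rough sketch, not what I actually used,
and it doesn't break out the active balance migrations:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Read se.nr_migrations for a pid from /proc/<pid>/sched. */
static long nr_migrations(pid_t pid)
{
	char path[64], line[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/sched", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "se.nr_migrations : %ld", &val) == 1)
			break;
	}
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	pid_t pid;
	long before;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = atoi(argv[1]);
	before = nr_migrations(pid);
	sleep(30);		/* length of the 30 second run */
	printf("migrations: %ld\n", nr_migrations(pid) - before);
	return 0;
}

Build with plain cc and point it at the pid of the task under test
while the run is going.
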
>
> Right, so the granularity thing interacts with the load-balance period.
> Pushing it up, as some people appear to do, makes it so that what might
> be a temporary imbalance is perceived as a persistent one.
>
> Tying the load-balance period to the granularity is something we could
> consider, but then I'm sure we'll get other people complaining it
> doesn't balance quickly enough anymore.
>
Thanks. These are old tuned settings that have been carried along. They may
not be right for newer kernels anyway.
--