linux-kernel - Re: single aio thread is migrated crazily by scheduler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191121142136.GB18443@pauld.bos.csb>
Date:   Thu, 21 Nov 2019 09:21:36 -0500
From:   Phil Auld <pauld@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Dave Chinner <david@...morbit.com>, Ming Lei <ming.lei@...hat.com>,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jeff Moyer <jmoyer@...hat.com>,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: single aio thread is migrated crazily by scheduler

On Thu, Nov 21, 2019 at 02:29:37PM +0100 Peter Zijlstra wrote:
> On Wed, Nov 20, 2019 at 05:03:13PM -0500, Phil Auld wrote:
> > On Wed, Nov 20, 2019 at 08:16:36PM +0100 Peter Zijlstra wrote:
> > > On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
> 
> > > > Yes, that's precisely the problem - work is queued, by default, on a
> > > > specific CPU and it will wait for a kworker that is pinned to that
> > > 
> > > I'm thinking the problem is that it doesn't wait. If it went and waited
> > > for it, active balance wouldn't be needed, that only works on active
> > > tasks.
> > 
> > Since this is AIO I wonder if it should queue_work on a nearby cpu by 
> > default instead of unbound.  
> 
> The thing seems to be that 'unbound' is in fact 'bound'. Maybe we should
> fix that. If the load-balancer were allowed to move the kworker around
> when it didn't get time to run, that would probably be a better
> solution.
> 

Yeah, I'm not convinced this is actually a scheduler issue.


> Picking another 'bound' cpu by random might create the same sort of
> problems in more complicated scenarios.
> 
> TJ, ISTR there used to be actually unbound kworkers, what happened to
> those? or am I misremembering things.
> 
> > > Lastly,
> > > one other thing to try is -next. Vincent reworked the load-balancer
> > > quite a bit.
> > > 
> > 
> > I've tried it with the lb patch series. I get basically the same results.
> > With the high granularity settings I get 3700 migrations for the 30 
> > second run at 4k. Of those about 3200 are active balance on stock 5.4-rc7.
> > With the lb patches it's 3500 and 3000, a slight drop. 
> 
> Thanks for testing that. I didn't expect miracles, but it is good to
> verify.
> 
> > Using the default granularity settings 50 and 22 for stock and 250 and 25.
> > So a few more total migrations with the lb patches but about the same active.
> 
> Right, so the granularity thing interacts with the load-balance period.
> By pushing it up, as some people appear to do, makes it so that what
> might be a temporal imablance is perceived as a persitent imbalance.
> 
> Tying the load-balance period to the gramularity is something we could
> consider, but then I'm sure, we'll get other people complaining the
> doesn't balance quick enough anymore.
> 

Thanks. These are old tuned settings that have been carried along. They may
not be right for newer kernels anyway. 


--