linux-kernel - Re: single aio thread is migrated crazily by scheduler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191120191636.GI4097@hirez.programming.kicks-ass.net>
Date:   Wed, 20 Nov 2019 20:16:36 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     Ming Lei <ming.lei@...hat.com>, linux-block@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-kernel@...r.kernel.org, Jeff Moyer <jmoyer@...hat.com>,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: single aio thread is migrated crazily by scheduler

On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
> On Mon, Nov 18, 2019 at 10:21:21AM +0100, Peter Zijlstra wrote:

> > We typically only fall back to the active balancer when there is
> > (persistent) imbalance and we fail to migrate anything else (of
> > substance).
> > 
> > The tuning mentioned has the effect of less frequent scheduling, IOW,
> > leaving (short) tasks on the runqueue longer. This obviously means the
> > load-balancer will have a bigger chance of seeing them.
> > 
> > Now; it's been a while since I looked at the workqueue code but one
> > possible explanation would be if the kworker that picks up the work item
> > is pinned. That would make it runnable but not migratable, the exact
> > situation in which we'll end up shooting the current task with active
> > balance.
> 
> Yes, that's precisely the problem - work is queued, by default, on a
> specific CPU and it will wait for a kworker that is pinned to that

I'm thinking the problem is that it doesn't wait. If it went and waited
for it, active balance wouldn't be needed, that only works on active
tasks.

> specific CPU to dispatch it. We've already tested that queuing on a
> different CPU (via queue_work_on()) makes the problem largely go
> away as the work is not longer queued behind the long running fio
> task.
> 
> This, however, is not at viable solution to the problem. The pattern
> of a long running process queuing small pieces of individual work
> for processing in a separate context is pretty common...

Right, but you're putting the scheduler in a bind. By overloading the
CPU and only allowing the one task to migrate, it pretty much has no
choice left.

Anyway, I'm still going to have try and reproduce -- I got side-tracked
into a crashing bug, I'll hopefully get back to this tomorrow. Lastly,
one other thing to try is -next. Vincent reworked the load-balancer
quite a bit.