Message-ID: <CAKfTPtCf3US76sLObK=iRNtid2hL8s-6Denha71F6W4J0MyAoA@mail.gmail.com>
Date: Fri, 22 Mar 2024 18:16:27 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Josh Don <joshdon@...gle.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, zhangqiao22@...wei.com
Subject: Re: [PATCH 1/4] sched/fair: make sure to try to detach at least one
movable task
On Thu, 21 Mar 2024 at 21:25, Josh Don <joshdon@...gle.com> wrote:
>
> On Wed, Mar 20, 2024 at 9:58 AM Vincent Guittot
> <vincent.guittot@...aro.org> wrote:
> >
> > Hi Josh,
> >
> > Sorry for the late reply.
>
> No worries, responses to your comments inline below.
>
> > > We had a user recently trigger a hard lockup which we believe is due
> > > to this patch. The user in question had O(10k) threads affinitized to
> > > a cpu; seems like the process had an out of control thread spawning
> > > issue, and was in the middle of getting killed. However, that was
> > > being slowed down due to the fact that load balance was iterating all
> >
> > Does that mean it was progressing, just not as fast as you would like?
>
> It was a hard lockup, so it's more than just "not fast enough".
> Indeed it was progressing, but not at a rate sufficient to avoid
> lockup.
>
> > > these threads and bouncing the rq lock (and making no progress due to
> > > ALL_PINNED). Before this patch, load balance would quit after hitting
> > > loop_max.
> > >
> > > Even ignoring that specific instance, it seems pretty easy for this
> > > patch to cause a softlockup due to a buggy or malicious process.
> >
> > The fact that the rq is released regularly should prevent a
> > softlockup.
>
> That doesn't prevent a softlockup; kernel is stuck looping over a long
> list of tasks for too long, regardless of whether it is releasing and
> re-acquiring the rq locks.
>
> Note also that load balance can come from softirq in a context where
> we have IRQ disabled, which can lead to hard lockup as well.
fair enough
>
> > And we could even speed up can_migrate() which does a lot of
> > useless work for a task affined to 1 CPU.
>
> That seems like a useful optimization, but not really relevant? It
> doesn't matter how small we make the constant factor, we still have an
> O(n) operation in kernel mode here.
>
> > > For the tradeoff you were trying to make in this patch (spend more
> > > time searching in the hopes that there's something migratable further
> > > in the list), perhaps it would be better to adjust
> > > sysctl.sched_nr_migrate instead of baking this into the kernel?
> >
> > That could be a solution, but it increases the number of iterations
> > for all cases, including those which are more time consuming to sort
> > out, as well as the number of tasks you will migrate in one load
> > balance. The latter is what really consumes time.
>
> Is it really that bad? loop_max will be unchanged for most cases since
> it gets min'd with nr_running anyway. And, even if loop_max ends up
> larger in some other instances, we still terminate the iteration after
> fixing up env->imbalance (granted, we'll migrate more tasks to achieve
> a better balance with a larger loop_max, which I think is your point).
Yes, my point is that the load of a task can be quite small, especially
with cgroups, so we can end up detaching/attaching a very large number
of tasks, which is far more time consuming than checking whether we can
migrate it or not
>
>
> Another idea then: what about separating the number of tasks we can
> move from the number of tasks we can search? You effectively want to
> keep the number of tasks that can be migrated small (nr_migrate), but
> be able to search deeper in the list for things to pull (a larger
> search_depth).
That could be a solution indeed
>
> - Josh