Message-ID: <CAPjX3FdTe=5sA8M6GjjNYSGRgJY42z_n+AnJC7ZSBwY=XLTFJw@mail.gmail.com>
Date: Mon, 25 Nov 2024 12:40:44 +0100
From: Daniel Vacek <neelx@...e.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: properly serialize the cfs_rq h_load calculation
On Mon, Nov 25, 2024 at 11:01 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Fri, Nov 22, 2024 at 06:33:31PM +0100, Daniel Vacek wrote:
> > On Fri, Nov 22, 2024 at 4:42 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 04:28:55PM +0100, Daniel Vacek wrote:
> > > > Make sure the given cfs_rq's h_load is always correctly updated. This
> > > > prevents a race between multiple CPUs eventually updating the same hierarchy
> > > > of h_load_next return pointers.
> > >
> > > Is there an actual problem observed?
> >
> > Well, that depends. Do we care about correct (exact) load calculation
> > every time?
>
> The whole load balancer is full of races. And typically it all more or
> less works out.
>
> I mean, the worst case is typically a spurious migration, which will get
> 'fixed' up the next round.
>
> Only if behaviour gets to be really bad/stupid do we tend to try and fix
> this.
>
> Now your patch didn't look awful :-), but it would make a stronger case
> if you'd done it because you observed it doing stupid and it now no
> longer does stupid and your workload improves.
Well, the original motivation was the crashes reported on s390 before
commit 0e9f02450da0.
That commit addresses the crashes but not the failing load
calculation. This patch addresses both issues, making the scheduler
more correct, more deterministic and less racy. The question is
whether we strictly need that, or whether we are happy to accept it
at that price. And the price looks quite fair to me, as the lock can
only be acquired once per jiffy.
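
To illustrate the cost argument, here is a self-contained userspace
sketch of that pattern (illustrative only, not the actual patch;
fake_jiffies(), struct h_load_cache and update_h_load() are made-up
stand-ins): the lock is only taken when the cached value is stale, so
every other caller within the same tick stays on the lockless fast
path.

#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for the kernel's jiffies counter: 100 ticks per second. */
static unsigned long fake_jiffies(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long)ts.tv_sec * 100 + ts.tv_nsec / 10000000;
}

struct h_load_cache {
	pthread_mutex_t lock;
	unsigned long last_update;	/* tick of the last recomputation */
	unsigned long h_load;		/* cached value */
};

static void update_h_load(struct h_load_cache *c)
{
	unsigned long now = fake_jiffies();

	/* Fast path: already recomputed this tick, no lock taken. */
	if (c->last_update == now)
		return;

	pthread_mutex_lock(&c->lock);
	if (c->last_update != now) {	/* re-check under the lock */
		c->h_load++;		/* stands in for the hierarchy walk */
		c->last_update = now;
	}
	pthread_mutex_unlock(&c->lock);
}

int main(void)
{
	struct h_load_cache c = { .lock = PTHREAD_MUTEX_INITIALIZER };
	int i;

	/* Hammer the update path; the lock is taken once per elapsed tick. */
	for (i = 0; i < 10000000; i++)
		update_h_load(&c);

	printf("recomputations: %lu\n", c.h_load);
	return 0;
}

So in the worst case that is one lock acquisition per jiffy, and all
the remaining calls in that jiffy never touch the lock at all.
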
Note that the load calculation fails more often than the observed
crashes suggest. A crash is just a special case of the failure,
depending on the actual cgroup hierarchy and on where within that
hierarchy the racing tasks being woken sit.