linux-kernel - Re: [PATCH] sched/fair: properly serialize the cfs_rq h_load calculation

Message-ID: <20241125100108.GH24774@noisy.programming.kicks-ass.net>
Date: Mon, 25 Nov 2024 11:01:08 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Daniel Vacek <neelx@...e.com>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: properly serialize the cfs_rq h_load
 calculation

On Fri, Nov 22, 2024 at 06:33:31PM +0100, Daniel Vacek wrote:
> On Fri, Nov 22, 2024 at 4:42 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Fri, Nov 22, 2024 at 04:28:55PM +0100, Daniel Vacek wrote:
> > > Make sure the given cfs_rq's h_load is always correctly updated. This
> > > prevents a race between multiple CPUs concurrently updating the same
> > > hierarchy of h_load_next return pointers.
> >
> > Is there an actual problem observed?
> 
> Well, that depends. Do we care about correct (exact) load calculation
> every time?

The whole load balancer is full of races. And typically it all more or
less works out.

I mean, the worst case is typically a spurious migration, which gets
'fixed' up in the next round.

Only if behaviour gets to be really bad/stupid do we tend to try and fix
this.

Now your patch didn't look awful :-), but it would make a stronger case
if you'd done it because you observed it doing something stupid, and with
the patch it no longer does, and your workload improves.