linux-kernel - Re: [PATCH 4/4] sched/fair: Proportional newidle balance


Message-ID: <20260127091343.GC217302@noisy.programming.kicks-ass.net>
Date: Tue, 27 Jan 2026 10:13:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>
Cc: Mario Roy <marioeroy@...il.com>, Chris Mason <clm@...a.com>,
	Joseph Salisbury <joseph.salisbury@...cle.com>,
	Adam Li <adamli@...amperecomputing.com>,
	Josh Don <joshdon@...gle.com>, mingo@...hat.com,
	juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance

On Tue, Jan 27, 2026 at 09:50:25AM +0100, Peter Zijlstra wrote:
> On Sun, Jan 25, 2026 at 12:22:21PM +0000, Mohamed Abuelfotoh, Hazem wrote:
> 
> > I can confirm that we are seeing a 4-11% performance regression in v6.12.66
> > on multiple benchmarks running on c7a.4xlarge AWS EC2 instances that are
> > powered by AMD EPYC 9R14-series CPU (code-named Genoa) and c7i.4xlarge which
> > is powered by 4th-Generation Intel Xeon Scalable processor (code-named
> > Sapphire Rapids). The regression is caused by the commit 33cf66d88306
> > ("sched/fair: Proportional newidle balance"). We were able to reclaim the
> > performance back after reverting this commit. We also noticed that the
> > impact is higher on AMD vs Intel.
> > 
> > Benchmark Name |  Description				    | Unit
> > postgresql     |  HammerDB workload (TPC-C-like benchmark)  | NOPM
> > nginx_lb       |  Testing NGINX as a load balancer	    | RPS
> > memcached      |  Testing using Lancet load generator       | QPS
> > 
> > **Results on v6.12.66**
> > 
> > Benchmark name | SUT EC2 Instance | Regression percentage
> > postgresql     | c7a.4xlarge      | -4.0%
> > postgresql     | c7i.4xlarge      | -4.0%
> > nginx_lb       | c7a.4xlarge      | -5.0%
> > memcached      | c7a.4xlarge      | -11.0%
> 
> So only postgres has a regression on Intel? Memcached doesn't show
> anything?

And just to be sure, v6.12.43-v6.12.65 have no problem?

That is, afaict those are the kernels that have:

  fc4289233e4b sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails

But do not yet have:

  1b9c118fe318 sched/fair: Proportional newidle balance
  c6ae271bc5fd sched/fair: Small cleanup to update_newidle_cost()
  52aa889c6f57 sched/fair: Small cleanup to sched_balance_newidle()
  81343616e712 sched/fair: Revert max_newidle_lb_cost bump

Because fc4289233e4b was also causing a ton of regressions (but also
improving some workloads). 81343616e712 then reverts this and
1b9c118fe318 is supposed to be a compromise between these two.
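
To make the bisection window above concrete, here is a minimal Python sketch that sorts a v6.12.y release into one of three buckets. The boundaries (43 and 65) are taken straight from the "afaict" statement above and have not been checked against the stable tree. The function name `classify_6_12` and the bucket labels are hypothetical, chosen only for illustration.

```python
import re

# Boundaries as stated in the message above ("afaict"); NOT verified
# against the stable tree. Treat them as assumptions.
BUMP_ONLY_FIRST = 43  # first v6.12.y said to carry fc4289233e4b
BUMP_ONLY_LAST = 65   # last v6.12.y said to lack the four follow-up commits


def classify_6_12(version: str) -> str:
    """Return which newidle-balance behaviour a v6.12.y release should have.

    "neither"      : before the max_newidle_lb_cost bump (fc4289233e4b)
    "bump-only"    : has fc4289233e4b, but not 1b9c118fe318 / 81343616e712
    "proportional" : bump reverted, proportional newidle balance applied
    """
    m = re.fullmatch(r"v?6\.12\.(\d+)", version)
    if m is None:
        raise ValueError(f"not a v6.12.y version string: {version!r}")
    patch = int(m.group(1))
    if patch < BUMP_ONLY_FIRST:
        return "neither"
    if patch <= BUMP_ONLY_LAST:
        return "bump-only"
    return "proportional"
```

Running the same benchmark on one kernel from each bucket (for example v6.12.42, v6.12.65 and v6.12.66) would show whether the workload reacts to the bump, to its revert, or only to the proportional change.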

So if your workloads are not affected by fc4289233e4b and 81343616e712,
but 1b9c118fe318 is somehow causing a regression, then I'm a little puzzled.