Message-ID: <20260127091343.GC217302@noisy.programming.kicks-ass.net>
Date: Tue, 27 Jan 2026 10:13:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>
Cc: Mario Roy <marioeroy@...il.com>, Chris Mason <clm@...a.com>,
Joseph Salisbury <joseph.salisbury@...cle.com>,
Adam Li <adamli@...amperecomputing.com>,
Josh Don <joshdon@...gle.com>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance
On Tue, Jan 27, 2026 at 09:50:25AM +0100, Peter Zijlstra wrote:
> On Sun, Jan 25, 2026 at 12:22:21PM +0000, Mohamed Abuelfotoh, Hazem wrote:
>
> > I can confirm that we are seeing a 4-11% performance regression in v6.12.66
> > on multiple benchmarks running on c7a.4xlarge AWS EC2 instances, which are
> > powered by AMD EPYC 9R14-series CPUs (code-named Genoa), and on c7i.4xlarge
> > instances, which are powered by 4th-Generation Intel Xeon Scalable
> > processors (code-named Sapphire Rapids). The regression is caused by commit
> > 33cf66d88306 ("sched/fair: Proportional newidle balance"). We were able to
> > recover the performance after reverting this commit. We also noticed that
> > the impact is higher on AMD than on Intel.
> >
> > Benchmark Name | Description | Unit
> > postgresql | HammerDB workload (TPC-C-like benchmark) | NOPM
> > nginx_lb | Testing NGINX as a load balancer | RPS
> > memcached | Testing using Lancet load generator | QPS
> >
> > **Results on v6.12.66**
> >
> > Benchmark name | SUT EC2 Instance | Regression percentage
> > postgresql | c7a.4xlarge | -4.0%
> > postgresql | c7i.4xlarge | -4.0%
> > nginx_lb | c7a.4xlarge | -5.0%
> > memcached | c7a.4xlarge | -11.0%
>
> So only postgres has a regression on Intel? Memcached doesn't show
> anything?
And just to be sure, v6.12.43-v6.12.65 have no problem?
That is, afaict those are the kernels that have:
fc4289233e4b sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
But do not yet have:
1b9c118fe318 sched/fair: Proportional newidle balance
c6ae271bc5fd sched/fair: Small cleanup to update_newidle_cost()
52aa889c6f57 sched/fair: Small cleanup to sched_balance_newidle()
81343616e712 sched/fair: Revert max_newidle_lb_cost bump
Because fc4289233e4b was also causing a ton of regressions (but also
improving some workloads). 81343616e712 then reverts this and
1b9c118fe318 is supposed to be a compromise between these two.
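For reference, the mechanism all of these patches are tuning is the cost
gate in newidle balancing: a CPU only keeps scanning sched domains while
the accumulated balancing cost still fits inside its expected idle time.
Below is a toy, self-contained sketch of that idea -- names, fields and
numbers are simplified for illustration and are not the actual kernel
code. The patches listed above differ in how max_newidle_lb_cost gets
bumped and decayed, not in this basic check.

/*
 * Toy model of the newidle-balance cost gate: scan domains cheapest
 * first, and stop once the accumulated cost would exceed the CPU's
 * expected idle time.  Field and function names are simplified and do
 * NOT match the kernel sources.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_domain {
	const char *name;
	unsigned long long max_newidle_lb_cost;	/* worst-case balance cost, ns */
};

struct toy_rq {
	unsigned long long avg_idle;		/* expected idle duration, ns */
};

/* Only keep balancing while the next domain still fits in avg_idle. */
static bool worth_balancing(const struct toy_rq *rq,
			    const struct toy_domain *sd,
			    unsigned long long curr_cost)
{
	return rq->avg_idle >= sd->max_newidle_lb_cost + curr_cost;
}

int main(void)
{
	struct toy_domain domains[] = {
		{ "SMT",  20000 },
		{ "MC",  150000 },
		{ "PKG", 900000 },
	};
	struct toy_rq rq = { .avg_idle = 500000 };
	unsigned long long curr_cost = 0;

	for (size_t i = 0; i < sizeof(domains) / sizeof(domains[0]); i++) {
		if (!worth_balancing(&rq, &domains[i], curr_cost)) {
			printf("skip %s: too expensive for %llu ns of idle\n",
			       domains[i].name, rq.avg_idle);
			break;
		}
		printf("balance %s\n", domains[i].name);
		curr_cost += domains[i].max_newidle_lb_cost;
	}
	return 0;
}

With those made-up numbers the SMT and MC domains get balanced and PKG is
skipped; bumping max_newidle_lb_cost on failure makes the gate trip
earlier, which is where the regressions/improvements trade-off comes from.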
So if your workloads are not affected by fc4289233e4b and 81343616e712,
but somehow 1b9c118fe318 is causing fail, then I'm a little puzzled.