Message-ID: <2eefea7c-2962-4e50-a8cd-3cb101bedd56@amazon.com>
Date: Wed, 28 Jan 2026 16:24:37 +0000
From: "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Mario Roy <marioeroy@...il.com>, Chris Mason <clm@...a.com>, "Joseph
Salisbury" <joseph.salisbury@...cle.com>, Adam Li
<adamli@...amperecomputing.com>, Josh Don <joshdon@...gle.com>,
<mingo@...hat.com>, <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance
On 27/01/2026 09:13, Peter Zijlstra wrote:
> On Tue, Jan 27, 2026 at 09:50:25AM +0100, Peter Zijlstra wrote:
>> On Sun, Jan 25, 2026 at 12:22:21PM +0000, Mohamed Abuelfotoh, Hazem wrote:
>>
>>> I can confirm that we are seeing a 4-11% performance regression in v6.12.66
>>> on multiple benchmarks running on c7a.4xlarge AWS EC2 instances, powered
>>> by AMD EPYC 9R14-series CPUs (code-named Genoa), and on c7i.4xlarge
>>> instances, powered by 4th-Generation Intel Xeon Scalable processors
>>> (code-named Sapphire Rapids). The regression is caused by commit
>>> 33cf66d88306 ("sched/fair: Proportional newidle balance"). We were able to
>>> recover the performance by reverting this commit. We also noticed that the
>>> impact is higher on AMD than on Intel.
>>>
>>> Benchmark name | Description                              | Unit
>>> postgresql     | HammerDB workload (TPC-C-like benchmark) | NOPM
>>> nginx_lb       | Testing NGINX as a load balancer         | RPS
>>> memcached      | Testing using Lancet load generator      | QPS
>>>
>>> **Results on v6.12.66**
>>>
>>> Benchmark name | SUT EC2 Instance | Regression percentage
>>> postgresql     | c7a.4xlarge      | -4.0%
>>> postgresql     | c7i.4xlarge      | -4.0%
>>> nginx_lb       | c7a.4xlarge      | -5.0%
>>> memcached      | c7a.4xlarge      | -11.0%
>>
>> So only postgres has a regression on Intel? Memcached doesn't show
>> anything?
>
> And just to be sure, v6.12.43-v6.12.65 have no problem?
>
> That is, afaict those are the kernels that have:
>
> fc4289233e4b sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
>
> But not yet have:
>
> 1b9c118fe318 sched/fair: Proportional newidle balance
> c6ae271bc5fd sched/fair: Small cleanup to update_newidle_cost()
> 52aa889c6f57 sched/fair: Small cleanup to sched_balance_newidle()
> 81343616e712 sched/fair: Revert max_newidle_lb_cost bump
>
> Because fc4289233e4b was also causing a ton of regressions (but also
> improving some workloads). 81343616e712 then reverts this and
> 1b9c118fe318 is supposed to be a compromise between these two.
>
> So if your workloads are not affected by fc4289233e4b and 81343616e712,
> but somehow 1b9c118fe318 is causing a regression, then I'm a little puzzled.
>

We have definitely seen a significant performance regression, specifically
on DB workloads, caused by fc4289233e4b ("sched/fair: Bump
sd->max_newidle_lb_cost when newidle balance fails"), which we reported
in [1]. We were able to recover that performance with 81343616e712
("sched/fair: Revert max_newidle_lb_cost bump"), before we started seeing
a negative impact from 1b9c118fe318 ("sched/fair: Proportional newidle
balance").

[1] https://lore.kernel.org/all/006c9df2-b691-47f1-82e6-e233c3f91faf@oracle.com/T/#mb96105e4a320659b5aa68ec112bbeafaae37e769
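
To make the test matrix behind Peter's question concrete, here is a minimal
Python sketch that maps a v6.12.y patch level to the newidle-balance commits
it is assumed to carry, following his "afaict" reading of the ranges. The
helper name newidle_commits() is hypothetical, and so is the assumption that
v6.12.42 and earlier carry none of these commits. The hashes are the mainline
IDs; the stable tree carries backports under different IDs (e.g. 33cf66d88306
for the proportional newidle balance), so this is a sketch, not a changelog
query.

```python
from __future__ import annotations

# Mainline commit IDs as listed in this thread.
BUMP = "fc4289233e4b"  # sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
FOLLOW_UPS = (
    "1b9c118fe318",  # sched/fair: Proportional newidle balance
    "c6ae271bc5fd",  # sched/fair: Small cleanup to update_newidle_cost()
    "52aa889c6f57",  # sched/fair: Small cleanup to sched_balance_newidle()
    "81343616e712",  # sched/fair: Revert max_newidle_lb_cost bump
)


def newidle_commits(patch_level: int) -> tuple[str, ...]:
    """Hypothetical helper: commits assumed present in v6.12.<patch_level>.

    Assumes (per the thread) that v6.12.43..v6.12.65 carry only the bump,
    that v6.12.66 onward also carry the four follow-ups, and (an assumption
    not stated in the thread) that earlier releases carry none of them.
    """
    if patch_level < 0:
        raise ValueError("patch level must be non-negative")
    if patch_level < 43:
        return ()
    if patch_level <= 65:
        return (BUMP,)
    return (BUMP,) + FOLLOW_UPS


if __name__ == "__main__":
    # Print the assumed commit set for the three releases worth comparing.
    for level in (42, 65, 66):
        print(f"v6.12.{level}: {newidle_commits(level) or 'none of these commits'}")
```

Benchmarking v6.12.42, v6.12.65 and v6.12.66 on the same instance type
should help separate the effect of the bump from that of the follow-up
series, though other stable changes between those releases are not
controlled for.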