Message-ID: <20250716112533.GS1613200@noisy.programming.kicks-ass.net>
Date: Wed, 16 Jul 2025 13:25:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: kernel test robot <oliver.sang@...el.com>, Chris Mason <clm@...com>,
oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
aubrey.li@...ux.intel.com, vincent.guittot@...aro.org
Subject: Re: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle
balance fails
On Tue, Jul 15, 2025 at 06:08:43PM +0800, Chen, Yu C wrote:
> On 7/15/2025 3:08 PM, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 22.9% regression of unixbench.throughput on:
> >
> >
> > commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
> > url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
> > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
> > patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
> > patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
> >
> > testcase: unixbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> >
> > runtime: 300s
> > nr_task: 100%
> > test: shell1
> > cpufreq_governor: performance
> >
> >
> ...
>
> >
> > commit:
> > 5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
> > ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
> >
> > 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> ...
>
> > 40.37 +16.9 57.24 mpstat.cpu.all.idle%
>
> This commit inhibits the newidle balance.
Only when it is not successful. So when newidle balance fails to pull
tasks, it backs off and does less of it.
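For reference, the gate this feeds into looks roughly like the below
(field and function names as in current fair.c; the placement and the
3/2 factor are only assumed here from the discussion further down, the
actual patch is at the lore link above):

/*
 * Sketch only, not the actual patch. sched_balance_newidle() stops
 * scanning domains once the expected cost exceeds the time we expect
 * to stay idle:
 */
if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
        break;

/*
 * Growing sd->max_newidle_lb_cost on a failed pull (assumed 3/2
 * factor) therefore makes the check above trip earlier next time:
 */
if (!pulled_task)
        sd->max_newidle_lb_cost += sd->max_newidle_lb_cost / 2;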
> It seems that some workloads
> do not like newidle balance, like schbench, which runs short-duration
> tasks, while other workloads, like the unixbench shell test case, want
> newidle balance to pull at its best effort.
> I just wonder if we can check the sched domain's average utilization to
> decide how hard we should trigger the newidle balance, or whether we can
> check the overutilized flag to decide if we should launch the
> newidle balance. Something like what I was thinking of:
Looking at the actual util signal might be interesting, but as Chris
already noted, overutilized isn't the right thing to look at. Simply
taking rq->cfs.avg.util_avg might be more useful. Very high util and
failure to pull might indicate newidle just isn't very important /
effective, while low util and failure might mean we should try harder.
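Purely as an illustration of where such a check could sit (the
threshold and the shape of the condition are made up for the example,
this is not a proposal):

/*
 * Illustration only: only back off when the CPU is already busy.
 * util_avg runs 0..SCHED_CAPACITY_SCALE (1024); the 50% cut-off is
 * an arbitrary number for the example.
 */
if (!pulled_task) {
        unsigned long util = READ_ONCE(this_rq->cfs.avg.util_avg);

        if (util > SCHED_CAPACITY_SCALE / 2)
                /* busy and failing to pull: newidle likely not worth it */
                sd->max_newidle_lb_cost += sd->max_newidle_lb_cost / 2;
        /* low util and failing: keep the cost low, keep trying */
}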
Other things to look at:
 - if the sysctl_sched_migration_cost limit isn't artificially limiting
   actual scanning costs. E.g. very large domains might perhaps have
   costs that are genuinely larger than that somewhat random number.
 - if despite the apparent failure to pull, we do already have something
   to run (e.g. wakeups).
 - if the 3/2 backoff is perhaps too aggressive vs the 1% per second
   decay (see the toy comparison below).
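On that last point, the asymmetry is easy to see with a toy model
(ordinary user-space C, the starting cost is entirely made up; the
kernel's actual decay is roughly 1% per second as noted above):

/* Toy model, not kernel code: 3/2 growth per failure vs ~1%/s decay. */
#include <stdio.h>

int main(void)
{
        double cost = 500000.0; /* assumed starting cost: 0.5 ms in ns */
        int i, secs = 0;

        /* five consecutive failed newidle balances, each bumping by 3/2 */
        for (i = 0; i < 5; i++)
                cost *= 1.5;
        printf("after 5 failed pulls: %.0f ns\n", cost);

        /* decay by ~1% per second until back under the starting value */
        while (cost > 500000.0) {
                cost *= 0.99;
                secs++;
        }
        printf("seconds of decay to get back: %d\n", secs);
        return 0;
}

So a handful of failed pulls can take a few minutes of decay to undo,
which may or may not be the intended behaviour.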