Message-ID: <c8c89226-f9a0-4cc8-bf7a-fa65a1fe790a@meta.com>
Date: Tue, 15 Jul 2025 11:38:05 -0400
From: Chris Mason <clm@...a.com>
To: "Chen, Yu C" <yu.c.chen@...el.com>,
kernel test robot <oliver.sang@...el.com>, Chris Mason <clm@...com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
aubrey.li@...ux.intel.com, peterz@...radead.org,
vincent.guittot@...aro.org
Subject: Re: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails

On 7/15/25 6:08 AM, Chen, Yu C wrote:
> On 7/15/2025 3:08 PM, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>>
>>
>> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
>> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
>> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@...com/
>> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
[ ... ]
>>
>> commit:
>>   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
>>   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>
>> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
>> ---------------- ---------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
> ...
>
>>      40.37           +16.9       57.24        mpstat.cpu.all.idle%
>
> This commit inhibits newidle balance. It seems that some workloads do
> not like newidle balance, such as schbench, which runs short-duration
> tasks, while other workloads, like the unixbench shell test case, want
> newidle balance to pull tasks as aggressively as possible.
> I wonder if we could check the sched domain's average utilization to
> decide how hard to trigger newidle balance, or check the overutilized
> flag to decide whether to launch it at all. Something like this:
Thanks for looking at this.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9e24038fa000..6c7420ed484e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13759,7 +13759,8 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  	sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  
>  	if (!get_rd_overloaded(this_rq->rd) ||
> -	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
> +	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost &&
> +	     !READ_ONCE(this_rq->rd->overutilized))) {
>  
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
>
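For context on the two root_domain flags in that condition (a paraphrase,
not necessarily the exact tree under test): rd->overloaded is set from the
load-balance statistics path when some CPU has runnable tasks waiting,
while rd->overutilized is an EAS signal set when a CPU's utilization gets
too close to its capacity. A minimal sketch of the overloaded side, roughly
following kernel/sched/sched.h in recent kernels:

	/*
	 * Paraphrased from kernel/sched/sched.h (recent kernels); exact
	 * signatures may differ.  rd->overloaded means "some CPU in this
	 * root domain has more runnable tasks than it can run right now",
	 * which is why it gates sched_balance_newidle() up front.
	 */
	static inline int get_rd_overloaded(struct root_domain *rd)
	{
		return READ_ONCE(rd->overloaded);
	}

	static inline void set_rd_overloaded(struct root_domain *rd, int status)
	{
		/* Avoid dirtying a hot shared cacheline when nothing changed. */
		if (get_rd_overloaded(rd) != status)
			WRITE_ONCE(rd->overloaded, status);
	}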
Looking at rd->overutilized, I think we only set it when
sched_energy_enabled() is true. I'm not sure that's true often enough to
use as a fix for the unixbench regression?
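A sketch of the setter that makes this point, paraphrased from
kernel/sched/fair.c (around v6.10; details may differ by version):

	/*
	 * Paraphrased from kernel/sched/fair.c; exact code may differ.
	 * rd->overutilized is only ever written under the EAS check, so
	 * with energy-aware scheduling disabled it stays 0.  The proposed
	 * !READ_ONCE(this_rq->rd->overutilized) test is then always true,
	 * and the skip condition degenerates to what we have today.
	 */
	static inline void set_rd_overutilized(struct root_domain *rd, bool flag)
	{
		if (!sched_energy_enabled())
			return;

		WRITE_ONCE(rd->overutilized, flag);
		trace_sched_overutilized_tp(rd, flag);
	}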
-chris