Message-ID: <39bf6191-82d5-467c-9c09-2deb420875ba@os.amperecomputing.com>
Date: Wed, 12 Nov 2025 20:04:05 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Chris Mason <clm@...a.com>, Joseph Salisbury
<joseph.salisbury@...cle.com>, Hazem Mohamed Abuelfotoh
<abuehaze@...zon.com>, Josh Don <joshdon@...gle.com>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance
On 11/11/2025 5:20 PM, Peter Zijlstra wrote:
> On Tue, Nov 11, 2025 at 05:07:45PM +0800, Adam Li wrote:
>>> @@ -12843,6 +12858,22 @@ static int sched_balance_newidle(struct
>>> break;
>>>
>>> if (sd->flags & SD_BALANCE_NEWIDLE) {
>>> + unsigned int weight = 1;
>>> +
>>> + if (sched_feat(NI_RANDOM)) {
>>> + /*
>>> + * Throw a 1k sided dice; and only run
>>> + * newidle_balance according to the success
>>> + * rate.
>>> + */
>>> + u32 d1k = sched_rng() % 1024;
>>> + weight = 1 + sd->newidle_ratio;
>>> + if (d1k > weight) {
>>> + update_newidle_stats(sd, 0);
>>> + continue;
>>> + }
>>> + weight = (1024 + weight/2) / weight;
>>> + }
>>>
>> e.g: Why 'weight = (1024 + weight/2) / weight'
>
> Not sure what you're asking, so two answers:
>
> That's a rounding divide. We have a helper for that, but I never can
> remember what its called.
>
> The transformation as a whole here is from a ratio to a weight, suppose
> our ratio is 256, this means that we do 1-in-4 or 25% of the balance
> calls. However this also means that each success needs to be weighted as
> 4 (=1024/256), otherwise we under-account the successes and not even a
> 100% success rate can lift you out the hole.
>
> Now, I made it a rounding divide to make it a little easier to climb out
> of said hole (I even considered ceiling divide).
>
>
Thanks for the clarification.
If I understand correctly, (sd->newidle_ratio / 1024) is close to
(sd->newidle_success / sd->newidle_call), i.e. 'sd->newidle_ratio' is the
success rate of newidle balance, scaled to 1024.
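To make the arithmetic concrete, here is a minimal userspace sketch of the ratio-to-weight conversion as I read it from the quoted hunk. The names NI_SCALE, div_round_closest_u32() and ratio_to_weight() are hypothetical and exist only for this illustration; they are not part of the patch. The rounding-divide helper you mention is presumably DIV_ROUND_CLOSEST() from include/linux/math.h; the sketch open-codes the same idea so it builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the "1k sided dice" scale used in the quoted hunk. */
#define NI_SCALE 1024u

/*
 * Rounding divide: add half the divisor before dividing, which is the
 * open-coded form of '(1024 + weight/2) / weight' in the patch.
 */
static inline uint32_t div_round_closest_u32(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (n + d / 2) / d;
}

/*
 * Convert a newidle ratio (successes per 1024 calls) into the weight
 * given to each balance attempt that passes the dice roll.
 *
 * With a ratio of about 256 only ~1 in 4 attempts run, so each one has
 * to count as ~4, otherwise successes are under-accounted.
 */
static inline uint32_t ratio_to_weight(uint32_t newidle_ratio)
{
	uint32_t chance = 1 + newidle_ratio;	/* never zero */

	return div_round_closest_u32(NI_SCALE, chance);
}
```

With these definitions, ratio_to_weight(256) gives 4, matching the 1024/256 example above, ratio_to_weight(0) gives 1024 and ratio_to_weight(1023) gives 1. The rounding only matters when the quotient has a fractional part of one half or more: for a ratio of 4 the divisor is 5, so plain truncation would give 204 while the rounding divide gives 205.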
Shall we update the newidle stats only from sched_balance_newidle(), as in
the patch below? That way sched_balance_domains() will not update
sd->newidle_call.
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12171,7 +12171,8 @@ update_newidle_cost(struct sched_domain *sd, u64 cost, unsigned int success)
 	unsigned long next_decay = sd->last_decay_max_lb_cost + HZ;
 	unsigned long now = jiffies;

-	update_newidle_stats(sd, success);
+	if (cost)
+		update_newidle_stats(sd, success);

 	if (cost > sd->max_newidle_lb_cost) {
 		/*
I tested this change; SPECjbb performance is similar to that with your patch.
Thanks,
-adam