Message-ID: <ef7d974e-7fcd-4e30-8a0d-8b97e00478bc@amd.com>
Date: Wed, 12 Nov 2025 12:55:23 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Wanpeng Li <kernellwp@...il.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, "Paolo
Bonzini" <pbonzini@...hat.com>, Sean Christopherson <seanjc@...gle.com>
CC: Steven Rostedt <rostedt@...dmis.org>, Vincent Guittot
<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, Wanpeng Li
<wanpengli@...cent.com>
Subject: Re: [PATCH 04/10] sched/fair: Add penalty calculation and application
logic

Hello Wanpeng,

On 11/10/2025 9:02 AM, Wanpeng Li wrote:
> +/*
> + * Calculate penalty with debounce logic for EEVDF yield deboost.
> + * Computes vruntime penalty based on fairness gap (need) plus granularity,
> + * applies queue-size-based caps to prevent excessive penalties in small queues,
> + * and implements reverse-pair debounce (~300us) to reduce ping-pong effects.
> + * Returns 0 if no penalty needed, otherwise returns clamped penalty value.
> + */
> +static u64 __maybe_unused yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
> + struct sched_entity *se_t_lca, struct sched_entity *se_t,
> + int nr_queued)
> +{
> + u64 gran, need, penalty, maxp;
> + u64 gran_floor;
> + u64 weighted_need, base;
> +
> + gran = calc_delta_fair(sysctl_sched_base_slice, se_y_lca);
> + /* Low-bound safeguard for gran when slice is abnormally small */
> + gran_floor = calc_delta_fair(sysctl_sched_base_slice >> 1, se_y_lca);
> + if (gran < gran_floor)
Is this even possible? Both values are computed against the same se,
and calc_delta_fair() is monotonic in its delta argument.
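
For reference, this is roughly what calc_delta_fair() looks like in
kernel/sched/fair.c:

static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
{
        /* Scale delta by NICE_0_LOAD / se->load.weight */
        if (unlikely(se->load.weight != NICE_0_LOAD))
                delta = __calc_delta(delta, NICE_0_LOAD, &se->load);

        return delta;
}

A smaller input delta cannot produce a larger output, so AFAICT the
floor is dead code.
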
> + gran = gran_floor;
> +
> + need = 0;
> + if (se_t_lca->vruntime > se_y_lca->vruntime)
> + need = se_t_lca->vruntime - se_y_lca->vruntime;
So I'm assuming you want the yielding task's vruntime to
cross the target's vruntime simply because one task somewhere
down the hierarchy said so.
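
I'm assuming se_y_lca/se_t_lca come out of a find_matching_se()-style
walk up to the first common cfs_rq, i.e. something like the existing
helper in kernel/sched/fair.c (trimmed):

static void
find_matching_se(struct sched_entity **se, struct sched_entity **pse)
{
        int se_depth = (*se)->depth;
        int pse_depth = (*pse)->depth;

        /* First walk up until both entities are at the same depth ... */
        while (se_depth > pse_depth) {
                se_depth--;
                *se = parent_entity(*se);
        }
        while (pse_depth > se_depth) {
                pse_depth--;
                *pse = parent_entity(*pse);
        }

        /* ... then in lock-step until they share a cfs_rq. */
        while (!is_same_group(*se, *pse)) {
                *se = parent_entity(*se);
                *pse = parent_entity(*pse);
        }
}

If so, the entity that eats the penalty can be a whole group entity,
not just the yielding task.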
> +
> + /* Apply 10% boost to need when positive (weighted_need = need * 1.10) */
> + penalty = gran;
So at the very least I see it getting a weighted (base_slice / 2) penalty
...
> + if (need) {
> + /* weighted_need = need + 10% */
> + weighted_need = need + need / 10;
> + /* clamp to avoid overflow when adding to gran (still capped later) */
> + if (weighted_need > U64_MAX - penalty)
> + weighted_need = U64_MAX - penalty;
> + penalty += weighted_need;
... if not more ...
> + }
> +
> + /* Apply debounce via helper to avoid ping-pong */
> + penalty = yield_deboost_apply_debounce(rq, se_t, penalty, need, gran);
... since without debounce, the penalty remains the same.
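
To put some made-up numbers on it (nice-0 task, assume gran == 750us,
i.e. roughly the default base slice):

        need          = 1000us  /* target leads the yielder by 1ms */
        weighted_need = 1100us  /* need + need / 10 */
        penalty       = 1850us  /* gran + weighted_need, before caps */

and with nr_queued == 2 the cap below is 6 * gran == 4500us, so the
full 1850us lands on the yielding entity's vruntime.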
> +
> + /* Upper bound (cap): slightly more aggressive for mid-size queues */
> + if (nr_queued == 2)
> + maxp = gran * 6; /* Strongest push for 2-task ping-pong */
> + else if (nr_queued == 3)
> + maxp = gran * 4; /* 4.0 * gran */
> + else if (nr_queued <= 6)
> + maxp = (gran * 5) / 2; /* 2.5 * gran */
> + else if (nr_queued <= 8)
> + maxp = gran * 2; /* 2.0 * gran */
> + else if (nr_queued <= 12)
> + maxp = (gran * 3) / 2; /* 1.5 * gran */
> + else
> + maxp = gran; /* 1.0 * gran */
And all the nr_queued calculations are based on the entities queued
on this cfs_rq and not on "h_nr_queued", so we can have a boatload of
tasks queued below those entities, but since one task decided to call
yield_to(), let us make them all starve a little?
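
As a made-up example of how far apart the two can be:

        root cfs_rq: nr_queued   == 2    /* two group entities */
                     h_nr_queued == 200  /* tasks in their subtrees */

This would pick the most aggressive 6 * gran "2-task ping-pong" cap
for what is really a 200-task ping-pong.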
> +
> + if (penalty < gran)
> + penalty = gran;
> + if (penalty > maxp)
> + penalty = maxp;
> +
> + /* If no need, apply refined baseline push (low risk + mid risk combined). */
> + if (need == 0) {
> + /*
> + * Baseline multiplier for need==0:
> + * 2 -> 1.00 * gran
> + * 3 -> 0.9375 * gran
> + * 4–6 -> 0.625 * gran
> + * 7–8 -> 0.50 * gran
> + * 9–12 -> 0.375 * gran
> + * >12 -> 0.25 * gran
> + */
> + base = gran;
> + if (nr_queued == 3)
> + base = (gran * 15) / 16; /* 0.9375 */
> + else if (nr_queued >= 4 && nr_queued <= 6)
> + base = (gran * 5) / 8; /* 0.625 */
> + else if (nr_queued >= 7 && nr_queued <= 8)
> + base = gran / 2; /* 0.5 */
> + else if (nr_queued >= 9 && nr_queued <= 12)
> + base = (gran * 3) / 8; /* 0.375 */
> + else if (nr_queued > 12)
> + base = gran / 4; /* 0.25 */
> +
> + if (penalty < base)
> + penalty = base;
> + }
> +
> + return penalty;
> +}
> +
> +/*
> + * Apply penalty and update EEVDF fields for scheduler consistency.
> + * Safely applies vruntime penalty with overflow protection, then updates
> + * EEVDF-specific fields (deadline, vlag) and cfs_rq min_vruntime to maintain
> + * scheduler state consistency. Returns true on successful application,
> + * false if penalty cannot be safely applied.
> + */
> +static void __maybe_unused yield_deboost_apply_penalty(struct rq *rq, struct sched_entity *se_y_lca,
> + struct cfs_rq *cfs_rq_common, u64 penalty)
> +{
> + u64 new_vruntime;
> +
> + /* Overflow protection */
> + if (se_y_lca->vruntime > (U64_MAX - penalty))
> + return;
> +
> + new_vruntime = se_y_lca->vruntime + penalty;
> +
> + /* Validity check */
> + if (new_vruntime <= se_y_lca->vruntime)
> + return;
> +
> + se_y_lca->vruntime = new_vruntime;
> + se_y_lca->deadline = se_y_lca->vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);
And with that we update vruntime to an arbitrary value simply
because one task in the hierarchy decided to call yield_to().

Since we are on the topic, you are also missing an update_curr(),
which is only done in yield_task_fair(), so you are actually
looking at a stale vruntime for the yielding entity.
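
For reference, this is the relevant bit of yield_task_fair() in
kernel/sched/fair.c (trimmed), which runs before it touches the
entity:

static void yield_task_fair(struct rq *rq)
{
        ...
        update_rq_clock(rq);
        /*
         * Update run-time statistics of the 'current'.
         */
        update_curr(cfs_rq);
        /*
         * Tell update_rq_clock() that we've just updated,
         * so we don't do microscopic update in schedule()
         * and double the fastpath cost.
         */
        rq_clock_skip_update(rq);
        ...
}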
> + se_y_lca->vlag = avg_vruntime(cfs_rq_common) - se_y_lca->vruntime;
> + update_min_vruntime(cfs_rq_common);
> +}
> +
> /*
> * sched_yield() is very simple
> */
--
Thanks and Regards,
Prateek