Message-ID: <ad666d65-6b67-4068-b429-12bd6273954c@arm.com>
Date: Wed, 28 Jan 2026 12:24:19 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org,
mgorman@...hsingularity.net, vineethr@...ux.ibm.com, clm@...a.com,
Christian.Loehle@....com
Subject: Re: [PATCH] sched/fair: revert force wakeup preemption
On 23/01/2026 10:28, Vincent Guittot wrote:
> This aggressively bypasses run_to_parity and slice protection with the
> assumption that this is what the waker wants, but there is no guarantee that
> the wakee will be the next to run. It is a better choice to use
> yield_to_task or WF_SYNC in such cases.
>
> This increases the number of rescheds and preemptions because a task quickly
> becomes "ineligible" when it runs: we update the task's vruntime periodically,
> before the task has exhausted its slice or even a minimum quantum.
>
> Example:
> 2 tasks A and B wake up simultaneously with lag = 0. Both are
> eligible. Task A runs 1st and wakes up task C. Scheduler updates task
> A's vruntime which becomes greater than average runtime as all others
> have a lag == 0 and didn't run yet. Now task A is ineligible because
> it received more runtime than the other task but it has not yet
> exhausted its slice nor a min quantum. We force preemption, disable
> protection but Task B will run 1st not task C.
>
> Sidenote, DELAY_ZERO increases this effect by clearing positive lag at
> wake up.
>
> Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
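
For readers following along, the A/B/C example in the commit message above can be replayed with a small userspace model. This is only a sketch under simplifying assumptions, not the kernel's implementation: all tasks have equal weight, the average vruntime is a plain mean, and C is placed at the average because it wakes with lag 0. The names `struct entity`, `model_avg_vruntime`, `model_eligible`, `model_pick` and `next_after_wakeup` are hypothetical and are not the functions in kernel/sched/fair.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: equal weights, virtual time in nanoseconds. */
struct entity {
	char name;
	int64_t vruntime;	/* virtual runtime received so far */
	int64_t deadline;	/* virtual deadline: slice start + slice */
};

/* Plain mean of vruntime (assumption: every entity has the same weight). */
static int64_t model_avg_vruntime(const struct entity *e, size_t n)
{
	int64_t sum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += e[i].vruntime;
	return sum / (int64_t)n;
}

/* Eligible means lag >= 0, i.e. vruntime is not ahead of the average. */
static bool model_eligible(const struct entity *e, int64_t avg)
{
	return e->vruntime <= avg;
}

/* Pick the eligible entity with the earliest virtual deadline. */
static const struct entity *model_pick(const struct entity *e, size_t n)
{
	int64_t avg = model_avg_vruntime(e, n);
	const struct entity *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!model_eligible(&e[i], avg))
			continue;
		if (!best || e[i].deadline < best->deadline)
			best = &e[i];
	}
	return best;
}

/*
 * Replay the scenario: A and B start with lag 0, A runs for ran_ns and
 * wakes C. Returns the name of the entity picked next and reports
 * whether A is still eligible at that point.
 */
static char next_after_wakeup(int64_t ran_ns, int64_t slice_ns, bool *a_eligible)
{
	struct entity rq[3] = {
		{ 'A', 0, slice_ns },
		{ 'B', 0, slice_ns },
	};
	int64_t avg;

	rq[0].vruntime += ran_ns;		/* A ran before waking C */
	avg = model_avg_vruntime(rq, 2);	/* average seen by the wakeup */
	rq[2] = (struct entity){ 'C', avg, avg + slice_ns }; /* lag == 0 */

	*a_eligible = model_eligible(&rq[0], model_avg_vruntime(rq, 3));
	return model_pick(rq, 3)->name;
}
```

With a 3 ms slice and A having run for 1 ms, `next_after_wakeup(1000000, 3000000, &a_ok)` returns 'B' and sets `a_ok` to false: A is ineligible well before its slice is used up, and B's earlier virtual deadline means it is picked ahead of the freshly woken C.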

I see that this is already merged for -rc7 (which is great - thanks for the fast
turnaround!). Here are the performance results I promised.

TL;DR: This patch combined with the NEXT_BUDDY disablement patch fixes all the
regressions I originally reported.

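Kernels compared (p1 is the NEXT_BUDDY disablement patch, p2 is this patch):

6-18-0 (base) (baseline)
6-19-0-rc6 (New NEXT_BUDDY implementation enabled)
6-19-0-rc6+p1 (New NEXT_BUDDY implementation disabled)
6-19-0-rc6+p1+p2 (+ this patch)

Percentages in the tables below are relative to 6-18-0 (base). (R) marks a
result flagged as a regression and (I) one flagged as an improvement. Every (R)
entry is negative and every (I) entry is positive, including the latency and
run-time rows, so the sign looks normalized such that positive is better.
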
Multi-node SUT (workload running across 2 machines):
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| Benchmark | Result Class | 6-18-0 (base) | 6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
+=================================+====================================================+===============+=============+===============+==================+
| repro-collection/mysql-workload | db transaction rate (transactions/min) | 646267.33 | (R) -0.89% | (I) 4.01% | (I) 6.03% |
| | new order rate (orders/min) | 213256.50 | (R) -0.89% | (I) 3.94% | (I) 6.05% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
Single-node SUT (workload running on single machine):
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| Benchmark | Result Class | 6-18-0 (base) | 6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
+=================================+====================================================+===============+=============+===============+==================+
| specjbb/composite | critical-jOPS (jOPS) | 94700.00 | (R) -4.12% | (I) 3.07% | (I) 1.27% |
| | max-jOPS (jOPS) | 113984.50 | (R) -2.80% | (I) 1.94% | (I) 1.94% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| repro-collection/mysql-workload | db transaction rate (transactions/min) | 245438.25 | (R) -3.07% | -1.34% | 0.23% |
| | new order rate (orders/min) | 80985.75 | (R) -3.06% | -1.29% | 0.25% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| pts/pgbench | Scale: 1 Clients: 1 Read Only (TPS) | 63124.00 | (I) 2.67% | 2.58% | (I) 2.69% |
| | Scale: 1 Clients: 1 Read Only - Latency (ms) | 0.016 | 4.35% | 4.35% | 4.35% |
| | Scale: 1 Clients: 1 Read Write (TPS) | 974.92 | 0.03% | 0.11% | -0.06% |
| | Scale: 1 Clients: 1 Read Write - Latency (ms) | 1.03 | 0.01% | 0.14% | -0.04% |
| | Scale: 1 Clients: 250 Read Only (TPS) | 1915931.58 | (R) -3.28% | (R) -3.92% | 1.23% |
| | Scale: 1 Clients: 250 Read Only - Latency (ms) | 0.13 | (R) -3.33% | (R) -3.93% | 1.16% |
| | Scale: 1 Clients: 250 Read Write (TPS) | 855.67 | 0.27% | -0.49% | -1.44% |
| | Scale: 1 Clients: 250 Read Write - Latency (ms) | 292.39 | 0.32% | -0.49% | -1.40% |
| | Scale: 1 Clients: 1000 Read Only (TPS) | 1534130.08 | (R) -12.20% | (R) -11.85% | 0.45% |
| | Scale: 1 Clients: 1000 Read Only - Latency (ms) | 0.65 | (R) -12.19% | (R) -11.87% | 0.46% |
| | Scale: 1 Clients: 1000 Read Write (TPS) | 578.75 | 0.85% | 1.60% | -5.23% |
| | Scale: 1 Clients: 1000 Read Write - Latency (ms) | 1736.98 | 1.12% | 1.52% | -4.91% |
| | Scale: 100 Clients: 1 Read Only (TPS) | 57170.33 | 1.64% | 2.16% | 1.69% |
| | Scale: 100 Clients: 1 Read Only - Latency (ms) | 0.018 | 1.94% | 1.94% | 2.94% |
| | Scale: 100 Clients: 1 Read Write (TPS) | 836.58 | 0.27% | 0.07% | 0.13% |
| | Scale: 100 Clients: 1 Read Write - Latency (ms) | 1.20 | 0.27% | 0.06% | 0.15% |
| | Scale: 100 Clients: 250 Read Only (TPS) | 1773440.67 | (R) -2.54% | (R) -2.94% | 1.00% |
| | Scale: 100 Clients: 250 Read Only - Latency (ms) | 0.14 | (R) -2.42% | (R) -2.87% | 1.08% |
| | Scale: 100 Clients: 250 Read Write (TPS) | 5505.50 | -1.51% | 0.17% | -0.03% |
| | Scale: 100 Clients: 250 Read Write - Latency (ms) | 45.42 | -1.52% | 0.17% | -0.03% |
| | Scale: 100 Clients: 1000 Read Only (TPS) | 1393037.50 | (R) -10.08% | (R) -10.36% | 0.60% |
| | Scale: 100 Clients: 1000 Read Only - Latency (ms) | 0.72 | (R) -10.07% | (R) -10.35% | 0.60% |
| | Scale: 100 Clients: 1000 Read Write (TPS) | 5085.92 | 0.70% | -2.32% | -0.28% |
| | Scale: 100 Clients: 1000 Read Write - Latency (ms) | 196.79 | 0.72% | -2.27% | -0.29% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| mmtests/hackbench | hackbench-process-pipes-1 (seconds) | 0.14 | -1.28% | 0.35% | -1.85% |
| | hackbench-process-pipes-4 (seconds) | 0.44 | (I) 8.20% | (I) 5.72% | (I) 7.23% |
| | hackbench-process-pipes-7 (seconds) | 0.68 | (R) -18.31% | (R) -24.54% | 1.56% |
| | hackbench-process-pipes-12 (seconds) | 1.24 | (R) -19.52% | (R) -24.55% | -0.25% |
| | hackbench-process-pipes-21 (seconds) | 1.81 | (R) -7.33% | (R) -13.58% | -1.14% |
| | hackbench-process-pipes-30 (seconds) | 2.39 | (R) -7.86% | (R) -13.21% | -0.23% |
| | hackbench-process-pipes-48 (seconds) | 3.18 | (R) -10.72% | (R) -12.63% | 1.22% |
| | hackbench-process-pipes-79 (seconds) | 3.84 | (R) -9.52% | (R) -10.31% | -0.07% |
| | hackbench-process-pipes-110 (seconds) | 4.68 | (R) -6.78% | (R) -7.15% | 1.30% |
| | hackbench-process-pipes-141 (seconds) | 5.75 | (R) -5.50% | (R) -5.60% | 1.11% |
| | hackbench-process-pipes-172 (seconds) | 6.80 | (R) -4.67% | (R) -4.79% | 1.61% |
| | hackbench-process-pipes-203 (seconds) | 7.94 | (R) -4.01% | (R) -3.74% | (I) 2.08% |
| | hackbench-process-pipes-234 (seconds) | 9.02 | (R) -3.69% | (R) -3.63% | 1.67% |
| | hackbench-process-pipes-256 (seconds) | 9.78 | (R) -3.80% | (R) -3.19% | 1.65% |
| | hackbench-process-sockets-1 (seconds) | 0.29 | -0.38% | -0.43% | 0.03% |
| | hackbench-process-sockets-4 (seconds) | 0.76 | (I) 17.71% | (I) 18.69% | (I) 19.52% |
| | hackbench-process-sockets-7 (seconds) | 1.16 | (I) 12.10% | (I) 11.37% | (I) 13.52% |
| | hackbench-process-sockets-12 (seconds) | 1.86 | (I) 10.19% | (I) 9.31% | (I) 12.83% |
| | hackbench-process-sockets-21 (seconds) | 3.12 | (I) 9.59% | (I) 8.99% | (I) 12.15% |
| | hackbench-process-sockets-30 (seconds) | 4.30 | (I) 6.23% | (I) 6.75% | (I) 8.88% |
| | hackbench-process-sockets-48 (seconds) | 6.58 | (I) 2.39% | (I) 2.98% | (I) 4.39% |
| | hackbench-process-sockets-79 (seconds) | 10.56 | (I) 3.44% | (I) 3.10% | (I) 3.94% |
| | hackbench-process-sockets-110 (seconds) | 13.85 | -0.77% | 0.44% | (I) 2.50% |
| | hackbench-process-sockets-141 (seconds) | 19.23 | -0.47% | 1.54% | 2.95% |
| | hackbench-process-sockets-172 (seconds) | 26.33 | (I) 3.44% | (I) 4.25% | (I) 3.21% |
| | hackbench-process-sockets-203 (seconds) | 30.27 | 0.36% | 1.67% | 0.90% |
| | hackbench-process-sockets-234 (seconds) | 35.12 | 2.05% | (I) 3.11% | (I) 2.45% |
| | hackbench-process-sockets-256 (seconds) | 38.74 | -0.39% | 1.48% | 2.13% |
| | hackbench-thread-pipes-1 (seconds) | 0.17 | -0.38% | -0.76% | -1.51% |
| | hackbench-thread-pipes-4 (seconds) | 0.45 | (I) 7.85% | (I) 6.15% | (I) 9.93% |
| | hackbench-thread-pipes-7 (seconds) | 0.74 | (R) -7.22% | (R) -9.98% | (I) 6.47% |
| | hackbench-thread-pipes-12 (seconds) | 1.32 | (R) -7.62% | (R) -14.42% | 1.27% |
| | hackbench-thread-pipes-21 (seconds) | 1.95 | (R) -3.00% | (R) -7.93% | -1.67% |
| | hackbench-thread-pipes-30 (seconds) | 2.50 | (R) -4.79% | (R) -11.99% | -1.72% |
| | hackbench-thread-pipes-48 (seconds) | 3.32 | (R) -5.49% | (R) -11.45% | 1.15% |
| | hackbench-thread-pipes-79 (seconds) | 4.04 | (R) -6.16% | (R) -8.88% | -0.56% |
| | hackbench-thread-pipes-110 (seconds) | 4.94 | (R) -2.62% | (R) -4.92% | 0.63% |
| | hackbench-thread-pipes-141 (seconds) | 6.04 | (R) -2.05% | (R) -3.56% | 0.51% |
| | hackbench-thread-pipes-172 (seconds) | 7.15 | -0.74% | -1.93% | 0.91% |
| | hackbench-thread-pipes-203 (seconds) | 8.31 | -1.20% | -1.41% | 0.91% |
| | hackbench-thread-pipes-234 (seconds) | 9.49 | -0.65% | -1.21% | 0.92% |
| | hackbench-thread-pipes-256 (seconds) | 10.30 | -0.56% | -0.92% | 0.88% |
| | hackbench-thread-sockets-1 (seconds) | 0.31 | 0.16% | -0.05% | -0.48% |
| | hackbench-thread-sockets-4 (seconds) | 0.79 | (I) 18.70% | (I) 19.30% | (I) 19.79% |
| | hackbench-thread-sockets-7 (seconds) | 1.16 | (I) 12.35% | (I) 11.90% | (I) 12.91% |
| | hackbench-thread-sockets-12 (seconds) | 1.87 | (I) 12.75% | (I) 11.66% | (I) 14.43% |
| | hackbench-thread-sockets-21 (seconds) | 3.16 | (I) 11.55% | (I) 11.06% | (I) 14.41% |
| | hackbench-thread-sockets-30 (seconds) | 4.32 | (I) 7.66% | (I) 6.58% | (I) 10.15% |
| | hackbench-thread-sockets-48 (seconds) | 6.45 | (I) 2.62% | 1.92% | (I) 4.10% |
| | hackbench-thread-sockets-79 (seconds) | 10.15 | 1.85% | -0.20% | 1.54% |
| | hackbench-thread-sockets-110 (seconds) | 13.45 | -0.29% | -0.41% | 0.08% |
| | hackbench-thread-sockets-141 (seconds) | 17.87 | -1.84% | -1.01% | 1.33% |
| | hackbench-thread-sockets-172 (seconds) | 24.38 | 0.82% | 1.33% | 3.68% |
| | hackbench-thread-sockets-203 (seconds) | 28.38 | -1.29% | 0.72% | 1.58% |
| | hackbench-thread-sockets-234 (seconds) | 32.75 | -1.01% | 1.00% | 0.94% |
| | hackbench-thread-sockets-256 (seconds) | 36.49 | -0.99% | 1.22% | 1.00% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
Thanks,
Ryan
> ---
> kernel/sched/fair.c | 10 ----------
> 1 file changed, 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04993c763a06..16ecc3475fe2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
>  	if ((wake_flags & WF_FORK) || pse->sched_delayed)
>  		return;
>  
> -	/*
> -	 * If @p potentially is completing work required by current then
> -	 * consider preemption.
> -	 *
> -	 * Reschedule if waker is no longer eligible. */
> -	if (in_task() && !entity_eligible(cfs_rq, se)) {
> -		preempt_action = PREEMPT_WAKEUP_RESCHED;
> -		goto preempt;
> -	}
> -
>  	/* Prefer picking wakee soon if appropriate. */
>  	if (sched_feat(NEXT_BUDDY) &&
>  	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {