Message-ID: <CAKfTPtCMcgMO1mK4sNwHtqbKWTQRB_92yPE2vd+11k7aHAukew@mail.gmail.com>
Date: Thu, 22 Jan 2026 18:34:28 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Mel Gorman <mgorman@...hsingularity.net>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>, Chris Mason <clm@...a.com>, linux-kernel@...r.kernel.org,
Christian.Loehle@....com
Subject: Re: [PATCH] sched/fair: Disable scheduler feature NEXT_BUDDY
Hi Ryan,
Thanks for adding me to the loop.
On Thu, 22 Jan 2026 at 14:38, Ryan Roberts <ryan.roberts@....com> wrote:
>
> Hi Mel,
>
>
> On 20/01/2026 11:33, Mel Gorman wrote:
> > NEXT_BUDDY was disabled with the introduction of EEVDF and enabled again
> > after NEXT_BUDDY was rewritten for EEVDF by commit e837456fdca8 ("sched/fair:
> > Reimplement NEXT_BUDDY to align with EEVDF goals"). It was not expected
> > that this would be a universal win without a crystal ball instruction
> > but the reported regressions are a concern [1][2] even if gains were
> > also reported. Specifically:
> >
> > o mysql with client/server running on different servers regresses
> > o specjbb reports lower peak metrics
> > o daytrader regresses
> >
> > The mysql result is realistic and a concern. It needs to be confirmed
> > whether specjbb is simply shifting the point where peak performance is
> > measured, but it is still a concern. daytrader is considered
> > representative of a real workload.
> >
> > Access to test machines is currently problematic for verifying any fix to
> > this problem. Disable NEXT_BUDDY for now by default until the root causes
> > are addressed.
The new NEXT_BUDDY implementation does more than set a buddy: it also
breaks the run-to-parity mechanism by always setting the next buddy in
wakeup_preempt_fair(), even when there is no relation between the two
tasks, and PICK_BUDDY then bypasses the protection.

In addition to disabling NEXT_BUDDY, I suggest also reverting the
forced-preemption hunk below, which likewise breaks run-to-parity by
making an assumption about the waker/wakee relationship that WF_SYNC is
normally there to express:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if ((wake_flags & WF_FORK) || pse->sched_delayed)
return;
- /*
- * If @p potentially is completing work required by current then
- * consider preemption.
- *
- * Reschedule if waker is no longer eligible. */
- if (in_task() && !entity_eligible(cfs_rq, se)) {
- preempt_action = PREEMPT_WAKEUP_RESCHED;
- goto preempt;
- }
-
/* Prefer picking wakee soon if appropriate. */
if (sched_feat(NEXT_BUDDY) &&
set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
This largely increases the number of reschedules and preemptions,
because a task quickly becomes "ineligible": we update its vruntime
periodically, before the task has exhausted its slice.

Example:
Two tasks A and B wake up simultaneously with lag == 0; both are
eligible. Task A runs first and wakes task C up. The scheduler updates
task A's vruntime, which becomes greater than the average vruntime
because all the others have lag == 0 and haven't run yet. Task A is now
ineligible because it received more runtime than the other tasks, yet it
has exhausted neither its slice nor a minimum quantum. We force a
preemption and disable the protection, but task B will run first, not
task C.

Side note: DELAY_ZERO amplifies this effect by clearing positive lag.
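The effect above can be sketched with a toy model (plain Python with
equal weights and hypothetical helper names; this is a simplification,
not the kernel's entity_eligible()/avg_vruntime() implementation):

```python
# Toy model of EEVDF eligibility: an entity is eligible when its
# vruntime is at or before the runqueue's average vruntime V.
# Equal weights are assumed for clarity.

def avg_vruntime(entities):
    """Average vruntime V over the runqueue (equal weights)."""
    return sum(e["vruntime"] for e in entities) / len(entities)

def eligible(entity, entities):
    """Simplified eligibility check: vruntime <= V."""
    return entity["vruntime"] <= avg_vruntime(entities)

# Tasks A and B wake up simultaneously with lag == 0 (same vruntime).
a = {"name": "A", "vruntime": 0.0}
b = {"name": "B", "vruntime": 0.0}
rq = [a, b]

assert eligible(a, rq) and eligible(b, rq)  # both start eligible

# A runs for a while (well short of a full slice) before waking C up;
# its vruntime moves past V because B has not run at all yet.
a["vruntime"] += 2.0

print(eligible(a, rq))  # False: A is "ineligible" mid-slice, so the
                        # removed hunk would force a resched even
                        # though A never exhausted its slice.
```

With the reverted hunk in place, this mid-slice ineligibility is enough
to trigger PREEMPT_WAKEUP_RESCHED, and since B's vruntime is lower than
C's, B runs next rather than the freshly woken C.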
> >
> > Link: https://lore.kernel.org/lkml/4b96909a-f1ac-49eb-b814-97b8adda6229@arm.com [1]
> > Link: https://lore.kernel.org/lkml/ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com [2]
> > Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> > ---
> > kernel/sched/features.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> > index 980d92bab8ab..136a6584be79 100644
> > --- a/kernel/sched/features.h
> > +++ b/kernel/sched/features.h
> > @@ -29,7 +29,7 @@ SCHED_FEAT(PREEMPT_SHORT, true)
> > * wakeup-preemption), since its likely going to consume data we
> > * touched, increases cache locality.
> > */
> > -SCHED_FEAT(NEXT_BUDDY, true)
> > +SCHED_FEAT(NEXT_BUDDY, false)
> >
> > /*
> > * Allow completely ignoring cfs_rq->next; which can be set from various
>
>
> We have rerun the same set of benchmarks for v6.19-rc6 + this patch. I've added
> the results as an extra column. Numbers all relative to v6.18. Other columns as
> per [1].
>
> [1] https://lore.kernel.org/all/63d22eb9-b309-4d11-aa56-3f1e7e12edb1@arm.com/
>
> 6-18-0 (base) (baseline)
> 6-19-0-rc1 (New NEXT_BUDDY implementation enabled)
> revert #1 & #2 (NEXT_BUDDY disabled)
> revert #2 (Old NEXT_BUDDY implementation enabled)
> 6-19-0-rc6+patch (New NEXT_BUDDY implementation disabled)
>
> It's definitely better than v6.19-rc1. But it's not as good as "revert #1 & #2".
>
> So I guess this implies that disabling the new version of NEXT_BUDDY is not
> exactly the same as reverting your original patches #1 and #2 - i.e. old version
> of NEXT_BUDDY disabled isn't exactly the same as new version of NEXT_BUDDY
> disabled?
>
> Thanks,
> Ryan
>
>
> Multi-node SUT (workload running across 2 machines):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | Benchmark | Result Class | 6-18-0 (base) | 6-19-0-rc1 | revert #2 | revert #1 & #2 | 6-19-0-rc6+patch |
> +=================================+====================================================+===============+=============+============+================+==================+
> | repro-collection/mysql-workload | db transaction rate (transactions/min) | 646267.33 | (R) -1.33% | (I) 5.87% | (I) 7.63% | (I) 4.01% |
> | | new order rate (orders/min) | 213256.50 | (R) -1.32% | (I) 5.87% | (I) 7.64% | (I) 3.94% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
>
> Single-node SUT (workload running on single machine):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | Benchmark | Result Class | 6-18-0 (base) | 6-19-0-rc1 | revert #2 | revert #1 & #2 | 6-19-0-rc6+patch |
> +=================================+====================================================+===============+=============+============+================+==================+
> | specjbb/composite | critical-jOPS (jOPS) | 94700.00 | (R) -5.10% | -0.90% | -0.37% | (I) 3.07% |
> | | max-jOPS (jOPS) | 113984.50 | (R) -3.90% | -0.65% | 0.65% | (I) 1.94% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | repro-collection/mysql-workload | db transaction rate (transactions/min) | 245438.25 | (R) -3.88% | -0.13% | 0.24% | -1.34% |
> | | new order rate (orders/min) | 80985.75 | (R) -3.78% | -0.07% | 0.29% | -1.29% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | pts/pgbench | Scale: 1 Clients: 1 Read Only (TPS) | 63124.00 | (I) 2.90% | 0.74% | 0.85% | 2.58% |
> | | Scale: 1 Clients: 1 Read Only - Latency (ms) | 0.016 | (I) 5.49% | 1.05% | 1.05% | 4.35% |
> | | Scale: 1 Clients: 1 Read Write (TPS) | 974.92 | 0.11% | -0.08% | -0.03% | 0.11% |
> | | Scale: 1 Clients: 1 Read Write - Latency (ms) | 1.03 | 0.12% | -0.06% | -0.06% | 0.14% |
> | | Scale: 1 Clients: 250 Read Only (TPS) | 1915931.58 | (R) -2.25% | (I) 2.12% | 1.62% | (R) -3.92% |
> | | Scale: 1 Clients: 250 Read Only - Latency (ms) | 0.13 | (R) -2.37% | (I) 2.09% | 1.69% | (R) -3.93% |
> | | Scale: 1 Clients: 250 Read Write (TPS) | 855.67 | -1.36% | -0.14% | -0.12% | -0.49% |
> | | Scale: 1 Clients: 250 Read Write - Latency (ms) | 292.39 | -1.31% | -0.08% | -0.08% | -0.49% |
> | | Scale: 1 Clients: 1000 Read Only (TPS) | 1534130.08 | (R) -11.37% | 0.08% | 0.48% | (R) -11.85% |
> | | Scale: 1 Clients: 1000 Read Only - Latency (ms) | 0.65 | (R) -11.38% | 0.08% | 0.44% | (R) -11.87% |
> | | Scale: 1 Clients: 1000 Read Write (TPS) | 578.75 | -1.11% | 2.15% | -0.96% | 1.60% |
> | | Scale: 1 Clients: 1000 Read Write - Latency (ms) | 1736.98 | -1.26% | 2.47% | -0.90% | 1.52% |
> | | Scale: 100 Clients: 1 Read Only (TPS) | 57170.33 | 1.68% | 0.10% | 0.22% | 2.16% |
> | | Scale: 100 Clients: 1 Read Only - Latency (ms) | 0.018 | 1.94% | 0.00% | 0.96% | 1.94% |
> | | Scale: 100 Clients: 1 Read Write (TPS) | 836.58 | -0.37% | -0.41% | 0.07% | 0.07% |
> | | Scale: 100 Clients: 1 Read Write - Latency (ms) | 1.20 | -0.37% | -0.40% | 0.06% | 0.06% |
> | | Scale: 100 Clients: 250 Read Only (TPS) | 1773440.67 | -1.61% | 1.67% | 1.34% | (R) -2.94% |
> | | Scale: 100 Clients: 250 Read Only - Latency (ms) | 0.14 | -1.40% | 1.56% | 1.20% | (R) -2.87% |
> | | Scale: 100 Clients: 250 Read Write (TPS) | 5505.50 | -0.17% | -0.86% | -1.66% | 0.17% |
> | | Scale: 100 Clients: 250 Read Write - Latency (ms) | 45.42 | -0.17% | -0.85% | -1.67% | 0.17% |
> | | Scale: 100 Clients: 1000 Read Only (TPS) | 1393037.50 | (R) -10.31% | -0.19% | 0.53% | (R) -10.36% |
> | | Scale: 100 Clients: 1000 Read Only - Latency (ms) | 0.72 | (R) -10.30% | -0.17% | 0.53% | (R) -10.35% |
> | | Scale: 100 Clients: 1000 Read Write (TPS) | 5085.92 | 0.27% | 0.07% | -0.79% | -2.32% |
> | | Scale: 100 Clients: 1000 Read Write - Latency (ms) | 196.79 | 0.23% | 0.05% | -0.81% | -2.27% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | mmtests/hackbench | hackbench-process-pipes-1 (seconds) | 0.14 | -1.51% | -1.05% | -1.51% | 0.35% |
> | | hackbench-process-pipes-4 (seconds) | 0.44 | (I) 6.49% | (I) 5.42% | (I) 6.06% | (I) 5.72% |
> | | hackbench-process-pipes-7 (seconds) | 0.68 | (R) -18.36% | (I) 3.40% | -0.41% | (R) -24.54% |
> | | hackbench-process-pipes-12 (seconds) | 1.24 | (R) -19.89% | -0.45% | (R) -2.23% | (R) -24.55% |
> | | hackbench-process-pipes-21 (seconds) | 1.81 | (R) -8.41% | -1.22% | (R) -2.46% | (R) -13.58% |
> | | hackbench-process-pipes-30 (seconds) | 2.39 | (R) -9.06% | (R) -2.95% | -1.62% | (R) -13.21% |
> | | hackbench-process-pipes-48 (seconds) | 3.18 | (R) -11.68% | (R) -4.10% | -0.26% | (R) -12.63% |
> | | hackbench-process-pipes-79 (seconds) | 3.84 | (R) -9.74% | (R) -3.25% | (R) -2.45% | (R) -10.31% |
> | | hackbench-process-pipes-110 (seconds) | 4.68 | (R) -6.57% | (R) -2.12% | (R) -2.25% | (R) -7.15% |
> | | hackbench-process-pipes-141 (seconds) | 5.75 | (R) -5.86% | (R) -3.44% | (R) -2.89% | (R) -5.60% |
> | | hackbench-process-pipes-172 (seconds) | 6.80 | (R) -4.28% | (R) -2.81% | (R) -2.44% | (R) -4.79% |
> | | hackbench-process-pipes-203 (seconds) | 7.94 | (R) -4.01% | (R) -3.00% | (R) -2.17% | (R) -3.74% |
> | | hackbench-process-pipes-234 (seconds) | 9.02 | (R) -3.52% | (R) -2.81% | (R) -2.20% | (R) -3.63% |
> | | hackbench-process-pipes-256 (seconds) | 9.78 | (R) -3.24% | (R) -2.81% | (R) -2.74% | (R) -3.19% |
> | | hackbench-process-sockets-1 (seconds) | 0.29 | 0.50% | 0.26% | 0.03% | -0.43% |
> | | hackbench-process-sockets-4 (seconds) | 0.76 | (I) 17.44% | (I) 16.31% | (I) 19.09% | (I) 18.69% |
> | | hackbench-process-sockets-7 (seconds) | 1.16 | (I) 12.10% | (I) 9.78% | (I) 11.83% | (I) 11.37% |
> | | hackbench-process-sockets-12 (seconds) | 1.86 | (I) 10.19% | (I) 9.83% | (I) 11.21% | (I) 9.31% |
> | | hackbench-process-sockets-21 (seconds) | 3.12 | (I) 9.38% | (I) 9.20% | (I) 10.30% | (I) 8.99% |
> | | hackbench-process-sockets-30 (seconds) | 4.30 | (I) 6.43% | (I) 6.11% | (I) 7.22% | (I) 6.75% |
> | | hackbench-process-sockets-48 (seconds) | 6.58 | (I) 3.00% | (I) 2.19% | (I) 2.85% | (I) 2.98% |
> | | hackbench-process-sockets-79 (seconds) | 10.56 | (I) 2.87% | (I) 3.31% | 3.10% | (I) 3.10% |
> | | hackbench-process-sockets-110 (seconds) | 13.85 | -1.15% | (I) 2.33% | 0.22% | 0.44% |
> | | hackbench-process-sockets-141 (seconds) | 19.23 | -1.40% | (I) 14.53% | 2.64% | 1.54% |
> | | hackbench-process-sockets-172 (seconds) | 26.33 | (I) 3.52% | (I) 30.37% | (I) 4.32% | (I) 4.25% |
> | | hackbench-process-sockets-203 (seconds) | 30.27 | 1.10% | (I) 27.20% | 0.32% | 1.67% |
> | | hackbench-process-sockets-234 (seconds) | 35.12 | 1.60% | (I) 28.24% | 1.28% | (I) 3.11% |
> | | hackbench-process-sockets-256 (seconds) | 38.74 | 0.70% | (I) 28.74% | 0.53% | 1.48% |
> | | hackbench-thread-pipes-1 (seconds) | 0.17 | -1.32% | -0.76% | -0.67% | -0.76% |
> | | hackbench-thread-pipes-4 (seconds) | 0.45 | (I) 6.91% | (I) 7.64% | (I) 9.08% | (I) 6.15% |
> | | hackbench-thread-pipes-7 (seconds) | 0.74 | (R) -7.51% | (I) 5.26% | (I) 2.82% | (R) -9.98% |
> | | hackbench-thread-pipes-12 (seconds) | 1.32 | (R) -8.40% | (I) 2.32% | -0.53% | (R) -14.42% |
> | | hackbench-thread-pipes-21 (seconds) | 1.95 | (R) -2.95% | 0.91% | (R) -2.00% | (R) -7.93% |
> | | hackbench-thread-pipes-30 (seconds) | 2.50 | (R) -4.61% | 1.47% | -1.63% | (R) -11.99% |
> | | hackbench-thread-pipes-48 (seconds) | 3.32 | (R) -5.45% | (I) 2.15% | 0.81% | (R) -11.45% |
> | | hackbench-thread-pipes-79 (seconds) | 4.04 | (R) -5.53% | 1.85% | -0.53% | (R) -8.88% |
> | | hackbench-thread-pipes-110 (seconds) | 4.94 | (R) -2.33% | 1.51% | 0.59% | (R) -4.92% |
> | | hackbench-thread-pipes-141 (seconds) | 6.04 | (R) -2.47% | 1.15% | 0.24% | (R) -3.56% |
> | | hackbench-thread-pipes-172 (seconds) | 7.15 | -0.91% | 1.48% | 0.45% | -1.93% |
> | | hackbench-thread-pipes-203 (seconds) | 8.31 | -1.29% | 0.77% | 0.40% | -1.41% |
> | | hackbench-thread-pipes-234 (seconds) | 9.49 | -1.03% | 0.77% | 0.65% | -1.21% |
> | | hackbench-thread-pipes-256 (seconds) | 10.30 | -0.80% | 0.42% | 0.30% | -0.92% |
> | | hackbench-thread-sockets-1 (seconds) | 0.31 | 0.05% | -0.05% | -0.43% | -0.05% |
> | | hackbench-thread-sockets-4 (seconds) | 0.79 | (I) 18.91% | (I) 16.82% | (I) 19.79% | (I) 19.30% |
> | | hackbench-thread-sockets-7 (seconds) | 1.16 | (I) 12.57% | (I) 10.63% | (I) 12.95% | (I) 11.90% |
> | | hackbench-thread-sockets-12 (seconds) | 1.87 | (I) 12.65% | (I) 12.26% | (I) 13.90% | (I) 11.66% |
> | | hackbench-thread-sockets-21 (seconds) | 3.16 | (I) 11.62% | (I) 12.74% | (I) 13.89% | (I) 11.06% |
> | | hackbench-thread-sockets-30 (seconds) | 4.32 | (I) 7.35% | (I) 8.89% | (I) 9.51% | (I) 6.58% |
> | | hackbench-thread-sockets-48 (seconds) | 6.45 | (I) 2.69% | (I) 3.06% | (I) 3.74% | 1.92% |
> | | hackbench-thread-sockets-79 (seconds) | 10.15 | (I) 3.30% | 1.98% | (I) 2.76% | -0.20% |
> | | hackbench-thread-sockets-110 (seconds) | 13.45 | -0.25% | (I) 3.68% | 0.44% | -0.41% |
> | | hackbench-thread-sockets-141 (seconds) | 17.87 | (R) -2.18% | (I) 8.46% | 1.51% | -1.01% |
> | | hackbench-thread-sockets-172 (seconds) | 24.38 | 1.02% | (I) 24.33% | 1.38% | 1.33% |
> | | hackbench-thread-sockets-203 (seconds) | 28.38 | -0.99% | (I) 24.20% | 0.57% | 0.72% |
> | | hackbench-thread-sockets-234 (seconds) | 32.75 | -0.42% | (I) 24.35% | 0.72% | 1.00% |
> | | hackbench-thread-sockets-256 (seconds) | 36.49 | -1.30% | (I) 26.22% | 0.81% | 1.22% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
>