Message-ID: <CAKfTPtCMcgMO1mK4sNwHtqbKWTQRB_92yPE2vd+11k7aHAukew@mail.gmail.com>
Date: Thu, 22 Jan 2026 18:34:28 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Mel Gorman <mgorman@...hsingularity.net>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, 
	Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Chris Mason <clm@...a.com>, linux-kernel@...r.kernel.org, 
	Christian.Loehle@....com
Subject: Re: [PATCH] sched/fair: Disable scheduler feature NEXT_BUDDY

Hi Ryan,

Thanks for adding me to the loop.


On Thu, 22 Jan 2026 at 14:38, Ryan Roberts <ryan.roberts@....com> wrote:
>
> Hi Mel,
>
>
> On 20/01/2026 11:33, Mel Gorman wrote:
> > NEXT_BUDDY was disabled with the introduction of EEVDF and enabled again
> > after NEXT_BUDDY was rewritten for EEVDF by commit e837456fdca8 ("sched/fair:
> > Reimplement NEXT_BUDDY to align with EEVDF goals"). It was not expected
> > that this would be a universal win without a crystal ball instruction
> > but the reported regressions are a concern [1][2] even if gains were
> > also reported. Specifically;
> >
> > o mysql with client/server running on different servers regresses
> > o specjbb reports lower peak metrics
> > o daytrader regresses
> >
> > The mysql is realistic and a concern. It needs to be confirmed if
> > specjbb is simply shifting the point where peak performance is measured
> > but still a concern. daytrader is considered to be representative of a
> > real workload.
> >
> > Access to test machines is currently problematic for verifying any fix to
> > this problem. Disable NEXT_BUDDY for now by default until the root causes
> > are addressed.

The new NEXT_BUDDY implementation does more than set a buddy: it also
breaks the run to parity mechanism by always setting the next buddy
during wakeup_preempt_fair(), even when there is no relation between
the 2 tasks, and PICK_BUDDY then bypasses the protections.

In addition to disabling NEXT_BUDDY, I suggest also reverting the
forced preemption section below, which also breaks run_to_parity by
making an assumption, whereas WF_SYNC normally exists for that purpose.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
        if ((wake_flags & WF_FORK) || pse->sched_delayed)
                return;

-       /*
-        * If @p potentially is completing work required by current then
-        * consider preemption.
-        *
-        * Reschedule if waker is no longer eligible. */
-       if (in_task() && !entity_eligible(cfs_rq, se)) {
-               preempt_action = PREEMPT_WAKEUP_RESCHED;
-               goto preempt;
-       }
-
        /* Prefer picking wakee soon if appropriate. */
        if (sched_feat(NEXT_BUDDY) &&
            set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {

This greatly increases the number of reschedules and preemptions
because a task quickly becomes "ineligible": its vruntime is updated
periodically, before the task has exhausted its slice.
Example:
2 tasks A and B wake up simultaneously with lag = 0. Both are
eligible. Task A runs 1st and wakes up task C. The scheduler updates
task A's vruntime, which becomes greater than the average vruntime
because all the others have lag == 0 and have not run yet. Task A is
now ineligible because it has received more runtime than the other
task, but it has exhausted neither its slice nor a min quantum. We
force preemption and disable protection, but task B will run 1st, not
task C.
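
To make the example above concrete, here is a rough user-space sketch
(not kernel code). It assumes equal weights, so vruntime advances 1:1
with runtime, and it assumes the woken task C is placed with lag 0,
i.e. at the current average vruntime. All toy_* names are made up for
illustration and do not exist in kernel/sched/fair.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy entity; equal weights are assumed for every task. */
struct toy_entity {
	char name;
	int64_t vruntime;	/* ns */
	int64_t deadline;	/* virtual deadline, ns */
};

/* Eligible means lag >= 0, i.e. vruntime <= average vruntime. */
static bool toy_eligible(const struct toy_entity *q, size_t n, size_t idx)
{
	int64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += q[i].vruntime;

	/* Compare vruntime * n with the sum to avoid a rounding division. */
	return q[idx].vruntime * (int64_t)n <= sum;
}

/* Pick the eligible entity with the earliest virtual deadline. */
static char toy_pick(const struct toy_entity *q, size_t n)
{
	const struct toy_entity *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!toy_eligible(q, n, i))
			continue;
		if (!best || q[i].deadline < best->deadline)
			best = &q[i];
	}
	return best ? best->name : '?';
}

/*
 * A and B wake with lag 0. A runs for ran_ns (less than its slice) and
 * wakes C. Returns the name of the task the toy picker selects next;
 * *a_eligible reports whether A was still eligible at the wakeup.
 */
static char toy_wakeup_scenario(int64_t ran_ns, int64_t slice_ns,
				bool *a_eligible)
{
	struct toy_entity q[3] = {
		{ 'A', 0, slice_ns },
		{ 'B', 0, slice_ns },
	};
	int64_t avg;

	assert(ran_ns >= 0 && ran_ns < slice_ns);

	q[0].vruntime += ran_ns;		/* A's vruntime is updated */
	*a_eligible = toy_eligible(q, 2, 0);	/* checked at C's wakeup */

	avg = (q[0].vruntime + q[1].vruntime) / 2;
	q[2] = (struct toy_entity){ 'C', avg, avg + slice_ns };

	return toy_pick(q, 3);
}
```

With ran_ns = 1000000 and slice_ns = 3000000, A is already ineligible
after 1ms although its slice is 3ms, and the toy picker selects B
because B has the earliest deadline among the eligible tasks.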

Side note: DELAY_ZERO increases this effect by clearing positive lag.

> >
> > Link: https://lore.kernel.org/lkml/4b96909a-f1ac-49eb-b814-97b8adda6229@arm.com [1]
> > Link: https://lore.kernel.org/lkml/ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com [2]
> > Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> > ---
> >  kernel/sched/features.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> > index 980d92bab8ab..136a6584be79 100644
> > --- a/kernel/sched/features.h
> > +++ b/kernel/sched/features.h
> > @@ -29,7 +29,7 @@ SCHED_FEAT(PREEMPT_SHORT, true)
> >   * wakeup-preemption), since its likely going to consume data we
> >   * touched, increases cache locality.
> >   */
> > -SCHED_FEAT(NEXT_BUDDY, true)
> > +SCHED_FEAT(NEXT_BUDDY, false)
> >
> >  /*
> >   * Allow completely ignoring cfs_rq->next; which can be set from various
>
>
> We have rerun the same set of benchmarks for v6.19-rc6 + this patch. I've added
> the results as an extra column. Numbers all relative to v6.18. Other columns as
> per [1].
>
> [1] https://lore.kernel.org/all/63d22eb9-b309-4d11-aa56-3f1e7e12edb1@arm.com/
>
> 6-18-0 (base)           (baseline)
> 6-19-0-rc1              (New NEXT_BUDDY implementation enabled)
> revert #1 & #2          (NEXT_BUDDY disabled)
> revert #2               (Old NEXT_BUDDY implementation enabled)
> 6-19-0-rc6+patch        (New NEXT_BUDDY implementation disabled)
>
> It's definitely better than v6.19-rc1. But it's not as good as "revert #1 & #2".
>
> So I guess this implies that disabling the new version of NEXT_BUDDY is not
> exactly the same as reverting your original patches #1 and #2 - i.e. old version
> of NEXT_BUDDY disabled isn't exactly the same as new version of NEXT_BUDDY
> disabled?
>
> Thanks,
> Ryan
>
>
> Multi-node SUT (workload running across 2 machines):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc1 |  revert #2 | revert #1 & #2 | 6-19-0-rc6+patch |
> +=================================+====================================================+===============+=============+============+================+==================+
> | repro-collection/mysql-workload | db transaction rate (transactions/min)             |     646267.33 |  (R) -1.33% |  (I) 5.87% |      (I) 7.63% |        (I) 4.01% |
> |                                 | new order rate (orders/min)                        |     213256.50 |  (R) -1.32% |  (I) 5.87% |      (I) 7.64% |        (I) 3.94% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
>
> Single-node SUT (workload running on single machine):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc1 |  revert #2 | revert #1 & #2 | 6-19-0-rc6+patch |
> +=================================+====================================================+===============+=============+============+================+==================+
> | specjbb/composite               | critical-jOPS (jOPS)                               |      94700.00 |  (R) -5.10% |     -0.90% |         -0.37% |        (I) 3.07% |
> |                                 | max-jOPS (jOPS)                                    |     113984.50 |  (R) -3.90% |     -0.65% |          0.65% |        (I) 1.94% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | repro-collection/mysql-workload | db transaction rate (transactions/min)             |     245438.25 |  (R) -3.88% |     -0.13% |          0.24% |           -1.34% |
> |                                 | new order rate (orders/min)                        |      80985.75 |  (R) -3.78% |     -0.07% |          0.29% |           -1.29% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | pts/pgbench                     | Scale: 1 Clients: 1 Read Only (TPS)                |      63124.00 |   (I) 2.90% |      0.74% |          0.85% |            2.58% |
> |                                 | Scale: 1 Clients: 1 Read Only - Latency (ms)       |         0.016 |   (I) 5.49% |      1.05% |          1.05% |            4.35% |
> |                                 | Scale: 1 Clients: 1 Read Write (TPS)               |        974.92 |       0.11% |     -0.08% |         -0.03% |            0.11% |
> |                                 | Scale: 1 Clients: 1 Read Write - Latency (ms)      |          1.03 |       0.12% |     -0.06% |         -0.06% |            0.14% |
> |                                 | Scale: 1 Clients: 250 Read Only (TPS)              |    1915931.58 |  (R) -2.25% |  (I) 2.12% |          1.62% |       (R) -3.92% |
> |                                 | Scale: 1 Clients: 250 Read Only - Latency (ms)     |          0.13 |  (R) -2.37% |  (I) 2.09% |          1.69% |       (R) -3.93% |
> |                                 | Scale: 1 Clients: 250 Read Write (TPS)             |        855.67 |      -1.36% |     -0.14% |         -0.12% |           -0.49% |
> |                                 | Scale: 1 Clients: 250 Read Write - Latency (ms)    |        292.39 |      -1.31% |     -0.08% |         -0.08% |           -0.49% |
> |                                 | Scale: 1 Clients: 1000 Read Only (TPS)             |    1534130.08 | (R) -11.37% |      0.08% |          0.48% |      (R) -11.85% |
> |                                 | Scale: 1 Clients: 1000 Read Only - Latency (ms)    |          0.65 | (R) -11.38% |      0.08% |          0.44% |      (R) -11.87% |
> |                                 | Scale: 1 Clients: 1000 Read Write (TPS)            |        578.75 |      -1.11% |      2.15% |         -0.96% |            1.60% |
> |                                 | Scale: 1 Clients: 1000 Read Write - Latency (ms)   |       1736.98 |      -1.26% |      2.47% |         -0.90% |            1.52% |
> |                                 | Scale: 100 Clients: 1 Read Only (TPS)              |      57170.33 |       1.68% |      0.10% |          0.22% |            2.16% |
> |                                 | Scale: 100 Clients: 1 Read Only - Latency (ms)     |         0.018 |       1.94% |      0.00% |          0.96% |            1.94% |
> |                                 | Scale: 100 Clients: 1 Read Write (TPS)             |        836.58 |      -0.37% |     -0.41% |          0.07% |            0.07% |
> |                                 | Scale: 100 Clients: 1 Read Write - Latency (ms)    |          1.20 |      -0.37% |     -0.40% |          0.06% |            0.06% |
> |                                 | Scale: 100 Clients: 250 Read Only (TPS)            |    1773440.67 |      -1.61% |      1.67% |          1.34% |       (R) -2.94% |
> |                                 | Scale: 100 Clients: 250 Read Only - Latency (ms)   |          0.14 |      -1.40% |      1.56% |          1.20% |       (R) -2.87% |
> |                                 | Scale: 100 Clients: 250 Read Write (TPS)           |       5505.50 |      -0.17% |     -0.86% |         -1.66% |            0.17% |
> |                                 | Scale: 100 Clients: 250 Read Write - Latency (ms)  |         45.42 |      -0.17% |     -0.85% |         -1.67% |            0.17% |
> |                                 | Scale: 100 Clients: 1000 Read Only (TPS)           |    1393037.50 | (R) -10.31% |     -0.19% |          0.53% |      (R) -10.36% |
> |                                 | Scale: 100 Clients: 1000 Read Only - Latency (ms)  |          0.72 | (R) -10.30% |     -0.17% |          0.53% |      (R) -10.35% |
> |                                 | Scale: 100 Clients: 1000 Read Write (TPS)          |       5085.92 |       0.27% |      0.07% |         -0.79% |           -2.32% |
> |                                 | Scale: 100 Clients: 1000 Read Write - Latency (ms) |        196.79 |       0.23% |      0.05% |         -0.81% |           -2.27% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
> | mmtests/hackbench               | hackbench-process-pipes-1 (seconds)                |          0.14 |      -1.51% |     -1.05% |         -1.51% |            0.35% |
> |                                 | hackbench-process-pipes-4 (seconds)                |          0.44 |   (I) 6.49% |  (I) 5.42% |      (I) 6.06% |        (I) 5.72% |
> |                                 | hackbench-process-pipes-7 (seconds)                |          0.68 | (R) -18.36% |  (I) 3.40% |         -0.41% |      (R) -24.54% |
> |                                 | hackbench-process-pipes-12 (seconds)               |          1.24 | (R) -19.89% |     -0.45% |     (R) -2.23% |      (R) -24.55% |
> |                                 | hackbench-process-pipes-21 (seconds)               |          1.81 |  (R) -8.41% |     -1.22% |     (R) -2.46% |      (R) -13.58% |
> |                                 | hackbench-process-pipes-30 (seconds)               |          2.39 |  (R) -9.06% | (R) -2.95% |         -1.62% |      (R) -13.21% |
> |                                 | hackbench-process-pipes-48 (seconds)               |          3.18 | (R) -11.68% | (R) -4.10% |         -0.26% |      (R) -12.63% |
> |                                 | hackbench-process-pipes-79 (seconds)               |          3.84 |  (R) -9.74% | (R) -3.25% |     (R) -2.45% |      (R) -10.31% |
> |                                 | hackbench-process-pipes-110 (seconds)              |          4.68 |  (R) -6.57% | (R) -2.12% |     (R) -2.25% |       (R) -7.15% |
> |                                 | hackbench-process-pipes-141 (seconds)              |          5.75 |  (R) -5.86% | (R) -3.44% |     (R) -2.89% |       (R) -5.60% |
> |                                 | hackbench-process-pipes-172 (seconds)              |          6.80 |  (R) -4.28% | (R) -2.81% |     (R) -2.44% |       (R) -4.79% |
> |                                 | hackbench-process-pipes-203 (seconds)              |          7.94 |  (R) -4.01% | (R) -3.00% |     (R) -2.17% |       (R) -3.74% |
> |                                 | hackbench-process-pipes-234 (seconds)              |          9.02 |  (R) -3.52% | (R) -2.81% |     (R) -2.20% |       (R) -3.63% |
> |                                 | hackbench-process-pipes-256 (seconds)              |          9.78 |  (R) -3.24% | (R) -2.81% |     (R) -2.74% |       (R) -3.19% |
> |                                 | hackbench-process-sockets-1 (seconds)              |          0.29 |       0.50% |      0.26% |          0.03% |           -0.43% |
> |                                 | hackbench-process-sockets-4 (seconds)              |          0.76 |  (I) 17.44% | (I) 16.31% |     (I) 19.09% |       (I) 18.69% |
> |                                 | hackbench-process-sockets-7 (seconds)              |          1.16 |  (I) 12.10% |  (I) 9.78% |     (I) 11.83% |       (I) 11.37% |
> |                                 | hackbench-process-sockets-12 (seconds)             |          1.86 |  (I) 10.19% |  (I) 9.83% |     (I) 11.21% |        (I) 9.31% |
> |                                 | hackbench-process-sockets-21 (seconds)             |          3.12 |   (I) 9.38% |  (I) 9.20% |     (I) 10.30% |        (I) 8.99% |
> |                                 | hackbench-process-sockets-30 (seconds)             |          4.30 |   (I) 6.43% |  (I) 6.11% |      (I) 7.22% |        (I) 6.75% |
> |                                 | hackbench-process-sockets-48 (seconds)             |          6.58 |   (I) 3.00% |  (I) 2.19% |      (I) 2.85% |        (I) 2.98% |
> |                                 | hackbench-process-sockets-79 (seconds)             |         10.56 |   (I) 2.87% |  (I) 3.31% |          3.10% |        (I) 3.10% |
> |                                 | hackbench-process-sockets-110 (seconds)            |         13.85 |      -1.15% |  (I) 2.33% |          0.22% |            0.44% |
> |                                 | hackbench-process-sockets-141 (seconds)            |         19.23 |      -1.40% | (I) 14.53% |          2.64% |            1.54% |
> |                                 | hackbench-process-sockets-172 (seconds)            |         26.33 |   (I) 3.52% | (I) 30.37% |      (I) 4.32% |        (I) 4.25% |
> |                                 | hackbench-process-sockets-203 (seconds)            |         30.27 |       1.10% | (I) 27.20% |          0.32% |            1.67% |
> |                                 | hackbench-process-sockets-234 (seconds)            |         35.12 |       1.60% | (I) 28.24% |          1.28% |        (I) 3.11% |
> |                                 | hackbench-process-sockets-256 (seconds)            |         38.74 |       0.70% | (I) 28.74% |          0.53% |            1.48% |
> |                                 | hackbench-thread-pipes-1 (seconds)                 |          0.17 |      -1.32% |     -0.76% |         -0.67% |           -0.76% |
> |                                 | hackbench-thread-pipes-4 (seconds)                 |          0.45 |   (I) 6.91% |  (I) 7.64% |      (I) 9.08% |        (I) 6.15% |
> |                                 | hackbench-thread-pipes-7 (seconds)                 |          0.74 |  (R) -7.51% |  (I) 5.26% |      (I) 2.82% |       (R) -9.98% |
> |                                 | hackbench-thread-pipes-12 (seconds)                |          1.32 |  (R) -8.40% |  (I) 2.32% |         -0.53% |      (R) -14.42% |
> |                                 | hackbench-thread-pipes-21 (seconds)                |          1.95 |  (R) -2.95% |      0.91% |     (R) -2.00% |       (R) -7.93% |
> |                                 | hackbench-thread-pipes-30 (seconds)                |          2.50 |  (R) -4.61% |      1.47% |         -1.63% |      (R) -11.99% |
> |                                 | hackbench-thread-pipes-48 (seconds)                |          3.32 |  (R) -5.45% |  (I) 2.15% |          0.81% |      (R) -11.45% |
> |                                 | hackbench-thread-pipes-79 (seconds)                |          4.04 |  (R) -5.53% |      1.85% |         -0.53% |       (R) -8.88% |
> |                                 | hackbench-thread-pipes-110 (seconds)               |          4.94 |  (R) -2.33% |      1.51% |          0.59% |       (R) -4.92% |
> |                                 | hackbench-thread-pipes-141 (seconds)               |          6.04 |  (R) -2.47% |      1.15% |          0.24% |       (R) -3.56% |
> |                                 | hackbench-thread-pipes-172 (seconds)               |          7.15 |      -0.91% |      1.48% |          0.45% |           -1.93% |
> |                                 | hackbench-thread-pipes-203 (seconds)               |          8.31 |      -1.29% |      0.77% |          0.40% |           -1.41% |
> |                                 | hackbench-thread-pipes-234 (seconds)               |          9.49 |      -1.03% |      0.77% |          0.65% |           -1.21% |
> |                                 | hackbench-thread-pipes-256 (seconds)               |         10.30 |      -0.80% |      0.42% |          0.30% |           -0.92% |
> |                                 | hackbench-thread-sockets-1 (seconds)               |          0.31 |       0.05% |     -0.05% |         -0.43% |           -0.05% |
> |                                 | hackbench-thread-sockets-4 (seconds)               |          0.79 |  (I) 18.91% | (I) 16.82% |     (I) 19.79% |       (I) 19.30% |
> |                                 | hackbench-thread-sockets-7 (seconds)               |          1.16 |  (I) 12.57% | (I) 10.63% |     (I) 12.95% |       (I) 11.90% |
> |                                 | hackbench-thread-sockets-12 (seconds)              |          1.87 |  (I) 12.65% | (I) 12.26% |     (I) 13.90% |       (I) 11.66% |
> |                                 | hackbench-thread-sockets-21 (seconds)              |          3.16 |  (I) 11.62% | (I) 12.74% |     (I) 13.89% |       (I) 11.06% |
> |                                 | hackbench-thread-sockets-30 (seconds)              |          4.32 |   (I) 7.35% |  (I) 8.89% |      (I) 9.51% |        (I) 6.58% |
> |                                 | hackbench-thread-sockets-48 (seconds)              |          6.45 |   (I) 2.69% |  (I) 3.06% |      (I) 3.74% |            1.92% |
> |                                 | hackbench-thread-sockets-79 (seconds)              |         10.15 |   (I) 3.30% |      1.98% |      (I) 2.76% |           -0.20% |
> |                                 | hackbench-thread-sockets-110 (seconds)             |         13.45 |      -0.25% |  (I) 3.68% |          0.44% |           -0.41% |
> |                                 | hackbench-thread-sockets-141 (seconds)             |         17.87 |  (R) -2.18% |  (I) 8.46% |          1.51% |           -1.01% |
> |                                 | hackbench-thread-sockets-172 (seconds)             |         24.38 |       1.02% | (I) 24.33% |          1.38% |            1.33% |
> |                                 | hackbench-thread-sockets-203 (seconds)             |         28.38 |      -0.99% | (I) 24.20% |          0.57% |            0.72% |
> |                                 | hackbench-thread-sockets-234 (seconds)             |         32.75 |      -0.42% | (I) 24.35% |          0.72% |            1.00% |
> |                                 | hackbench-thread-sockets-256 (seconds)             |         36.49 |      -1.30% | (I) 26.22% |          0.81% |            1.22% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+------------+----------------+------------------+
>
