Message-ID: <ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com>
Date: Thu, 8 Jan 2026 15:31:52 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>, Chris Mason <clm@...a.com>,
linux-kernel@...r.kernel.org,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [REGRESSION] [PATCH 0/2 v5] Reintroduce NEXT_BUDDY for EEVDF
On 12/11/25 17:55, Mel Gorman wrote:
> Changes since v4
> o Splitout decisions into separate functions (peterz)
> o Flow clarity (peterz)
>
> Changes since v3
> o Place new code near first consumer (peterz)
> o Separate between PREEMPT_SHORT and NEXT_BUDDY (peterz)
> o Naming and code flow clarity (peterz)
> o Restore slice protection (peterz)
>
> Changes since v2
> o Review feedback applied from Prateek
>
> I've been chasing down a number of schedule issues recently like many
> others and found they were broadly grouped as
>
> 1. Failure to boost CPU frequency with powersave/ondemand governors
> 2. Processors entering idle states that are too deep
> 3. Differences in wakeup latencies for wakeup-intensive workloads
>
> Adding topology into account means that there is a lot of machine-specific
> behaviour which may explain why some discussions recently have reproduction
> problems. Nevertheless, the removal of LAST_BUDDY and NEXT_BUDDY being
> disabled has an impact on wakeup latencies.
>
> This series enables NEXT_BUDDY and may select a wakee if it's eligible to
> run even though other unrelated tasks may have an earlier deadline.
>
> Mel Gorman (2):
> sched/fair: Enable scheduler feature NEXT_BUDDY
> sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals
>
> kernel/sched/fair.c | 152 ++++++++++++++++++++++++++++++++++------
> kernel/sched/features.h | 2 +-
> 2 files changed, 131 insertions(+), 23 deletions(-)
>
Hi Mel, Peter,
During internal testing, I noticed a regression of approximately 7% in a real-world
workload called DayTrader.
Git bisect pointed to this patch:
"sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals"
Before this patch was merged, I reported a regression in v4 with schbench and stress-ng.
From that discussion:
https://lore.kernel.org/all/ddfde793-ad6e-4517-96b8-662dcb78acc8@linux.ibm.com/#t
```
So with frequent wakeups, queued tasks (even with earlier deadlines) may be
unfairly delayed. I understand that this would fade away quickly as the
woken up task that got to run due to buddy preference would accumulate negative
lag and would not be eligible to run again, but the starvation could be higher if
wakeups are very high.
To test this, I ran schbench (many message and worker threads) together with
stress-ng (CPU-bound), and observed stress-ng's bogo-ops throughput dropped by
around 64%.
This shows a significant regression for CPU-bound tasks under heavy wakeup loads.
```
I understand that stress-ng bogo-ops is not a reliable metric. However, the problem
appears to be real, as DayTrader also shows a regression with this patch.
To check whether the WF_SYNC-related change is the issue, I tried lowering the threshold with
`echo 50000 > /sys/kernel/debug/sched/migration_cost_ns` so that the waker could preempt
quickly in the WF_SYNC case. This helped, but I understand that it also affects many other
code paths that use migration_cost_ns. When I instead decreased only the threshold used in
this patch, the performance didn't improve.
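For anyone who wants to repeat the migration_cost_ns experiment, here is a minimal sketch
of changing the knob for a single run and restoring the previous value afterwards. The
helper name `with_knob` and the `KNOB` variable are only for illustration and are not part
of any kernel tooling. It assumes debugfs is mounted at /sys/kernel/debug and that the
script runs as root.

```shell
#!/bin/sh
# Knob used in the experiment above. KNOB can be overridden from the environment.
# Assumption: debugfs is mounted at /sys/kernel/debug and we have root.
KNOB=${KNOB:-/sys/kernel/debug/sched/migration_cost_ns}

# with_knob VALUE CMD [ARGS...]
# Write VALUE to the knob, run CMD, then put the previous value back.
# Returns the exit status of CMD.
with_knob() {
    new=$1
    shift
    old=$(cat "$KNOB") || return 1      # remember the current setting
    echo "$new" > "$KNOB" || return 1   # apply the temporary value
    "$@"                                # run the workload
    rc=$?
    echo "$old" > "$KNOB"               # restore the original setting
    return $rc
}
```

Usage would look like `with_knob 50000 ./run-workload.sh`, where run-workload.sh is a
placeholder for whatever starts the benchmark. Restoring the old value keeps later runs
from being affected by the changed setting.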
So I think the problem is that making tasks with earlier deadlines wait in the
presence of frequent wakeups hurts CPU-intensive workloads.
Any thoughts/ideas?
Meanwhile, I will also spend time trying to work around this patch to see whether the
performance can be improved.
Thanks,
Vineeth