Message-ID: <dffe53a4-0ef2-4346-ad73-c4b71a734b3a@linux.ibm.com>
Date: Sun, 16 Nov 2025 02:26:13 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: linux-kernel@...r.kernel.org,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>, linux-tip-commits@...r.kernel.org,
Chen Yu <yu.c.chen@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
K Prateek Nayak <kprateek.nayak@....com>,
Srikar Dronamraju <srikar@...ux.ibm.com>,
Mohini Narkhede <mohini.narkhede@...el.com>, x86@...nel.org
Subject: Re: [tip: sched/core] sched/fair: Skip sched_balance_running cmpxchg
when balance is not due
Hi Peter.
On 11/14/25 5:49 PM, tip-bot2 for Tim Chen wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: 2265c5d4deeff3bfe4580d9ffe718fd80a414cac
> Gitweb: https://git.kernel.org/tip/2265c5d4deeff3bfe4580d9ffe718fd80a414cac
> Author: Tim Chen <tim.c.chen@...ux.intel.com>
> AuthorDate: Mon, 10 Nov 2025 10:47:35 -08:00
> Committer: Peter Zijlstra <peterz@...radead.org>
> CommitterDate: Fri, 14 Nov 2025 13:03:05 +01:00
>
> sched/fair: Skip sched_balance_running cmpxchg when balance is not due
>
>
> + if (!need_unlock && (sd->flags & SD_SERIALIZE) && idle != CPU_NEWLY_IDLE) {
> + if (!atomic_try_cmpxchg_acquire(&sched_balance_running, 0, 1))
Should this be atomic_cmpxchg_acquire()?
I booted a system with the latest sched/core and it crashes at boot.
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0000000001db57c
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
Modules linked in:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.18.0-rc3+ #242 PREEMPT(lazy)
NIP [c0000000001db57c] sched_balance_rq+0x560/0x92c
LR [c0000000001db198] sched_balance_rq+0x17c/0x92c
Call Trace:
[c00000111ffdfd10] [c0000000001db198] sched_balance_rq+0x17c/0x92c (unreliable)
[c00000111ffdfe50] [c0000000001dc598] sched_balance_domains+0x2c4/0x3d0
[c00000111ffdff00] [c000000000168958] handle_softirqs+0x138/0x414
[c00000111ffdffe0] [c000000000017d80] do_softirq_own_stack+0x3c/0x50
[c000000008a57a60] [c000000000168048] __irq_exit_rcu+0x18c/0x1b4
[c000000008a57a90] [c0000000001691a8] irq_exit+0x20/0x38
[c000000008a57ab0] [c000000000028c18] timer_interrupt+0x174/0x394
[c000000008a57b10] [c000000000009f8c] decrementer_common_virt+0x28c/0x290
Bisect pointed to:
git bisect bad 2265c5d4deeff3bfe4580d9ffe718fd80a414cac
# first bad commit: [2265c5d4deeff3bfe4580d9ffe718fd80a414cac] sched/fair: Skip sched_balance_running cmpxchg when balance is not due
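I wondered what was really different, since Tim's v4 boots fine.
The tip version uses the try_ variant instead. I think that is what breaks it:
atomic_try_cmpxchg_acquire() takes a pointer to the expected value, so passing
0 likely means we are dereferencing a NULL pointer?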
With this diff it boots fine.
---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aaa47ece6a8e..01814b10b833 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11841,7 +11841,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
}
if (!need_unlock && (sd->flags & SD_SERIALIZE)) {
- if (!atomic_try_cmpxchg_acquire(&sched_balance_running, 0, 1))
+ if (!atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
goto out_balanced;
need_unlock = true;
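
For reference, the suspected difference between the two helpers can be modelled in user space. This is only a sketch built on C11 atomics, not the kernel implementation: the model_* names and the balance_running variable are hypothetical stand-ins for the kernel's atomic_cmpxchg_acquire(), atomic_try_cmpxchg_acquire() and sched_balance_running. It shows that the plain variant takes the expected value and returns the previous value, while the try_ variant takes a pointer to the expected value and returns a success flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for sched_balance_running. */
static atomic_int balance_running;

/*
 * Models the value-returning form: 'old' is passed by value and the
 * previous contents of *v are returned. A return of 'old' means the
 * swap happened.
 */
static int model_cmpxchg_acquire(atomic_int *v, int old, int new_val)
{
	/* On failure C11 writes the current value into 'old'. */
	atomic_compare_exchange_strong_explicit(v, &old, new_val,
						memory_order_acquire,
						memory_order_relaxed);
	return old;
}

/*
 * Models the try_ form: 'old' is a POINTER to the expected value and
 * the return value is true on success. On failure *old is updated with
 * the value that was found. Passing a literal 0 here would be a NULL
 * pointer, which the helper then dereferences.
 */
static bool model_try_cmpxchg_acquire(atomic_int *v, int *old, int new_val)
{
	return atomic_compare_exchange_strong_explicit(v, old, new_val,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* One way to use the try_ form safely: a local holds the expected value. */
static bool model_trylock_balance(void)
{
	int expected = 0;

	return model_try_cmpxchg_acquire(&balance_running, &expected, 1);
}
```

Note that the two forms report success differently: the try_ form returns true when the swap happened, whereas the value-returning form returns the old value, which is 0 when a 0 -> 1 swap happened. The sense of any test on the return value therefore differs between them.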