Message-ID: <4c48fe59-8ff3-41fb-83cb-869409f6fbc6@amd.com>
Date: Tue, 3 Feb 2026 12:15:56 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>, <mingo@...nel.org>
CC: <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
<wangtao554@...wei.com>, <quzicheng@...wei.com>, <wuyun.abel@...edance.com>,
<dsmythies@...us.net>
Subject: Re: [PATCH 0/4] sched: Various reweight_entity() fixes
Hello Peter,
On 1/30/2026 3:04 PM, Peter Zijlstra wrote:
> Two issues related to reweight_entity() were raised; poking at all that got me
> these patches.
>
> They're in queue.git/sched/core and I spent most of yesterday staring at traces
> trying to find anything wrong. So far, so good.
>
> Please test.
I put this on top of tip:sched/urgent + tip:sched/core, which contains Ingo's
cleanup removing the union, and at some point during the benchmark run I hit:
BUG: kernel NULL pointer dereference, address: 0000000000000051
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD c153802067 P4D c1750e7067 PUD c16067e067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 200 UID: 1000 PID: 92850 Comm: schbench Not tainted 6.19.0-rc6-peterz-eevdf-fix+ #4 PREEMPT(full)
Hardware name: ... (Zen4c server)
RIP: 0010:pick_task_fair+0x3c/0x130
Code: ...
RSP: 0000:ff5cc03f25ecfd58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ff3087d6eb032380 RCX: 00000000056ae402
RDX: fffe78b16a6ed620 RSI: fffe790e92f4c046 RDI: 00027caa24e6c3ee
RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000002
R10: 0000086bfb248f00 R11: 0000000000000438 R12: ff3087d6eb032480
R13: ff5cc03f25ecfea0 R14: ff3087d6eb032380 R15: ff3087d6eb032380
FS: 00007f176438a640(0000) GS:ff3087d73d0e2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000051 CR3: 000000c0d275c048 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<TASK>
pick_next_task_fair+0x46/0x7b0
? task_tick_fair+0xf1/0x8b0
? perf_event_task_tick+0x5e/0xc0
__pick_next_task+0x41/0x1d0
__schedule+0x26e/0x17a0
? srso_alias_return_thunk+0x5/0xfbef5
? timerqueue_add+0x9f/0xc0
? __hrtimer_run_queues+0x139/0x240
? ktime_get+0x3f/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? clockevents_program_event+0xaa/0x100
schedule+0x27/0xd0
irqentry_exit+0x2a8/0x610
? srso_alias_return_thunk+0x5/0xfbef5
? __irq_exit_rcu+0x3f/0xf0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0033:0x7f17f2498e58
Code: ...
RSP: 002b:00007f1764389d48 EFLAGS: 00000202
RAX: 0000000000000010 RBX: 00000000000000c8 RCX: 00007f17f24e57f8
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000014cd2820
RBP: 0000000014cd2820 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f1764389e00
R13: 00007f176439fed0 R14: 0000000000002a40 R15: 00007f17b414c778
</TASK>
Modules linked in: ...
CR2: 0000000000000051
---[ end trace 0000000000000000 ]---
The RIP points to the "se->sched_delayed" dereference in pick_task_fair():
$ scripts/faddr2line vmlinux pick_task_fair+0x3c/0x130
pick_task_fair+0x3c/0x130:
pick_next_entity at kernel/sched/fair.c:5648
(inlined by) pick_task_fair at kernel/sched/fair.c:9061
$ sed -n '5645,5651p' kernel/sched/fair.c
        struct sched_entity *se;

        se = pick_eevdf(cfs_rq);
        if (se->sched_delayed) {
                dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
                /*
                 * Must not reference @se again, see __block_task().
So something went sideways with the avg_vruntime calculation, I presume.
I'm re-running with the PARANOID_AVG feat now.
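
For context, what I mean by "went sideways": cfs_rq->avg_vruntime and
cfs_rq->avg_load are maintained incrementally as the load-weighted sum of
entity keys (se->vruntime - cfs_rq->min_vruntime) and the sum of weights,
so a paranoid check can recompute both from a tree walk, along the lines
of avg_vruntime_validate() in fair.c. A rough sketch of such a cross-check
(my reconstruction; I don't know yet exactly what PARANOID_AVG gates):

        static void validate_avg_vruntime(struct cfs_rq *cfs_rq)
        {
                unsigned long load = 0;
                struct rb_node *node;
                s64 key = 0;

                /* Recompute the aggregates from the queued entities. */
                for (node = rb_first_cached(&cfs_rq->tasks_timeline); node;
                     node = rb_next(node)) {
                        struct sched_entity *se = __node_2_se(node);
                        unsigned long weight = scale_load_down(se->load.weight);

                        key += entity_key(cfs_rq, se) * weight;
                        load += weight;
                }

                /* Compare against the incrementally maintained copies. */
                WARN_ON_ONCE(cfs_rq->avg_vruntime != key);
                WARN_ON_ONCE(cfs_rq->avg_load != load);
        }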
Just re-running the particular schbench variant hasn't crashed the kernel
in the half hour it has been running, so I've re-triggered the same set of
benchmarks to see whether flipping PARANOID_AVG makes any difference.
If you have a debug patch somewhere that you would like data from on this
run, please do let me know; otherwise I plan on capturing the rq state at
the time of the crash (a cfs_rq walk, dumping the vruntimes of all the
queued entities), roughly as sketched below.
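
Roughly along these lines -- a throwaway helper (name and exact field set
are mine, untested) to walk the cfs_rq rbtree and dump every queued
entity's state alongside the cached aggregates:

        static void dump_cfs_rq_state(struct cfs_rq *cfs_rq)
        {
                struct rb_node *node;

                pr_err("cfs_rq %p: min_vruntime=%llu avg_vruntime=%lld avg_load=%llu\n",
                       cfs_rq, cfs_rq->min_vruntime, cfs_rq->avg_vruntime,
                       cfs_rq->avg_load);

                /* Walk the timeline in key order. */
                for (node = rb_first_cached(&cfs_rq->tasks_timeline); node;
                     node = rb_next(node)) {
                        struct sched_entity *se = __node_2_se(node);

                        pr_err("  se %p: vruntime=%llu deadline=%llu slice=%llu weight=%lu delayed=%d\n",
                               se, se->vruntime, se->deadline, se->slice,
                               scale_load_down(se->load.weight),
                               se->sched_delayed);
                }
        }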
--
Thanks and Regards,
Prateek