Message-ID: <4c48fe59-8ff3-41fb-83cb-869409f6fbc6@amd.com>
Date: Tue, 3 Feb 2026 12:15:56 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>, <mingo@...nel.org>
CC: <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
<wangtao554@...wei.com>, <quzicheng@...wei.com>, <wuyun.abel@...edance.com>,
<dsmythies@...us.net>
Subject: Re: [PATCH 0/4] sched: Various reweight_entity() fixes
Hello Peter,
On 1/30/2026 3:04 PM, Peter Zijlstra wrote:
> Two issues related to reweight_entity() were raised; poking at all that got me
> these patches.
>
> They're in queue.git/sched/core and I spent most of yesterday staring at traces
> trying to find anything wrong. So far, so good.
>
> Please test.
I put this on top of tip:sched/urgent + tip:sched/core, which contains Ingo's
cleanup removing the union, and at some point during the benchmark run I hit:
BUG: kernel NULL pointer dereference, address: 0000000000000051
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD c153802067 P4D c1750e7067 PUD c16067e067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 200 UID: 1000 PID: 92850 Comm: schbench Not tainted 6.19.0-rc6-peterz-eevdf-fix+ #4 PREEMPT(full)
Hardware name: ... (Zen4c server)
RIP: 0010:pick_task_fair+0x3c/0x130
Code: ...
RSP: 0000:ff5cc03f25ecfd58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ff3087d6eb032380 RCX: 00000000056ae402
RDX: fffe78b16a6ed620 RSI: fffe790e92f4c046 RDI: 00027caa24e6c3ee
RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000002
R10: 0000086bfb248f00 R11: 0000000000000438 R12: ff3087d6eb032480
R13: ff5cc03f25ecfea0 R14: ff3087d6eb032380 R15: ff3087d6eb032380
FS: 00007f176438a640(0000) GS:ff3087d73d0e2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000051 CR3: 000000c0d275c048 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<TASK>
pick_next_task_fair+0x46/0x7b0
? task_tick_fair+0xf1/0x8b0
? perf_event_task_tick+0x5e/0xc0
__pick_next_task+0x41/0x1d0
__schedule+0x26e/0x17a0
? srso_alias_return_thunk+0x5/0xfbef5
? timerqueue_add+0x9f/0xc0
? __hrtimer_run_queues+0x139/0x240
? ktime_get+0x3f/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? clockevents_program_event+0xaa/0x100
schedule+0x27/0xd0
irqentry_exit+0x2a8/0x610
? srso_alias_return_thunk+0x5/0xfbef5
? __irq_exit_rcu+0x3f/0xf0
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0033:0x7f17f2498e58
Code: ...
RSP: 002b:00007f1764389d48 EFLAGS: 00000202
RAX: 0000000000000010 RBX: 00000000000000c8 RCX: 00007f17f24e57f8
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000014cd2820
RBP: 0000000014cd2820 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f1764389e00
R13: 00007f176439fed0 R14: 0000000000002a40 R15: 00007f17b414c778
</TASK>
Modules linked in: ...
CR2: 0000000000000051
---[ end trace 0000000000000000 ]---
The RIP points to the "se->sched_delayed" dereference in pick_task_fair():
$ scripts/faddr2line vmlinux pick_task_fair+0x3c/0x130
pick_task_fair+0x3c/0x130:
pick_next_entity at kernel/sched/fair.c:5648
(inlined by) pick_task_fair at kernel/sched/fair.c:9061
$ sed -n '5645,5651p' kernel/sched/fair.c
        struct sched_entity *se;

        se = pick_eevdf(cfs_rq);
        if (se->sched_delayed) {
                dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
                /*
                 * Must not reference @se again, see __block_task().
So something went sideways with the avg_vruntime calculation, I presume.
I'm re-running with the PARANOID_AVG feat now.
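
For context, what I mean by "went sideways": cfs_rq->avg_vruntime and
cfs_rq->avg_load are maintained incrementally as the load-weighted sum of
entity keys (se->vruntime - cfs_rq->min_vruntime) and the sum of weights,
so a paranoid check can recompute both from a tree walk, along the lines
of avg_vruntime_validate() in fair.c. A rough sketch of such a cross-check
(my reconstruction; I don't know yet exactly what PARANOID_AVG gates):

        static void validate_avg_vruntime(struct cfs_rq *cfs_rq)
        {
                unsigned long load = 0;
                struct rb_node *node;
                s64 key = 0;

                /* Recompute the aggregates from the queued entities. */
                for (node = rb_first_cached(&cfs_rq->tasks_timeline); node;
                     node = rb_next(node)) {
                        struct sched_entity *se = __node_2_se(node);
                        unsigned long weight = scale_load_down(se->load.weight);

                        key += entity_key(cfs_rq, se) * weight;
                        load += weight;
                }

                /* Compare against the incrementally maintained copies. */
                WARN_ON_ONCE(cfs_rq->avg_vruntime != key);
                WARN_ON_ONCE(cfs_rq->avg_load != load);
        }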
Just re-running the particular schbench variant hasn't crashed the kernel
in the half hour it has been running, so I've re-triggered the same set of
benchmarks to see whether flipping PARANOID_AVG makes any difference.
If you have a debug patch somewhere that you would like data from on this
run, please do let me know; otherwise I plan on capturing the rq state at
the time of the crash (a cfs_rq walk, dumping the vruntimes of all the
queued entities), roughly as sketched below.
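
Roughly along these lines -- a throwaway helper (name and exact field set
are mine, untested) to walk the cfs_rq rbtree and dump every queued
entity's state alongside the cached aggregates:

        static void dump_cfs_rq_state(struct cfs_rq *cfs_rq)
        {
                struct rb_node *node;

                pr_err("cfs_rq %p: min_vruntime=%llu avg_vruntime=%lld avg_load=%llu\n",
                       cfs_rq, cfs_rq->min_vruntime, cfs_rq->avg_vruntime,
                       cfs_rq->avg_load);

                /* Walk the timeline in key order. */
                for (node = rb_first_cached(&cfs_rq->tasks_timeline); node;
                     node = rb_next(node)) {
                        struct sched_entity *se = __node_2_se(node);

                        pr_err("  se %p: vruntime=%llu deadline=%llu slice=%llu weight=%lu delayed=%d\n",
                               se, se->vruntime, se->deadline, se->slice,
                               scale_load_down(se->load.weight),
                               se->sched_delayed);
                }
        }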
--
Thanks and Regards,
Prateek