

Message-ID: <004a01dc952b$471c94a0$d555bde0$@telus.net>
Date: Tue, 3 Feb 2026 08:36:41 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'K Prateek Nayak'" <kprateek.nayak@....com>,
	"'Peter Zijlstra'" <peterz@...radead.org>
Cc: <mingo@...nel.org>,
	<juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>,
	<rostedt@...dmis.org>,
	<bsegall@...gle.com>,
	<mgorman@...e.de>,
	<vschneid@...hat.com>,
	<linux-kernel@...r.kernel.org>,
	<wangtao554@...wei.com>,
	<quzicheng@...wei.com>,
	<wuyun.abel@...edance.com>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [PATCH 0/4] sched: Various reweight_entity() fixes

Hi All,

On 2026.02.03 04:19 K Prateek Nayak wrote:
> On 2/3/2026 4:41 PM, Peter Zijlstra wrote:
>> On Tue, Feb 03, 2026 at 12:15:56PM +0530, K Prateek Nayak wrote:
>>> On 1/30/2026 3:04 PM, Peter Zijlstra wrote:
>>>> Two issues related to reweight_entity() were raised; poking at all that got me
>>>> these patches.
>>>>
>>>> They're in queue.git/sched/core and I spend most of yesterday staring at traces
>>>> trying to find anything wrong. So far, so good.
>>>>
>>>> Please test.
>>>
>>> I put this on top of tip:sched/urgent + tip:sched/core which contains Ingo's
>>> cleanup of removing the union and at some point in the benchmark run I hit:
>>>
>>>     BUG: kernel NULL pointer dereference, address: 0000000000000051

... snip ...

> This trips when I'm running a (very) old version of schbench at commit
> e4aa540 ("Make sure rps isn't zero in auto_rps mode.")
>
> I'm running the following on a 512 CPU server:
>
> #!/bin/bash
>
> DIR=$1
> MESSENGERS=1
> MAX_ITERS=2
> SCHBENCH=./schbench
>
> for i in 1 2 4 8 16 32 64 128 256 512 768 1024;
> do
>    THISDIR=$DIR/$i-workers
>     if [ ! -d $THISDIR ]
>     then
>        mkdir -p $THISDIR
>     fi
>    for j in `seq 0 $MAX_ITERS`
>    do
>        echo "===== Worker $i : Iter $j ======";
>        $SCHBENCH -m $MESSENGERS -t $i  |& tee $THISDIR/iter-$j.log;
>    sleep 2
>    done
> done
>
> Fails when it is running with 768 workers. Standalone runs didn't
> fail - have to run a cumulative runner that runs sched-messaging,
> stream, tbench, netperf, first before running schbench :-(

Further to my email from the other day, where all was good [1],
I have continued to test, in particular the severe overload conditions
from [2].

Under heavy overload my test computer just hangs. My multiple
ssh sessions eventually terminate. I have left it for many hours, but
have had to reset it in the end.
The first time there were no log entries at all, at least none that
I could find.
The second time:
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000051
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 11 UID: 1000 PID: 3597 Comm: yes Not tainted 6.19.0-rc1-pz #1 PREEMPT(full)
...

The entire relevant part is attached.

Conditions:
Greater than 12,500 X (yes > /dev/null) tasks
But less than 15,000 X (yes > /dev/null) tasks

I have tested up to 20,000 X (yes > /dev/null) tasks
with previous kernels, including mainline 6.19-rc1.
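
For anyone wanting to generate a comparable load, here is a minimal
sketch, assuming a plain shell loop is good enough. It is illustrative
only and not necessarily the method used in [2]. The helper names
spawn_load and stop_load and the PIDS variable are hypothetical. With
this many tasks the per-user process limit (ulimit -u) may need raising.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original test setup):
# start N background copies of a command and remember their PIDs.
PIDS=""

spawn_load() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard all output, as in "yes > /dev/null"
        "$@" > /dev/null 2>&1 &
        PIDS="$PIDS $!"
        i=$((i + 1))
    done
}

stop_load() {
    if [ -n "$PIDS" ]; then
        # Word splitting of $PIDS is intentional here
        kill $PIDS 2>/dev/null || true
        wait $PIDS 2>/dev/null || true
    fi
    PIDS=""
}

# Example, left commented out because it will overload a small machine:
# spawn_load 12500 yes
# sleep 600
# stop_load
```

Starting with a small count and stepping it up between runs makes it
easier to see roughly where the machine stops responding.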

I would not disagree if you say my operating conditions
are ridiculous.

System:
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores, 12 CPUs.
CPU frequency scaling driver: intel_pstate; Governor: powersave.
HWP: Enabled

[1] https://lore.kernel.org/lkml/000d01dc939e$0fc99fe0$2f5cdfa0$@telus.net/
[2] https://lore.kernel.org/lkml/002401dbb6bd$4527ec00$cf77c400$@telus.net/

... Doug

Download attachment "kern.log" of type "application/octet-stream" (66512 bytes)