linux-kernel - Re: [PATCH 0/4] sched: Various reweight

Message-ID: <38ef3462-4c4e-4f40-8d63-84dd71cbd043@amd.com>
Date: Tue, 3 Feb 2026 17:49:16 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <mingo@...nel.org>, <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<wangtao554@...wei.com>, <quzicheng@...wei.com>, <wuyun.abel@...edance.com>,
	<dsmythies@...us.net>
Subject: Re: [PATCH 0/4] sched: Various reweight_entity() fixes

Hello Peter,

On 2/3/2026 4:41 PM, Peter Zijlstra wrote:
> On Tue, Feb 03, 2026 at 12:15:56PM +0530, K Prateek Nayak wrote:
>> Hello Peter,
>>
>> On 1/30/2026 3:04 PM, Peter Zijlstra wrote:
>>> Two issues related to reweight_entity() were raised; poking at all that got me
>>> these patches.
>>>
>>> They're in queue.git/sched/core and I spent most of yesterday staring at traces
>>> trying to find anything wrong. So far, so good.
>>>
>>> Please test.
>>
>> I put this on top of tip:sched/urgent + tip:sched/core which contains Ingo's
>> cleanup of removing the union and at some point in the benchmark run I hit:
>>
>>     BUG: kernel NULL pointer dereference, address: 0000000000000051
> 
> :-(
> 
>>
>> so something went sideways with the avg_vruntime calculation I presume.
>> I'm rerunning with the PARANOID_AVG feat now.
>>
>> Just re-running the particular schbench variant hasn't crashed the kernel
>> in the half hour it has been running so I've re-triggered the same set of
>> benchmarks to see if flipping PARANOID_AVG makes any difference.
> 
> If you run with PARANOID_AVG, the condition ends up visible as:
> 
>   grep shift /debug/sched/debug
> 
> If any of the fields are !0, you tripped an overflow.

Yup, I see a few !0 values, some inching closer to the BUG_ON():

 grep "shift.*: [^0]$" /sys/kernel/debug/sched/debug
  .sum_shift                     : 4
  .sum_shift                     : 3
  .sum_shift                     : 5
  .sum_shift                     : 1
  .sum_shift                     : 2
  .sum_shift                     : 3
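
One caveat with the grep above: "[^0]$" only matches a single trailing
character, so a value such as 10 would be missed. A minimal sketch of a
stricter check follows. It assumes every field of interest is printed as
".name : value", as in the output above; nonzero_shift is a hypothetical
helper name, not something from the patches.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch series): print every field whose
# name contains "shift" and whose numeric value is non-zero.
# Assumes lines of the form "  .sum_shift      : 4", as in the output above.
nonzero_shift() {
    # Default to the debugfs path used in this mail; a different file can be
    # passed as $1 (e.g. a saved copy of the debug output).
    local file=${1:-/sys/kernel/debug/sched/debug}

    awk -F: '/shift/ {
        name = $1; val = $NF
        gsub(/[ \t]/, "", name)     # strip padding around the field name
        gsub(/[ \t]/, "", val)      # strip padding around the value
        if (val + 0 != 0)           # numeric compare, so "10" is caught too
            print name, val
    }' "$file"
}
```

Calling nonzero_shift with no argument reads the debugfs file directly
(root is normally needed for that); an empty result means no field matched.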

> 
> Once it's !0, you can't get it back to 0 (except perhaps if it's cgroup
> things, in which case you can destroy and re-create the cgroups I
> suppose) other than reboot.
> 
> Anyway, if you can reproduce without PARANOID_AVG (or indeed have
> tripped overflow) could you share the specific schbench invocation you
> used?

This trips when I'm running a (very) old version of schbench at commit
e4aa540 ("Make sure rps isn't zero in auto_rps mode.")

I'm running the following on a 512 CPU server:

#!/bin/bash

DIR=$1
MESSENGERS=1
MAX_ITERS=2
SCHBENCH=./schbench

for i in 1 2 4 8 16 32 64 128 256 512 768 1024;
do
    THISDIR="$DIR/$i-workers"
    if [ ! -d "$THISDIR" ]
    then
        mkdir -p "$THISDIR"
    fi
    # seq 0 $MAX_ITERS is inclusive, so this runs MAX_ITERS + 1 iterations
    for j in $(seq 0 "$MAX_ITERS")
    do
        echo "===== Worker $i : Iter $j ======";
        "$SCHBENCH" -m "$MESSENGERS" -t "$i" |& tee "$THISDIR/iter-$j.log";
        sleep 2
    done
done


It fails when running with 768 workers. Standalone runs didn't fail;
I have to run a cumulative runner that goes through sched-messaging,
stream, tbench, and netperf before running schbench :-(

> 
> I'm not sure I have valuable tracing patches, I just stick random
> trace_printk()s in.

I'll plop those in and update once I get a log for sum_shift++.

-- 
Thanks and Regards,
Prateek