Message-ID: <000d01dc939e$0fc99fe0$2f5cdfa0$@telus.net>
Date: Sun, 1 Feb 2026 09:13:18 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>,
	<rostedt@...dmis.org>,
	<bsegall@...gle.com>,
	<mgorman@...e.de>,
	<vschneid@...hat.com>,
	<linux-kernel@...r.kernel.org>,
	<wangtao554@...wei.com>,
	<quzicheng@...wei.com>,
	<kprateek.nayak@....com>,
	<wuyun.abel@...edance.com>,
	"Doug Smythies" <dsmythies@...us.net>,
	<mingo@...nel.org>
Subject: RE: [PATCH 0/4] sched: Various reweight_entity() fixes

Hi Peter,

Thank you for including me on this set of emails. 

I assume I was copied on this patch set because I reported an issue a year ago,
and I see patch 4 of 4 reverts the fix from that time.

I also note that there is a pending update to patch 4 of 4. I will re-test then.

On 2026.01.30 01:35 Peter Zijlstra wrote:
> Two issues related to reweight_entity() were raised; poking at all that got me
> these patches.
>
> They're in queue.git/sched/core

Thanks. I tried to apply them to kernel 6.19-rc5, but patch 3 of 4 would not apply.
It took me a while to figure out what you meant, but I got there in the end.

> and I spend most of yesterday staring at traces
> trying to find anything wrong. So far, so good.
>
> Please test.

Happy to.

There were 2 issues raised a year ago. The first was extremely long CPU migration
times under specific conditions, thousands of times longer than reasonable.
The second was similar, but much smaller in magnitude. The second issue
was hidden by the first and only became apparent once the first was fixed.
For more background, readers are referred to the long email thread [1].

Testing of this patch set:

Summary, for those who don't want to read the details: all good.

The main diagnostic tool used here is turbostat, where the issues are shown via
anomalies in the time between samples. The test setup is an otherwise
very idle system with a 100.0% load applied. Command used:

sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz,Time_Of_Day_Seconds,usec --interval 1 --out /dev/shm/turbo.log

The data is post-processed and a histogram of the times between samples is created, with 1 millisecond per histogram bin.
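
For readers who want to reproduce the binning, here is a minimal Python sketch of the idea. It is not the actual post-processing script, which is not included in this email. It assumes the Time_Of_Day_Seconds values have already been extracted from the log into a list of floats. The function name and the optional clamp parameter are hypothetical, and the epsilon used at bin edges is an illustrative choice.

```python
import math
from collections import Counter


def interval_histogram(timestamps, bin_width=0.001, clamp=None):
    """Count the gaps between consecutive timestamps in bins of bin_width seconds.

    Returns a dict keyed by bin index (gap in whole milliseconds for the
    default bin_width), so a key of 1001 means 1.001 <= gap < 1.002 seconds.
    """
    counts = Counter()
    # A small epsilon guards against float error right at a bin edge.
    top = None if clamp is None else math.floor(clamp / bin_width + 1e-6)
    for prev, cur in zip(timestamps, timestamps[1:]):
        index = math.floor((cur - prev) / bin_width + 1e-6)
        if top is not None:
            # Assumption: anything beyond the clamp is counted in the last bin.
            index = min(index, top)
        counts[index] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up timestamps, for illustration only (not measured data).
    samples = [0.0, 1.0004, 2.0016, 3.0020, 4.0300]
    for index, count in interval_histogram(samples, clamp=1.024).items():
        print(f"{index * 0.001:.6f}, {count}")
```

Keying the bins by integer index rather than by a float avoids rounding surprises when bins are compared or summed later.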

Step 1: Confirm where we left off a year ago:

The exact same kernel from a year ago, that we ended up happy with, was used.

doug@s19:~/tmp/peterz/6.19/turbo$ cat 613.his
Kernel: 6.13.0-stock
gov: powersave
HWP: enabled

1.000000, 23195
1.001000, 10897
1.002000, 49
1.003000, 23
1.004000, 21
1.005000, 9

Total: 34194 : Total >= 10 mSec: 0 ( 0.00 percent)

So, over 9 hours, and no sample landed beyond the 5 millisecond bin.
Very good.

Step 2: Take a baseline sample before this patch set:
Mainline kernel 6.19-rc1 was used:

doug@s19:~/tmp/peterz/6.19/turbo$ cat rc1.his
Kernel: 6.19.0-rc1-stock
gov: powersave
HWP: enabled

1.000000, 19509
1.001000, 10430
1.002000, 32
1.003000, 19
1.004000, 24
1.005000, 13
1.006000, 9
1.007000, 4
1.008000, 3
1.009000, 4
1.010000, 6
1.011000, 2
1.012000, 1
1.013000, 4
1.014000, 10
1.015000, 10
1.016000, 7
1.017000, 10
1.018000, 20
1.019000, 12
1.020000, 5
1.021000, 3
1.022000, 1
1.023000, 2
1.024000, 2  <<< Clamped. Actually 26 and 25 milliseconds

Total: 30142 : Total >= 10 mSec: 95 ( 0.32 percent)

What!!!
That is over 8 hours of data, with 95 samples at or beyond the 10 millisecond threshold.
It seems something has regressed over the last year.
Note that our threshold of 10 milliseconds was rather arbitrary.
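
As a cross-check of the rc1 summary line, the bin counts can be re-added with a short Python sketch. This is not the tool that produced the histogram. The function name summarize and the parsing rules (one "bin, count" pair per line, everything else ignored) are assumptions made for illustration.

```python
RC1_HISTOGRAM = """\
1.000000, 19509
1.001000, 10430
1.002000, 32
1.003000, 19
1.004000, 24
1.005000, 13
1.006000, 9
1.007000, 4
1.008000, 3
1.009000, 4
1.010000, 6
1.011000, 2
1.012000, 1
1.013000, 4
1.014000, 10
1.015000, 10
1.016000, 7
1.017000, 10
1.018000, 20
1.019000, 12
1.020000, 5
1.021000, 3
1.022000, 1
1.023000, 2
1.024000, 2  <<< Clamped. Actually 26 and 25 milliseconds
"""


def summarize(histogram_text, nominal=1.0, threshold=0.010):
    """Return (total samples, samples late by >= threshold, percent late)."""
    total = 0
    late = 0
    for line in histogram_text.splitlines():
        parts = line.split(",")
        if len(parts) != 2:
            continue  # header, blank, or summary line
        try:
            bin_start = float(parts[0])
            count = int(parts[1].split()[0])  # ignore trailing annotations
        except (ValueError, IndexError):
            continue
        total += count
        # Round to avoid float noise when comparing against the threshold.
        if round(bin_start - nominal, 6) >= threshold:
            late += count
    percent = 100.0 * late / total if total else 0.0
    return total, late, percent


if __name__ == "__main__":
    total, late, percent = summarize(RC1_HISTOGRAM)
    # 30142 samples, 95 late, 0.32 percent
    print(f"Total: {total} : Total >= 10 mSec: {late} ({percent:.2f} percent)")
    # At a nominal 1 second per sample this is about 8.4 hours.
    print(f"Approximate duration: {total / 3600:.1f} hours")
```

The totals agree with the summary line above: 30142 samples, of which 95 (0.32 percent) fall in the bins at 1.010000 and beyond.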

Step 3: This patch set, taken from Peter's git tree:

doug@s19:~/tmp/peterz/6.19/turbo$ cat 02.his
kernel: 6.19.0-rc1-pz
gov: powersave
HWP: enabled

1.000000, 19139
1.001000, 9532
1.002000, 19
1.003000, 17
1.004000, 8
1.005000, 3
1.006000, 2
1.009000, 1

Total: 28721 : Total >= 10 mSec: 0 ( 0.00 percent)

Just about 8 hours.
Never a time >= our arbitrary threshold of 10 milliseconds.
So, good.
I will redo this test with the revised patch 4 of 4 when it is available.

... Doug

[1] https://lore.kernel.org/lkml/005f01db5a44$3bb698e0$b323caa0$@telus.net/


