Message-ID: <000801db604b$e0f6b580$a2e42080$@telus.net>
Date: Mon, 6 Jan 2025 07:01:34 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <linux-kernel@...r.kernel.org>,
<vincent.guittot@...aro.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On 2025.01.06 03:58 Peter Zijlstra wrote:
>On Sun, Dec 29, 2024 at 02:51:43PM -0800, Doug Smythies wrote:
>> Hi Peter,
>>
>> I have been having trouble with turbostat reporting processor package power levels that can not possibly be true.
>> After eliminating the turbostat program itself as the source of the issue I bisected the kernel.
>> An edited summary (actual log attached):
>>
>> 82e9d0456e06 sched/fair: Avoid re-setting virtual deadline on 'migrations'
>> b10 bad fc1892becd56 sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
>> b13 bad 54a58a787791 sched/fair: Implement DELAY_ZERO
>> skip 152e11f6df29 sched/fair: Implement delayed dequeue
>> skip e1459a50ba31 sched: Teach dequeue_task() about special task states
>> skip a1c446611e31 sched,freezer: Mark TASK_FROZEN special
>> skip 781773e3b680 sched/fair: Implement ENQUEUE_DELAYED
>> skip f12e148892ed sched/fair: Prepare pick_next_task() for delayed dequeue
>> skip 2e0199df252a sched/fair: Prepare exit/cleanup paths for delayed_dequeue
>> b12 good e28b5f8bda01 sched/fair: Assert {set_next,put_prev}_entity() are properly balanced
>> dfa0a574cbc4 sched/uclamg: Handle delayed dequeue
>> b11 good abc158c82ae5 sched: Prepare generic code for delayed dequeue
>> e8901061ca0c sched: Split DEQUEUE_SLEEP from deactivate_task()
>>
>> Where "bN" is just my assigned kernel name for each bisection step.
>>
>> In the linux-kernel email archives I found a thread that isolated these same commits.
>> It was from late November / early December:
>>
>> https://lore.kernel.org/all/20240727105030.226163742@infradead.org/T/#m9aeb4d897e029cf7546513bb09499c320457c174
>>
>> An example of the turbostat manifestation of the issue:
>>
>> doug@s19:~$ sudo ~/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --Summary --show
>> Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz --interval 1
>> [sudo] password for doug:
>> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
>> 99.76 4800 4104 12304 73 80.08
>> 99.76 4800 4104 12047 73 80.23
>> 99.76 4800 879 12157 73 11.40
>> 99.76 4800 26667 84214 72 557.23
>> 99.76 4800 4104 12036 72 79.39
>>
>> Where TSC_MHz was reported as 879, there was a big gap in time:
>> about 4.7 seconds instead of 1 (the TSC actually runs at 4104 MHz,
>> and 4104 / 879 = 4.67, so the interval was ~4.7x longer than assumed).
>> Where TSC_MHz was reported as 26667, there was not a big gap in time.
>>
>> It happens for roughly 5% of the samples, though that rate varies a lot.
>> It only happens when the workload is almost exactly 100%:
>> with more load it doesn't occur, and with less load it doesn't
>> occur either. Although, I did get this once:
>>
>> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
>> 91.46 4800 4104 11348 73 103.98
>> 91.46 4800 4104 11353 73 103.89
>> 91.50 4800 3903 11339 73 98.16
>> 91.43 4800 4271 12001 73 108.52
>> 91.45 4800 4148 11481 73 105.13
>> 91.46 4800 4104 11341 73 103.96
>> 91.46 4800 4104 11348 73 103.99
>>
>> So, it might just be much less probable and less severe.
>>
>> It happens over many different types of workload that I have tried.
>
> In private email you've communicated it happens due to
> sched_setaffinity() sometimes taking multiple seconds.
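[A hypothetical probe, not from the thread: taskset(1) calls sched_setaffinity(2) before exec'ing its command, so timing repeated taskset invocations from the shell would expose the kind of multi-second stall described above. The iteration count and 1000 ms threshold are arbitrary choices for illustration.]

```shell
#!/bin/sh
# Time repeated affinity changes; flag any that stall.
# taskset -c 0 performs sched_setaffinity() before exec'ing 'true'.
for i in $(seq 1 20); do
    start=$(date +%s%N)            # nanoseconds (GNU date)
    taskset -c 0 true              # sched_setaffinity() + trivial exec
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    if [ "$ms" -gt 1000 ]; then
        echo "iteration $i: affinity change took ${ms} ms"
    fi
done
echo "done: last iteration took ${ms} ms"
```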
>
> I'm trying to reproduce by starting a bash 'while :; do :; done' spinner
> for each CPU, but so far am not able to reproduce.
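[A sketch of that reproduction attempt: one busy-looping bash spinner per online CPU. The fixed runtime and the cleanup at the end are additions so the sketch terminates on its own; a real reproduction run would use a much longer window.]

```shell
#!/bin/sh
# Start one bash busy-loop per online CPU, let it run, then clean up.
ncpus=$(nproc)
pids=""
for cpu in $(seq 1 "$ncpus"); do
    bash -c 'while :; do :; done' &
    pids="$pids $!"
done
echo "started $ncpus spinners"
sleep 2              # observation window; lengthen for real tests
kill $pids           # stop the spinners
wait 2>/dev/null
echo "stopped"
```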
I have also been trying to reproduce the issue without using turbostat.
No success.
The other thing to note is that my test computer is otherwise very
idle, with no GUI and few services running.
>
> What is the easiest 100% load you're seeing this with?
Lately, and specifically so I can tell others how to reproduce it,
I have been using:
yes > /dev/null &
On my Intel i5-10600K, with 6 cores and 2 threads per core (12 CPUs),
I run 12 of those workloads.
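[The load generator described above can be scripted as one 'yes > /dev/null' spinner per CPU; nproc yields 12 on the i5-10600K. The sleep/kill cleanup is an addition so the sketch terminates; in practice turbostat would be left running in another terminal during the load.]

```shell
#!/bin/sh
# One 'yes' spinner per CPU, ~100% load on every CPU while running.
n=$(nproc)
pids=""
for i in $(seq 1 "$n"); do
    yes > /dev/null &
    pids="$pids $!"
done
echo "running $n copies of 'yes'"
sleep 2              # run turbostat in another terminal during this window
kill $pids
wait 2>/dev/null
```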
... Doug