Message-ID: <20250106115732.GE20870@noisy.programming.kicks-ass.net>
Date: Mon, 6 Jan 2025 12:57:32 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Doug Smythies <dsmythies@...us.net>
Cc: linux-kernel@...r.kernel.org, vincent.guittot@...aro.org
Subject: Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On Sun, Dec 29, 2024 at 02:51:43PM -0800, Doug Smythies wrote:
> Hi Peter,
>
> I have been having trouble with turbostat reporting processor package power levels that cannot possibly be true.
> After eliminating the turbostat program itself as the source of the issue I bisected the kernel.
> An edited summary (actual log attached):
>
> 82e9d0456e06 sched/fair: Avoid re-setting virtual deadline on 'migrations'
> b10 bad fc1892becd56 sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
> b13 bad 54a58a787791 sched/fair: Implement DELAY_ZERO
> skip 152e11f6df29 sched/fair: Implement delayed dequeue
> skip e1459a50ba31 sched: Teach dequeue_task() about special task states
> skip a1c446611e31 sched,freezer: Mark TASK_FROZEN special
> skip 781773e3b680 sched/fair: Implement ENQUEUE_DELAYED
> skip f12e148892ed sched/fair: Prepare pick_next_task() for delayed dequeue
> skip 2e0199df252a sched/fair: Prepare exit/cleanup paths for delayed_dequeue
> b12 good e28b5f8bda01 sched/fair: Assert {set_next,put_prev}_entity() are properly balanced
> dfa0a574cbc4 sched/uclamg: Handle delayed dequeue
> b11 good abc158c82ae5 sched: Prepare generic code for delayed dequeue
> e8901061ca0c sched: Split DEQUEUE_SLEEP from deactivate_task()
>
> Where "bN" is just my assigned kernel name for each bisection step.
>
> In the linux-kernel email archives I found a thread that isolated these same commits.
> It was from late November / early December:
>
> https://lore.kernel.org/all/20240727105030.226163742@infradead.org/T/#m9aeb4d897e029cf7546513bb09499c320457c174
>
> An example of the turbostat manifestation of the issue:
>
> doug@s19:~$ sudo ~/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz --interval 1
> [sudo] password for doug:
> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
> 99.76 4800 4104 12304 73 80.08
> 99.76 4800 4104 12047 73 80.23
> 99.76 4800 879 12157 73 11.40
> 99.76 4800 26667 84214 72 557.23
> 99.76 4800 4104 12036 72 79.39
>
> Where TSC_MHz was reported as 879, there was a big gap in time.
> Like 4.7 seconds instead of 1.
> Where TSC_MHz was reported as 26667, there was not a big gap in time.
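[Editor's note: the two outliers are consistent with one delayed sample followed by a catch-up. If the wall-clock interval silently stretches to ~4.7 s while the TSC delta still reflects roughly 1 s of ticks, the reported rate comes out low; the next sample then absorbs the accumulated ticks in a ~1 s interval and comes out high. A quick arithmetic sketch, taking the 4104 MHz nominal TSC rate from the normal rows above:]

```python
# Nominal TSC rate, as seen in the normal samples above (MHz).
tsc_mhz = 4104

# Delayed sample: ~1 s worth of TSC ticks divided by a ~4.7 s interval.
print(round(tsc_mhz * 1.0 / 4.67))   # ~879, matching the low reading

# Catch-up sample: ~6.5 s worth of accumulated ticks in a ~1 s interval.
print(round(tsc_mhz * 6.5 / 1.0))    # ~26676, close to the 26667 reading
```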
>
> It happens for roughly 5% of the samples, plus or minus a lot.
> It only happens when the workload is almost exactly 100%.
> More load, it doesn't occur.
> Less load, it doesn't occur. Although, I did get this once:
>
> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
> 91.46 4800 4104 11348 73 103.98
> 91.46 4800 4104 11353 73 103.89
> 91.50 4800 3903 11339 73 98.16
> 91.43 4800 4271 12001 73 108.52
> 91.45 4800 4148 11481 73 105.13
> 91.46 4800 4104 11341 73 103.96
> 91.46 4800 4104 11348 73 103.99
>
> So, it might just be much less probable and less severe.
>
> It happens over many different types of workload that I have tried.

In private email you've communicated it happens due to
sched_setaffinity() sometimes taking multiple seconds.

I'm trying to reproduce by starting a bash 'while :; do :; done' spinner
for each CPU, but so far am not able to reproduce.

What is the easiest 100% load you're seeing this with?
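[Editor's note: a minimal per-CPU spinner along the lines described above, assuming bash, nproc, and seq are available; the sleep/kill is only there to bound the run, remove it to spin indefinitely:]

```shell
#!/usr/bin/env bash
# Start one busy-loop subshell per CPU to hold the machine at ~100% load.
ncpus=$(nproc)
for _ in $(seq "$ncpus"); do
    ( while :; do :; done ) &
done
echo "started $ncpus spinners"
# Bounded run for demonstration; drop these two lines to run indefinitely.
sleep 2
kill $(jobs -p)
```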