Message-ID: <20250106115732.GE20870@noisy.programming.kicks-ass.net>
Date: Mon, 6 Jan 2025 12:57:32 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Doug Smythies <dsmythies@...us.net>
Cc: linux-kernel@...r.kernel.org, vincent.guittot@...aro.org
Subject: Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On Sun, Dec 29, 2024 at 02:51:43PM -0800, Doug Smythies wrote:
> Hi Peter,
>
> I have been having trouble with turbostat reporting processor package power levels that cannot possibly be true.
> After eliminating the turbostat program itself as the source of the issue I bisected the kernel.
> An edited summary (actual log attached):
>
> 82e9d0456e06 sched/fair: Avoid re-setting virtual deadline on 'migrations'
> b10 bad fc1892becd56 sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
> b13 bad 54a58a787791 sched/fair: Implement DELAY_ZERO
> skip 152e11f6df29 sched/fair: Implement delayed dequeue
> skip e1459a50ba31 sched: Teach dequeue_task() about special task states
> skip a1c446611e31 sched,freezer: Mark TASK_FROZEN special
> skip 781773e3b680 sched/fair: Implement ENQUEUE_DELAYED
> skip f12e148892ed sched/fair: Prepare pick_next_task() for delayed dequeue
> skip 2e0199df252a sched/fair: Prepare exit/cleanup paths for delayed_dequeue
> b12 good e28b5f8bda01 sched/fair: Assert {set_next,put_prev}_entity() are properly balanced
> dfa0a574cbc4 sched/uclamg: Handle delayed dequeue
> b11 good abc158c82ae5 sched: Prepare generic code for delayed dequeue
> e8901061ca0c sched: Split DEQUEUE_SLEEP from deactivate_task()
>
> Where "bN" is just my assigned kernel name for each bisection step.
>
> In the linux-kernel email archives I found a thread that isolated these same commits.
> It was from late November / early December:
>
> https://lore.kernel.org/all/20240727105030.226163742@infradead.org/T/#m9aeb4d897e029cf7546513bb09499c320457c174
>
> An example of the turbostat manifestation of the issue:
>
> doug@s19:~$ sudo ~/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz --interval 1
> [sudo] password for doug:
> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
> 99.76 4800 4104 12304 73 80.08
> 99.76 4800 4104 12047 73 80.23
> 99.76 4800 879 12157 73 11.40
> 99.76 4800 26667 84214 72 557.23
> 99.76 4800 4104 12036 72 79.39
>
> Where TSC_MHz was reported as 879, there was a big gap in time.
> Like 4.7 seconds instead of 1.
> Where TSC_MHz was reported as 26667, there was not a big gap in time.
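[Editor's note: the two outliers are consistent with one delayed sample followed by a catch-up. If the wall-clock interval silently stretches to ~4.7 s while the TSC delta still reflects roughly 1 s of ticks, the reported rate comes out low; the next sample then absorbs the accumulated ticks in a ~1 s interval and comes out high. A quick arithmetic sketch, taking the 4104 MHz nominal TSC rate from the normal rows above:]

```python
# Nominal TSC rate, as seen in the normal samples above (MHz).
tsc_mhz = 4104

# Delayed sample: ~1 s worth of TSC ticks divided by a ~4.7 s interval.
print(round(tsc_mhz * 1.0 / 4.67))   # ~879, matching the low reading

# Catch-up sample: ~6.5 s worth of accumulated ticks in a ~1 s interval.
print(round(tsc_mhz * 6.5 / 1.0))    # ~26676, close to the 26667 reading
```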
>
> It happens for roughly 5% of the samples, plus or minus a lot.
> It only happens when the workload is almost exactly 100%.
> More load, it doesn't occur.
> Less load, it doesn't occur. Although, I did get this once:
>
> Busy% Bzy_MHz TSC_MHz IRQ PkgTmp PkgWatt
> 91.46 4800 4104 11348 73 103.98
> 91.46 4800 4104 11353 73 103.89
> 91.50 4800 3903 11339 73 98.16
> 91.43 4800 4271 12001 73 108.52
> 91.45 4800 4148 11481 73 105.13
> 91.46 4800 4104 11341 73 103.96
> 91.46 4800 4104 11348 73 103.99
>
> So, it might just be much less probable and less severe.
>
> It happens over many different types of workload that I have tried.

In private email you've communicated it happens due to
sched_setaffinity() sometimes taking multiple seconds.

I'm trying to reproduce by starting a bash 'while :; do :; done' spinner
for each CPU, but so far am not able to reproduce.

What is the easiest 100% load you're seeing this with?
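[Editor's note: a minimal per-CPU spinner along the lines described above, assuming bash, nproc, and seq are available; the sleep/kill is only there to bound the run, remove it to spin indefinitely:]

```shell
#!/usr/bin/env bash
# Start one busy-loop subshell per CPU to hold the machine at ~100% load.
ncpus=$(nproc)
for _ in $(seq "$ncpus"); do
    ( while :; do :; done ) &
done
echo "started $ncpus spinners"
# Bounded run for demonstration; drop these two lines to run indefinitely.
sleep 2
kill $(jobs -p)
```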