Message-ID: <001201db6120$c8eec2e0$5acc48a0$@telus.net>
Date: Tue, 7 Jan 2025 08:25:37 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <linux-kernel@...r.kernel.org>,
	<vincent.guittot@...aro.org>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF

On 2025.01.07 07:04 Doug Smythies wrote:
> On 2025.01.07 03:26 Peter Zijlstra wrote:
>> On Mon, Jan 06, 2025 at 02:28:40PM -0800, Doug Smythies wrote:

>> If I get a blimp (>10000) then it is always on the last CPU, are you
>> seeing the same thing?
>
> More or less, yes. The very long migrations are dominated by the
> CPU 5 to CPU 11 migration.
>>
>>> In this short example all captures were for the CPU 5 to 11 migration.
>>> 2 at 6 seconds, 1 at 1.33 seconds and 1 at 2 seconds.
>>
>> This seems to suggest you are, always on CPU 11.
>>
>> Weird!
>
> Yes, weird. I think, but am not certain, the CPU sequence in turbostat
> per interval loop is:
>
> Wake on highest numbered CPU (11 in my case)
> Do a bunch of work that can be done without MSR reads.
> For each CPU in topological order (0,6,1,7,2,8,3,9,4,10,5,11 in my case)
>  Do the CPU specific work
> Finish the interval's work, printing and such, on CPU 11.
> Sleep for the interval time (we have been using 1 second)
>
> Without any proof, I was thinking the CPU 11 dominance
> for the long migration issue was due to the other bits of
> work done on that CPU.

To test this theory, I hacked turbostat to migrate to CPU 3
after the CPU-specific work loop.
So now the per interval workflow is:

Wake on CPU 3
Do a bunch of work that can be done without MSR reads.
For each CPU in topological order (0,6,1,7,2,8,3,9,4,10,5,11 in my case)
 Do the CPU specific work
Migrate to CPU 3
Finish the interval's work, printing and such, on CPU 3.
Sleep for the interval time
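The per-interval workflow above can be sketched as a rough stand-alone reproducer. This is not turbostat's code. It assumes that the task moves itself between CPUs with sched_setaffinity() (here Python's Linux-only os.sched_setaffinity) and treats the time that call takes as the migration time. The names TOPO_ORDER, HOME_CPU, BLIMP_USEC, migrate_to and run_interval are hypothetical. The 10000 usec cutoff is the "blimp" threshold from earlier in the thread.

```python
import os
import time

# Hypothetical parameters mirroring the message: topological CPU order on this
# 12-CPU machine, the "home" CPU the hacked turbostat returns to, and the
# 10000 usec "blimp" threshold.
TOPO_ORDER = [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11]
HOME_CPU = 3
BLIMP_USEC = 10000


def migrate_to(cpu):
    """Pin the calling process to one CPU; return how long the call took in usec."""
    start = time.monotonic_ns()
    # For the caller, the kernel normally completes the move before returning.
    os.sched_setaffinity(0, {cpu})
    return (time.monotonic_ns() - start) // 1000


def run_interval(cpus, home_cpu, interval_s, threshold_usec=BLIMP_USEC):
    """One turbostat-like interval: visit each CPU, return home, then sleep.

    Returns (usec, time_of_day, cpu) for every migration slower than the threshold.
    """
    blimps = []

    def timed_move(cpu):
        usec = migrate_to(cpu)
        if usec > threshold_usec:
            blimps.append((usec, time.time(), cpu))

    for cpu in cpus:
        timed_move(cpu)
        # CPU-specific work (MSR reads in turbostat) would go here.
    timed_move(home_cpu)
    # Finish the interval's work and printing on home_cpu, then sleep there,
    # so the next interval wakes on home_cpu.
    time.sleep(interval_s)
    return blimps


if __name__ == "__main__":
    online = os.sched_getaffinity(0)
    order = [c for c in TOPO_ORDER if c in online] or sorted(online)
    home = HOME_CPU if HOME_CPU in online else order[-1]
    print("usec    Time_Of_Day_Seconds     CPU")
    for _ in range(60):  # about one minute at a 1 second interval
        for usec, tod, cpu in run_interval(order, home, 1.0):
            print(f"{usec:7d} {tod:.6f}       {cpu}")
```

Timing the affinity call from user space also includes syscall and scheduling overhead, so small values are noise; only the multi-second outliers are interesting here.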

And now I get:

usec    Time_Of_Day_Seconds     CPU     Busy%   IRQ
  12646 1736266361.533240       3       99.76   1005
6004653 1736266384.555240       3       99.76   1006
6004653 1736266393.563240       3       99.76   1004
6005648 1736266400.570240       3       99.76   7019
6005653 1736266432.602240       3       99.76   1005
6003656 1736266479.652242       3       99.76   1004
  15636 1736266501.690240       3       99.76   1005
4948651 1736266528.661240       3       99.76   1004
 521672 1736266534.192240       2       99.76   1002
1117651 1736266585.360239       3       99.76   1004
6003652 1736266592.365240       3       99.76   2123
3526648 1736266612.909240       3       99.76   1004
6003650 1736266632.927240       3       99.76   1005
 396623 1736266636.327239       10      99.76   1002
6003654 1736266660.349240       3       99.76   1005
6003653 1736266682.369239       3       99.76   1006
6003653 1736266703.388240       3       99.76   1004
 514673 1736266718.918240       2       99.76   1003
  14652 1736266725.940240       3       99.76   1004
6003653 1736266745.958240       3       99.76   1004
6003653 1736266767.978240       3       99.76   1006
6003652 1736266794.002240       3       99.76   1006
6003653 1736266815.021240       3       99.76   1004
2496651 1736266841.542239       3       99.76   1007
6003647 1736266848.547240       3       99.76   3504  <<< 8 minutes 7 seconds elapsed
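A quick tally of the usec column shows how the captures cluster. The values are copied from the table above. The 1 s and 6 s cutoffs are arbitrary choices for illustration, while the 10000 usec cutoff is the thread's "blimp" threshold. The helper name summarize is hypothetical.

```python
# usec column of the 25 captures above, in order (about 8 minutes 7 seconds of run time).
USEC = [
    12646, 6004653, 6004653, 6005648, 6005653, 6003656, 15636, 4948651,
    521672, 1117651, 6003652, 3526648, 6003650, 396623, 6003654, 6003653,
    6003653, 514673, 14652, 6003653, 6003653, 6003652, 6003653, 2496651,
    6003647,
]


def summarize(samples_usec):
    """Count captures above 10 ms (the "blimp" cutoff), above 1 s and above 6 s."""
    return {
        "captures": len(samples_usec),
        "over_10ms": sum(u > 10_000 for u in samples_usec),
        "over_1s": sum(u > 1_000_000 for u in samples_usec),
        "over_6s": sum(u > 6_000_000 for u in samples_usec),
        "max_usec": max(samples_usec),
    }


if __name__ == "__main__":
    print(summarize(USEC))
```

Every capture exceeds the blimp threshold, 19 exceed one second, and 15 land just above 6 seconds (maximum 6005653 usec).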