Message-ID: <000d01db6115$69c1aef0$3d450cd0$@telus.net>
Date: Tue, 7 Jan 2025 07:04:12 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <linux-kernel@...r.kernel.org>,
<vincent.guittot@...aro.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On 2025.01.07 03:26 Peter Zijlstra wrote:
> On Mon, Jan 06, 2025 at 02:28:40PM -0800, Doug Smythies wrote:
>
>> Which will show when a CPU migration took over 10 milliseconds.
>> If you want to go further, for example to only display ones that took
>> over a second and to include the target CPU, then patch turbostat:
>>
>> doug@s19:~/kernel/linux/tools/power/x86/turbostat$ git diff
>> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
>> index 58a487c225a7..f8a73cc8fbfc 100644
>> --- a/tools/power/x86/turbostat/turbostat.c
>> +++ b/tools/power/x86/turbostat/turbostat.c
>> @@ -2704,7 +2704,7 @@ int format_counters(struct thread_data *t, struct core_data *c, struct pkg_data
>> struct timeval tv;
>>
>> timersub(&t->tv_end, &t->tv_begin, &tv);
>> - outp += sprintf(outp, "%5ld\t", tv.tv_sec * 1000000 + tv.tv_usec);
>> + outp += sprintf(outp, "%7ld\t", tv.tv_sec * 1000000 + tv.tv_usec);
>> }
>>
>> /* Time_Of_Day_Seconds: on each row, print sec.usec last timestamp taken */
>> @@ -4570,12 +4570,14 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
>> int i;
>> int status;
>>
>> + gettimeofday(&t->tv_begin, (struct timezone *)NULL); /* doug test */
>> +
>> if (cpu_migrate(cpu)) {
>> fprintf(outf, "%s: Could not migrate to CPU %d\n", __func__, cpu);
>> return -1;
>> }
>>
>> - gettimeofday(&t->tv_begin, (struct timezone *)NULL);
>> +// gettimeofday(&t->tv_begin, (struct timezone *)NULL);
>>
>> if (first_counter_read)
>> get_apic_id(t);
>>
>>
>
> So I've taken the second node offline, running with 10 cores (20
> threads) now.
>
> usec Time_Of_Day_Seconds CPU Busy% IRQ
> 106783 1736248404.951438 - 100.00 20119
> 46 1736248404.844701 0 100.00 1005
> 41 1736248404.844742 20 100.00 1007
> 42 1736248404.844784 1 100.00 1005
> 40 1736248404.844824 21 100.00 1006
> 41 1736248404.844865 2 100.00 1005
> 40 1736248404.844905 22 100.00 1006
> 41 1736248404.844946 3 100.00 1006
> 40 1736248404.844986 23 100.00 1005
> 41 1736248404.845027 4 100.00 1005
> 40 1736248404.845067 24 100.00 1006
> 41 1736248404.845108 5 100.00 1011
> 40 1736248404.845149 25 100.00 1005
> 41 1736248404.845190 6 100.00 1005
> 40 1736248404.845230 26 100.00 1005
> 42 1736248404.845272 7 100.00 1007
> 41 1736248404.845313 27 100.00 1005
> 41 1736248404.845355 8 100.00 1005
> 42 1736248404.845397 28 100.00 1006
> 46 1736248404.845443 9 100.00 1009
> 105995 1736248404.951438 29 100.00 1005
>
> Is by far the worst I've had in the past few minutes playing with this.
>
> If I get a blip (>10000) then it is always on the last CPU; are you
> seeing the same thing?
More or less, yes. The very long migrations are dominated by the
CPU 5 to CPU 11 migration.
Here is data from yesterday for other CPUs:
usec Time_Of_Day_Seconds CPU Busy% IRQ
605706 1736224605.542844 0 99.76 1922
10001 1736224605.561844 1 99.76 1922
10999 1736224605.572843 7 99.76 1923
11001 1736224605.583844 2 99.76 1925
11000 1736224605.606844 4 99.76 1924
10999 1736224605.617843 10 99.76 1923
105001 1736224605.722844 5 99.76 1922
465657 1736224608.190843 8 99.76 1002
494000 1736224608.684843 3 99.76 1003
395674 1736224610.081843 7 99.76 1964
19679 1736224617.108843 7 99.76 1003
37709 1736224636.633845 0 99.76 1003
65641 1736224689.796843 9 99.76 1003
406631 1736224693.206843 4 99.76 1002
105622 1736225026.238843 10 99.76 1003
409622 1736225053.673843 10 99.76 1003
16706 1736225302.149847 0 99.76 1820
10000 1736225302.185846 4 99.76 1825
19663 1736225317.249844 7 99.76 1012
>
>> In this short example all captures were for the CPU 5 to 11 migration.
>> 2 at 6 seconds, 1 at 1.33 seconds and 1 at 2 seconds.
>
> This seems to suggest you are, always on CPU 11.
>
> Weird!
Yes, weird. I think, but am not certain, that the CPU sequence turbostat
follows in each interval loop is:
  Wake on the highest numbered CPU (11 in my case).
  Do a bunch of work that can be done without MSR reads.
  For each CPU in topological order (0,6,1,7,2,8,3,9,4,10,5,11 in my case),
    do the CPU-specific work.
  Finish the interval's work, printing and such, on CPU 11.
  Sleep for the interval time (we have been using 1 second).
Without any proof, I was thinking the CPU 11 dominance in the long
migration issue was due to the other bits of work done on that CPU.
> Anyway, let me see if I can capture a trace of this..