Message-ID: <20250107112606.GN20870@noisy.programming.kicks-ass.net>
Date: Tue, 7 Jan 2025 12:26:06 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Doug Smythies <dsmythies@...us.net>
Cc: linux-kernel@...r.kernel.org, vincent.guittot@...aro.org
Subject: Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On Mon, Jan 06, 2025 at 02:28:40PM -0800, Doug Smythies wrote:
> Which will show when a CPU migration took over 10 milliseconds.
> If you want to go further, for example to only display ones that took
> over a second and to include the target CPU, then patch turbostat:
>
> doug@s19:~/kernel/linux/tools/power/x86/turbostat$ git diff
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index 58a487c225a7..f8a73cc8fbfc 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -2704,7 +2704,7 @@ int format_counters(struct thread_data *t, struct core_data *c, struct pkg_data
> struct timeval tv;
>
> timersub(&t->tv_end, &t->tv_begin, &tv);
> - outp += sprintf(outp, "%5ld\t", tv.tv_sec * 1000000 + tv.tv_usec);
> + outp += sprintf(outp, "%7ld\t", tv.tv_sec * 1000000 + tv.tv_usec);
> }
>
> /* Time_Of_Day_Seconds: on each row, print sec.usec last timestamp taken */
> @@ -4570,12 +4570,14 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
> int i;
> int status;
>
> + gettimeofday(&t->tv_begin, (struct timezone *)NULL); /* doug test */
> +
> if (cpu_migrate(cpu)) {
> fprintf(outf, "%s: Could not migrate to CPU %d\n", __func__, cpu);
> return -1;
> }
>
> - gettimeofday(&t->tv_begin, (struct timezone *)NULL);
> +// gettimeofday(&t->tv_begin, (struct timezone *)NULL);
>
> if (first_counter_read)
> get_apic_id(t);
>
>
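For readers without a turbostat tree to patch, the measurement the quoted diff makes can be approximated standalone. This is only a sketch: it assumes turbostat's cpu_migrate() boils down to a sched_setaffinity() call on the current thread, and the helper names elapsed_usec() and time_migration_usec() are made up for illustration. As in the patch, the first timestamp is taken before the migration, so the reported time includes it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/time.h>

/* Elapsed microseconds between two timestamps, using the same arithmetic
 * the quoted patch prints (tv_sec * 1000000 + tv_usec). */
long elapsed_usec(const struct timeval *begin, const struct timeval *end)
{
	struct timeval tv;

	timersub(end, begin, &tv);
	return tv.tv_sec * 1000000L + tv.tv_usec;
}

/* Hypothetical helper: pin the calling thread to 'cpu' and return how long
 * the affinity change took in usec, or -1 if it failed (CPU offline, not
 * permitted, ...). */
long time_migration_usec(int cpu)
{
	struct timeval begin, end;
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* Timestamp before the migration, as in the quoted patch. */
	gettimeofday(&begin, NULL);
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;
	gettimeofday(&end, NULL);

	return elapsed_usec(&begin, &end);
}
```

Calling time_migration_usec() for each online CPU in a loop and printing only results above some threshold (10000 usec, say) would give roughly the same signal as the patched usec column.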
So I've taken the second node offline, running with 10 cores (20
threads) now.
  usec  Time_Of_Day_Seconds  CPU   Busy%    IRQ
106783    1736248404.951438    -  100.00  20119
    46    1736248404.844701    0  100.00   1005
    41    1736248404.844742   20  100.00   1007
    42    1736248404.844784    1  100.00   1005
    40    1736248404.844824   21  100.00   1006
    41    1736248404.844865    2  100.00   1005
    40    1736248404.844905   22  100.00   1006
    41    1736248404.844946    3  100.00   1006
    40    1736248404.844986   23  100.00   1005
    41    1736248404.845027    4  100.00   1005
    40    1736248404.845067   24  100.00   1006
    41    1736248404.845108    5  100.00   1011
    40    1736248404.845149   25  100.00   1005
    41    1736248404.845190    6  100.00   1005
    40    1736248404.845230   26  100.00   1005
    42    1736248404.845272    7  100.00   1007
    41    1736248404.845313   27  100.00   1005
    41    1736248404.845355    8  100.00   1005
    42    1736248404.845397   28  100.00   1006
    46    1736248404.845443    9  100.00   1009
105995    1736248404.951438   29  100.00   1005
This is by far the worst I've had in the past few minutes of playing with
this. If I get a blip (>10000 usec) then it is always on the last CPU. Are
you seeing the same thing?
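Picking the slow rows out of a long turbostat log by eye gets tedious, so here is a rough sketch of the >10000 check as a filter. It assumes the five-column layout shown above (usec, Time_Of_Day_Seconds, CPU, Busy%, IRQ); struct ts_row, parse_row() and is_slow_row() are hypothetical names, not part of turbostat.

```c
#include <stdio.h>

/* One row of the output above. */
struct ts_row {
	long usec;
	double tod;	/* Time_Of_Day_Seconds */
	int cpu;	/* -1 for the "-" summary row */
	double busy;	/* Busy% */
	long irq;
};

/* Parse one data row; returns 0 on success, -1 if the line does not
 * match (the header line, for instance). */
int parse_row(const char *line, struct ts_row *r)
{
	char cpu[16];

	if (sscanf(line, "%ld %lf %15s %lf %ld",
		   &r->usec, &r->tod, cpu, &r->busy, &r->irq) != 5)
		return -1;
	if (cpu[0] == '-' && cpu[1] == '\0')
		r->cpu = -1;
	else if (sscanf(cpu, "%d", &r->cpu) != 1)
		return -1;
	return 0;
}

/* A per-CPU row counts as slow when its usec exceeds the threshold; the
 * summary row is skipped since it only repeats the per-CPU delay. */
int is_slow_row(const struct ts_row *r, long threshold_usec)
{
	return r->cpu >= 0 && r->usec > threshold_usec;
}
```

Fed the output above with a threshold of 10000, only the CPU 29 row would be flagged.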
> In this short example all captures were for the CPU 5 to 11 migration.
> 2 at 6 seconds, 1 at 1.33 seconds and 1 at 2 seconds.
This seems to suggest you are: always on CPU 11.
Weird!
Anyway, let me see if I can capture a trace of this..