Message-ID: <20240328215556.afaynyoldoizhcpr@airbuntu>
Date: Thu, 28 Mar 2024 21:55:56 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Ingo Molnar <mingo@...nel.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christian Loehle <christian.loehle@....com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched: Consolidate cpufreq updates
On 03/26/24 09:20, Ingo Molnar wrote:
>
> * Qais Yousef <qyousef@...alina.io> wrote:
>
> > Results of `perf stat --repeat 10 perf bench sched pipe` on AMD 3900X to
> > verify any potential overhead because of the addition at context switch
> >
> > Before:
> > -------
> >
> > Performance counter stats for 'perf bench sched pipe' (10 runs):
> >
> > 16,839.74 msec task-clock:u # 1.158 CPUs utilized ( +- 0.52% )
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 1,390 page-faults:u # 83.903 /sec ( +- 0.06% )
> > 333,773,107 cycles:u # 0.020 GHz ( +- 0.70% ) (83.72%)
> > 67,050,466 stalled-cycles-frontend:u # 19.94% frontend cycles idle ( +- 2.99% ) (83.23%)
> > 37,763,775 stalled-cycles-backend:u # 11.23% backend cycles idle ( +- 2.18% ) (83.09%)
> > 84,456,137 instructions:u # 0.25 insn per cycle
> > # 0.83 stalled cycles per insn ( +- 0.02% ) (83.01%)
> > 34,097,544 branches:u # 2.058 M/sec ( +- 0.02% ) (83.52%)
> > 8,038,902 branch-misses:u # 23.59% of all branches ( +- 0.03% ) (83.44%)
> >
> > 14.5464 +- 0.0758 seconds time elapsed ( +- 0.52% )
> >
> > After:
> > -------
> >
> > Performance counter stats for 'perf bench sched pipe' (10 runs):
> >
> > 16,219.58 msec task-clock:u # 1.130 CPUs utilized ( +- 0.80% )
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 1,391 page-faults:u # 85.163 /sec ( +- 0.06% )
> > 342,768,312 cycles:u # 0.021 GHz ( +- 0.63% ) (83.36%)
> > 66,231,208 stalled-cycles-frontend:u # 18.91% frontend cycles idle ( +- 2.34% ) (83.95%)
> > 39,055,410 stalled-cycles-backend:u # 11.15% backend cycles idle ( +- 1.80% ) (82.73%)
> > 84,475,662 instructions:u # 0.24 insn per cycle
> > # 0.82 stalled cycles per insn ( +- 0.02% ) (83.05%)
> > 34,067,160 branches:u # 2.086 M/sec ( +- 0.02% ) (83.67%)
> > 8,042,888 branch-misses:u # 23.60% of all branches ( +- 0.07% ) (83.25%)
> >
> > 14.358 +- 0.116 seconds time elapsed ( +- 0.81% )
>
> Noise caused by too many counters & the vagaries of multi-CPU scheduling is
> drowning out any results here.
>
> I'd suggest something like this to measure same-CPU context-switching
> overhead:
>
> taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe
>
> ... and make sure the cpufreq governor is at 'performance' first:

The performance governor won't stress the patch, as the static key should
bypass the new code.
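
To illustrate, a minimal sketch of the kind of static-key guard I have in
mind (the key and helper names below are hypothetical, not lifted from the
patch; this assumes the usual kernel sched context):

	/*
	 * The context-switch hook is guarded by a static key. While the
	 * key is disabled (e.g. under the performance governor, which
	 * needs no per-switch updates), the branch is patched to a NOP,
	 * so the new code adds effectively no overhead.
	 */
	static DEFINE_STATIC_KEY_FALSE(sched_cpufreq_update_enabled);

	static inline void sched_cpufreq_update_ctx_switch(struct rq *rq)
	{
		if (static_branch_unlikely(&sched_cpufreq_update_enabled))
			__update_cpufreq_ctx_switch(rq);	/* hypothetical helper */
	}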
>
> for ((cpu=0; cpu < $(nproc); cpu++)); do echo performance > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor; done

There's this shorthand, if you prefer:

	echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
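
(The policy* glob writes each cpufreq policy node once and also covers
systems where several CPUs share one policy, so it has the same effect as
the per-CPU loop.)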
>
> With that approach you should see much, much lower noise levels even with just
> 3 runs:
>
> Performance counter stats for 'perf bench sched pipe' (3 runs):
>
> 51,616,501,297 cycles # 3.188 GHz ( +- 0.05% )
> 37,523,641,203 instructions # 0.73 insn per cycle ( +- 0.08% )
> 16,191.01 msec task-clock # 0.999 CPUs utilized ( +- 0.04% )
>
> 16.20511 +- 0.00578 seconds time elapsed ( +- 0.04% )
Thanks for the tips!

I repeated the test using taskset and fewer counters, for both the
performance and schedutil governors. Results below.
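
The invocation followed your suggestion; reconstructed here rather than
copied verbatim from my shell history, so treat it as a sketch:

	taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe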
tip: schedutil:
---------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       829,076,881      cycles:u          #    0.077 GHz            ( +- 1.26% )
        82,712,937      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,735.67 msec task-clock:u      #    1.002 CPUs utilized  ( +- 0.08% )

          10.71758 +- 0.00840 seconds time elapsed  ( +- 0.08% )

tip: performance:
-----------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       871,744,951      cycles:u          #    0.079 GHz            ( +- 1.04% )
        82,711,239      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         11,076.50 msec task-clock:u      #    1.004 CPUs utilized  ( +- 0.20% )

           11.0374 +- 0.0216 seconds time elapsed  ( +- 0.20% )

tip+patch: schedutil:
---------------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       836,767,470      cycles:u          #    0.078 GHz            ( +- 0.69% )
        82,712,893      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,825.83 msec task-clock:u      #    1.005 CPUs utilized  ( +- 0.12% )

           10.7751 +- 0.0128 seconds time elapsed  ( +- 0.12% )

tip+patch: performance:
-----------------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       842,037,546      cycles:u          #    0.077 GHz            ( +- 0.97% )
        82,717,942      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,921.37 msec task-clock:u      #    0.996 CPUs utilized  ( +- 0.18% )

           10.9629 +- 0.0202 seconds time elapsed  ( +- 0.18% )
Thanks!

--
Qais Yousef