Message-ID: <20240328215556.afaynyoldoizhcpr@airbuntu>
Date: Thu, 28 Mar 2024 21:55:56 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Ingo Molnar <mingo@...nel.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christian Loehle <christian.loehle@....com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched: Consolidate cpufreq updates
On 03/26/24 09:20, Ingo Molnar wrote:
>
> * Qais Yousef <qyousef@...alina.io> wrote:
>
> > Results of `perf stat --repeat 10 perf bench sched pipe` on AMD 3900X to
> > verify any potential overhead because of the addition at context switch
> >
> > Before:
> > -------
> >
> > Performance counter stats for 'perf bench sched pipe' (10 runs):
> >
> > 16,839.74 msec task-clock:u # 1.158 CPUs utilized ( +- 0.52% )
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 1,390 page-faults:u # 83.903 /sec ( +- 0.06% )
> > 333,773,107 cycles:u # 0.020 GHz ( +- 0.70% ) (83.72%)
> > 67,050,466 stalled-cycles-frontend:u # 19.94% frontend cycles idle ( +- 2.99% ) (83.23%)
> > 37,763,775 stalled-cycles-backend:u # 11.23% backend cycles idle ( +- 2.18% ) (83.09%)
> > 84,456,137 instructions:u # 0.25 insn per cycle
> > # 0.83 stalled cycles per insn ( +- 0.02% ) (83.01%)
> > 34,097,544 branches:u # 2.058 M/sec ( +- 0.02% ) (83.52%)
> > 8,038,902 branch-misses:u # 23.59% of all branches ( +- 0.03% ) (83.44%)
> >
> > 14.5464 +- 0.0758 seconds time elapsed ( +- 0.52% )
> >
> > After:
> > -------
> >
> > Performance counter stats for 'perf bench sched pipe' (10 runs):
> >
> > 16,219.58 msec task-clock:u # 1.130 CPUs utilized ( +- 0.80% )
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 1,391 page-faults:u # 85.163 /sec ( +- 0.06% )
> > 342,768,312 cycles:u # 0.021 GHz ( +- 0.63% ) (83.36%)
> > 66,231,208 stalled-cycles-frontend:u # 18.91% frontend cycles idle ( +- 2.34% ) (83.95%)
> > 39,055,410 stalled-cycles-backend:u # 11.15% backend cycles idle ( +- 1.80% ) (82.73%)
> > 84,475,662 instructions:u # 0.24 insn per cycle
> > # 0.82 stalled cycles per insn ( +- 0.02% ) (83.05%)
> > 34,067,160 branches:u # 2.086 M/sec ( +- 0.02% ) (83.67%)
> > 8,042,888 branch-misses:u # 23.60% of all branches ( +- 0.07% ) (83.25%)
> >
> > 14.358 +- 0.116 seconds time elapsed ( +- 0.81% )
>
> Noise caused by too many counters & the vagaries of multi-CPU scheduling is
> drowning out any results here.
>
> I'd suggest something like this to measure same-CPU context-switching
> overhead:
>
> taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe
>
> ... and make sure the cpufreq governor is at 'performance' first:

The performance governor won't stress the patch, as the static key should
bypass the new code.
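
To illustrate, a minimal sketch of the kind of static-key guard I have in
mind (the key and helper names below are hypothetical, not lifted from the
patch; this assumes the usual kernel sched context):

	/*
	 * The context-switch hook is guarded by a static key. While the
	 * key is disabled (e.g. under the performance governor, which
	 * needs no per-switch updates), the branch is patched to a NOP,
	 * so the new code adds effectively no overhead.
	 */
	static DEFINE_STATIC_KEY_FALSE(sched_cpufreq_update_enabled);

	static inline void sched_cpufreq_update_ctx_switch(struct rq *rq)
	{
		if (static_branch_unlikely(&sched_cpufreq_update_enabled))
			__update_cpufreq_ctx_switch(rq);	/* hypothetical helper */
	}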
>
> for ((cpu=0; cpu < $(nproc); cpu++)); do echo performance > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor; done

There's this shorthand, if you prefer:

	echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
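
(The policy* glob writes each cpufreq policy node once and also covers
systems where several CPUs share one policy, so it has the same effect as
the per-CPU loop.)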
>
> With that approach you should see much, much lower noise levels even with just
> 3 runs:
>
> Performance counter stats for 'perf bench sched pipe' (3 runs):
>
> 51,616,501,297 cycles # 3.188 GHz ( +- 0.05% )
> 37,523,641,203 instructions # 0.73 insn per cycle ( +- 0.08% )
> 16,191.01 msec task-clock # 0.999 CPUs utilized ( +- 0.04% )
>
> 16.20511 +- 0.00578 seconds time elapsed ( +- 0.04% )
Thanks for the tips!

I repeated the test using taskset and fewer counters, for both the
performance and schedutil governors. Results below.
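
The invocation followed your suggestion; reconstructed here rather than
copied verbatim from my shell history, so treat it as a sketch:

	taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe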
tip: schedutil:
---------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       829,076,881      cycles:u          #    0.077 GHz            ( +- 1.26% )
        82,712,937      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,735.67 msec task-clock:u      #    1.002 CPUs utilized  ( +- 0.08% )

          10.71758 +- 0.00840 seconds time elapsed  ( +- 0.08% )

tip: performance:
-----------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       871,744,951      cycles:u          #    0.079 GHz            ( +- 1.04% )
        82,711,239      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         11,076.50 msec task-clock:u      #    1.004 CPUs utilized  ( +- 0.20% )

           11.0374 +- 0.0216 seconds time elapsed  ( +- 0.20% )

tip+patch: schedutil:
---------------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       836,767,470      cycles:u          #    0.078 GHz            ( +- 0.69% )
        82,712,893      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,825.83 msec task-clock:u      #    1.005 CPUs utilized  ( +- 0.12% )

           10.7751 +- 0.0128 seconds time elapsed  ( +- 0.12% )

tip+patch: performance:
-----------------------

 Performance counter stats for 'perf bench sched pipe' (10 runs):

       842,037,546      cycles:u          #    0.077 GHz            ( +- 0.97% )
        82,717,942      instructions:u    #    0.10  insn per cycle ( +- 0.00% )
         10,921.37 msec task-clock:u      #    0.996 CPUs utilized  ( +- 0.18% )

           10.9629 +- 0.0202 seconds time elapsed  ( +- 0.18% )
Thanks!

--
Qais Yousef