linux-kernel - Re: [PATCH v7] sched: Consolidate cpufreq updates

Message-ID: <a1cac0d3-c17d-478e-8a6b-40399a9428b6@linux.ibm.com>
Date: Sat, 19 Oct 2024 00:02:10 +0530
From: Anjali K <anjalik@...ux.ibm.com>
To: Christian Loehle <christian.loehle@....com>,
        Qais Yousef <qyousef@...alina.io>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>, Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall
 <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Hongyan Xia
 <hongyan.xia2@....com>, John Stultz <jstultz@...gle.com>,
        linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7] sched: Consolidate cpufreq updates

On 08/10/24 15:26, Christian Loehle wrote:
> The default CPUFREQ_DBS_MIN_SAMPLING_INTERVAL is still to have 2 ticks between
> cpufreq updates on conservative/ondemand.
> What is your sampling_rate setting? What's your HZ?
The sampling_rate setting is 8000 us.
CONFIG_HZ is set to 250 Hz.
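For reference, the relation between these two settings and the "2 ticks" mentioned above can be checked with a little arithmetic. This is only a sketch, assuming that one tick lasts 1/CONFIG_HZ seconds; the function name is made up for illustration.

```python
def sampling_rate_in_ticks(sampling_rate_us: int, hz: int) -> float:
    """Return how many scheduler ticks fit in one governor sampling period."""
    tick_us = 1_000_000 / hz  # one tick lasts 1/HZ seconds, expressed in us
    return sampling_rate_us / tick_us

# Values reported in this mail: sampling_rate = 8000 us, CONFIG_HZ = 250.
print(sampling_rate_in_ticks(8000, 250))  # 2.0
```

So with a 4 ms tick, a sampling_rate of 8000 us corresponds to 2 ticks.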
> Interestingly the context switch heavy benchmarks still show -6% don't they?
Yes, the stress-ng and Unixbench Pipebased Context Switching benchmarks showed a 6% regression. However, the run-to-run variation was high: 15% for stress-ng and 5% for Unixbench Pipebased Context Switching. This led me to doubt those results, so I re-ran both benchmarks.

Each run below is an average of 10 iterations of the benchmarks.
The results are as follows:
+------------------------------------------------------+--------------------+--------------+--------+---------+------------+
|                     Benchmark                        |      Baseline      |  Baseline +  |Baseline|Baseline | Throughput |
|                                                      |  (6.10.0-rc1 tip   |    patch     |        |+ patch  |Difference %|
|                                                      |  sched/core)       |              |stdev % | stdev % |            |
|                                                      |  avg throughput    |avg throughput|        |         |            |
+------------------------------------------------------+--------------------+--------------+--------+---------+------------+
|Unixbench Pipebased Context Switching throughput (lps)|         1          |     1.02     |   6.48 |  10.29  |    2.18    |
|                                                      |         1          |     1.19     |  13.74 |   8.22  |   19.20    |
|                                                      |         1          |     0.87     |  11.27 |   8.12  |  -13.24    |
|                                                      |                    |              |        |         |            |
|stress-ng (bogo ops)                                  |         1          |     1.01     |  2.68  |  1.90   |    1.35    |
|                                                      |         1          |     0.98     |  2.29  |  4.26   |   -2.03    |
|                                                      |         1          |     0.99     |  2.01  |  2.24   |   -0.56    |
+------------------------------------------------------+--------------------+--------------+--------+---------+------------+
There is a very high run-to-run variation in the Unixbench Pipebased Context
Switching benchmark and we can't conclude anything from this benchmark.
There is no regression in stress-ng on applying this patch on this system.
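
To make the reasoning behind these two conclusions explicit, the per-run differences from the table can be summarized as below. This is only a rough sketch and not a proper significance test; the helper name and the choice of mean, spread and sign consistency as indicators are my own assumptions.

```python
from statistics import mean

# Throughput difference (%) per run, copied from the table above.
diffs = {
    "Unixbench Pipebased Context Switching": [2.18, 19.20, -13.24],
    "stress-ng": [1.35, -2.03, -0.56],
}

def summarize(values):
    """Return (mean, spread, same_sign) for a list of per-run differences."""
    spread = max(values) - min(values)  # distance between best and worst run
    same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
    return mean(values), spread, same_sign

for name, values in diffs.items():
    avg, spread, same_sign = summarize(values)
    print(f"{name}: mean {avg:+.2f}%, spread {spread:.2f} points, "
          f"consistent sign: {same_sign}")
```

For Unixbench the runs differ from each other by about 32 percentage points and do not agree on the sign, while all three stress-ng differences are within about 2% of zero, which is the same order as the stdev % columns.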

> Do you mind trying schedutil with a reasonable rate_limit_us, too?

I think the schedutil governor is not working on my system because the cpu
frequency shoots to the maximum (3.9GHz) even when the system is only 10%
loaded.
I ran stress-ng --cpu `nproc` --cpu-load 10.
The mpstat command shows that the system is 10% loaded:
10:55:25 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:56:50 AM  all   10.03    0.00    0.02    0.00    0.18    0.00    0.00    0.00    0.00   89.76
But cpupower frequency-info showed that the system is at the maximum frequency:
root@...zz10:~# cpupower frequency-info
<snipped>
  available cpufreq governors: conservative ondemand performance schedutil
  current policy: frequency should be within 2.30 GHz and 3.90 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: 3.90 GHz (asserted by call to hardware)
<snipped>
This is not expected, right?
I will work on finding out why the schedutil governor is not working on
this system and get back.
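
As a starting point, the governor state can also be read directly from the standard cpufreq sysfs files. The sketch below is only an assumption about how one might collect them; the helper names are hypothetical, and whether schedutil exposes rate_limit_us per policy or globally depends on the cpufreq driver, so both locations are tried.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")

def read_value(path: Path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None

def policy_summary(policy_dir: Path) -> dict:
    """Collect governor and frequency fields (kHz) for one cpufreq policy."""
    fields = ("scaling_governor", "scaling_min_freq",
              "scaling_max_freq", "scaling_cur_freq")
    summary = {name: read_value(policy_dir / name) for name in fields}
    # Try the per-policy tunable first, then the global one.
    summary["rate_limit_us"] = (
        read_value(policy_dir / "schedutil" / "rate_limit_us")
        or read_value(policy_dir.parent / "schedutil" / "rate_limit_us")
    )
    return summary

if __name__ == "__main__":
    # Prints nothing on systems without cpufreq policy directories.
    for policy_dir in sorted(CPUFREQ_ROOT.glob("policy*")):
        print(policy_dir.name, policy_summary(policy_dir))
```

Comparing scaling_cur_freq against scaling_min_freq and scaling_max_freq while the 10% load is running should show whether every policy is pinned at the maximum or only some of them.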

Thank you for your response,
Anjali K