Message-ID:
<CYYPR12MB8655660B0C221CF594A100769CE02@CYYPR12MB8655.namprd12.prod.outlook.com>
Date: Sat, 11 May 2024 06:54:54 +0000
From: "Yuan, Perry" <Perry.Yuan@....com>
To: "Du, Xiaojian" <Xiaojian.Du@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>
CC: "tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "hpa@...or.com"
<hpa@...or.com>, "daniel.sneddon@...ux.intel.com"
<daniel.sneddon@...ux.intel.com>, "jpoimboe@...nel.org"
<jpoimboe@...nel.org>, "pawan.kumar.gupta@...ux.intel.com"
<pawan.kumar.gupta@...ux.intel.com>, "Das1, Sandipan" <Sandipan.Das@....com>,
"kai.huang@...el.com" <kai.huang@...el.com>, "x86@...nel.org"
<x86@...nel.org>, "Huang, Ray" <Ray.Huang@....com>, "rafael@...nel.org"
<rafael@...nel.org>, "Limonciello, Mario" <Mario.Limonciello@....com>
Subject: RE: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay
for some models
[AMD Official Use Only - General]
> -----Original Message-----
> From: Du, Xiaojian <Xiaojian.Du@....com>
> Sent: Monday, April 29, 2024 3:03 PM
> To: linux-kernel@...r.kernel.org; linux-pm@...r.kernel.org
> Cc: tglx@...utronix.de; mingo@...hat.com; bp@...en8.de;
> dave.hansen@...ux.intel.com; hpa@...or.com;
> daniel.sneddon@...ux.intel.com; jpoimboe@...nel.org;
> pawan.kumar.gupta@...ux.intel.com; Das1, Sandipan
> <Sandipan.Das@....com>; kai.huang@...el.com; Yuan, Perry
> <Perry.Yuan@....com>; x86@...nel.org; Huang, Ray
> <Ray.Huang@....com>; rafael@...nel.org; Du, Xiaojian
> <Xiaojian.Du@....com>; Limonciello, Mario <Mario.Limonciello@....com>
> Subject: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay
> for some models
>
> Some AMD Zen 4 APUs/CPUs can adjust the CPU core clock more quickly and
> precisely in response to the CPU workload. This capability is advertised
> by the Fast CPPC x86 feature.
> This change only takes effect in the *passive mode* of the amd-pstate
> driver. Based on test results for different transition delay values, 600us
> is chosen as a balance between performance and power consumption.
>
> Some test results on AMD Ryzen 7840HS(Phoenix) APU:
>
> 1. Tbench
> (Lower Energy is better; higher Throughput is better; higher PPW
> (Performance per Watt) is better)
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
> Trans Delay  Tbench, governor: schedutil, 3-iteration average
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
> 1000us       Clients            1               2               4               8               12              16              32
>              Energy/Joules      2010            2804            8768            17171           16170           15132           15027
>              Throughput/(MB/s)  114             259             1041            3010            3135            4851            4605
>              PPW                0.0567          0.0923          0.1187          0.1752          0.1938          0.3205          0.3064
> 600us        Clients            1               2               4               8               12              16              32
>              Energy/Joules      2115 (5.22%)    2388 (-14.84%)  10700 (22.03%)  16716 (-2.65%)  15939 (-1.43%)  15053 (-0.52%)  15083 (0.37%)
>              Throughput/(MB/s)  122 (7.02%)     234 (-9.65%)    1188 (14.12%)   3003 (-0.23%)   3143 (0.26%)    4842 (-0.19%)   4603 (-0.04%)
>              PPW                0.0576 (1.59%)  0.0979 (6.07%)  0.111 (-6.49%)  0.1796 (2.51%)  0.1971 (1.70%)  0.3216 (0.34%)  0.3051 (-0.42%)
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
>
> 2. Dbench
> (Lower Energy is better; higher Throughput is better; higher PPW
> (Performance per Watt) is better)
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
> Trans Delay  Dbench, governor: schedutil, 3-iteration average
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
> 1000us       Clients            1               2               4               8               12              16              32
>              Energy/Joules      4890            3779            3567            5157            5611            6500            8163
>              Throughput/(MB/s)  327             167             220             577             775             938             1397
>              PPW                0.0668          0.0441          0.0616          0.1118          0.1381          0.1443          0.1711
> 600us        Clients            1               2               4               8               12              16              32
>              Energy/Joules      4915 (0.51%)    4912 (29.98%)   3506 (-1.71%)   4907 (-4.85%)   5011 (-10.69%)  5672 (-12.74%)  8141 (-0.27%)
>              Throughput/(MB/s)  348 (6.42%)     284 (70.06%)    220 (0.00%)     518 (-10.23%)   712 (-8.13%)    854 (-8.96%)    1475 (5.58%)
>              PPW                0.0708 (5.99%)  0.0578 (31.07%) 0.0627 (1.79%)  0.1055 (-5.64%) 0.142 (2.82%)   0.1505 (4.30%)  0.1811 (5.84%)
> ============ ================== =============== =============== =============== =============== =============== =============== ===============
>
> 3. Hackbench (less time is better)
> ============= =========================== ==========================
>               hackbench, governor: schedutil
> ============= =========================== ==========================
> Trans Delay   Process Mode Ave time(s)    Thread Mode Ave time(s)
> 1000us        14.484                      14.484
> 600us         14.418 (-0.46%)             15.41 (+6.39%)
> ============= =========================== ==========================
>
> 4. Perf_sched_bench (less time is better)
> ============ ================== =============== =============== =============== =============== =============== ===============
> Trans Delay  perf_sched_bench, governor: schedutil
> ============ ================== =============== =============== =============== =============== =============== ===============
> 1000us       Groups             1               2               4               8               12              24
>              AveTime(s)         1.64            2.851           5.878           11.636          16.093          26.395
> 600us        Groups             1               2               4               8               12              24
>              AveTime(s)         1.69 (3.05%)    2.845 (-0.21%)  5.843 (-0.60%)  11.576 (-0.52%) 16.092 (-0.01%) 26.32 (-0.28%)
> ============ ================== =============== =============== =============== =============== =============== ===============
>
> 5. Sysbench (higher is better)
> ============ ================== ================= ================= ================= ================= ================= =================
> Trans Delay  Sysbench, governor: schedutil
> ============ ================== ================= ================= ================= ================= ================= =================
> 1000us       Thread             1                 2                 4                 8                 12                24
>              Ave events         6020.98           12273.39          24119.82          46171.57          47074.37          47831.72
> 600us        Thread             1                 2                 4                 8                 12                24
>              Ave events         6154.82 (2.22%)   12271.63 (-0.01%) 24392.5 (1.13%)   46117.64 (-0.12%) 46852.19 (-0.47%) 47678.92 (-0.32%)
> ============ ================== ================= ================= ================= ================= ================= =================
>
> In conclusion, a shorter CPU clock transition delay has a clearly positive
> effect on PPW in the Dbench test, while performance stays stable on
> Tbench, Hackbench, Perf_sched_bench and Sysbench.
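
A note for readers checking the tables: the PPW rows appear to be
Throughput divided by Energy (for example, Tbench at 1000us with 1 client:
114 / 2010 is about 0.0567), and each percentage appears to be the relative
change of the 600us value against the 1000us value. A minimal C sketch of
that arithmetic, assuming this reading is right; the names ppw() and
pct_change() are illustrative and are not part of the patch:

```c
#include <assert.h>

/*
 * PPW as the tables appear to compute it (an assumption from the data):
 * throughput in MB/s divided by energy in Joules.
 */
double ppw(double throughput_mbps, double energy_joules)
{
	assert(energy_joules > 0.0);
	return throughput_mbps / energy_joules;
}

/* Relative change of new_val against base, expressed in percent. */
double pct_change(double base, double new_val)
{
	assert(base != 0.0);
	return (new_val - base) / base * 100.0;
}
```

For instance, pct_change(0.0616, 0.0627) is the relative change between the
two Dbench 4-client PPW values, and pct_change(220, 220) is the change
between the two Dbench 4-client throughput values.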
>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@....com>
> Reviewed-by: Mario Limonciello <mario.limonciello@....com>
> ---
> drivers/cpufreq/amd-pstate.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2015c9fcc3c9..8c8594f67af6 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,6 +50,7 @@
>
> #define AMD_PSTATE_TRANSITION_LATENCY 20000
> #define AMD_PSTATE_TRANSITION_DELAY 1000
> +#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
> #define AMD_PSTATE_PREFCORE_THRESHOLD 166
>
> /*
> @@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  	}
>
>  	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> -	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
> +		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
> +	else
> +		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
>
>  	policy->min = min_freq;
>  	policy->max = max_freq;
> --
> 2.34.1
LGTM
Reviewed-by: Perry Yuan <perry.yuan@....com>
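
For reference, the selection logic in the quoted hunk can be modeled outside
the kernel as a small C sketch. Here has_fast_cppc is a hypothetical
stand-in for cpu_feature_enabled(X86_FEATURE_FAST_CPPC), and
pick_transition_delay_us() is an illustrative name, not a function in
amd-pstate.c; the two constants are copied from the patch.

```c
#include <stdbool.h>

#define AMD_PSTATE_TRANSITION_DELAY		1000	/* us, existing default */
#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600	/* us, added by the patch */

/*
 * Userspace model of the patched branch in amd_pstate_cpu_init():
 * CPUs that advertise Fast CPPC get the shorter delay, all others
 * keep the existing default.
 */
unsigned int pick_transition_delay_us(bool has_fast_cppc)
{
	return has_fast_cppc ? AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY
			     : AMD_PSTATE_TRANSITION_DELAY;
}
```

Keeping the default in the else branch means systems without the feature
flag see no behavior change.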