Message-ID: <Y6VVr+WYqwWb6XV0@amd.com>
Date:   Fri, 23 Dec 2022 15:15:59 +0800
From:   Huang Rui <ray.huang@....com>
To:     "Yuan, Perry" <Perry.Yuan@....com>
Cc:     "Huang, Shimmer" <Shimmer.Huang@....com>,
        "Limonciello, Mario" <Mario.Limonciello@....com>,
        "Liang, Richard qi" <Richardqi.Liang@....com>,
        "rafael.j.wysocki@...el.com" <rafael.j.wysocki@...el.com>,
        "viresh.kumar@...aro.org" <viresh.kumar@...aro.org>,
        "Sharma, Deepak" <Deepak.Sharma@....com>,
        "Fontenot, Nathan" <Nathan.Fontenot@....com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "Du, Xiaojian" <Xiaojian.Du@....com>,
        "Meng, Li (Jassmine)" <Li.Meng@....com>,
        "Karny, Wyes" <Wyes.Karny@....com>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
 support for the AMD processors

On Mon, Dec 19, 2022 at 06:21:14PM +0800, Yuan, Perry wrote:
> [AMD Official Use Only - General]
> 
> Hi Ray,
> 
> > -----Original Message-----
> > From: Huang, Ray <Ray.Huang@....com>
> > Sent: Monday, December 12, 2022 4:47 PM
> > To: Yuan, Perry <Perry.Yuan@....com>
> > Cc: rafael.j.wysocki@...el.com; Limonciello, Mario
> > <Mario.Limonciello@....com>; viresh.kumar@...aro.org; Sharma, Deepak
> > <Deepak.Sharma@....com>; Fontenot, Nathan
> > <Nathan.Fontenot@....com>; Deucher, Alexander
> > <Alexander.Deucher@....com>; Huang, Shimmer
> > <Shimmer.Huang@....com>; Du, Xiaojian <Xiaojian.Du@....com>; Meng,
> > Li (Jassmine) <Li.Meng@....com>; Karny, Wyes <Wyes.Karny@....com>;
> > linux-pm@...r.kernel.org; linux-kernel@...r.kernel.org
> > Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
> > support for the AMD processors
> > 
> > On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> > > From: Perry Yuan <Perry.Yuan@....com>
> > >
> > > Add EPP driver support for AMD SoCs which support a dedicated MSR for
> > > CPPC.  EPP is used by the DPM controller to configure the frequency
> > > that a core operates at during short periods of activity.
> > >
> > > The SoC EPP targets are configured on a scale from 0 to 255 where 0
> > > represents maximum performance and 255 represents maximum efficiency.
> > >
> > > The amd-pstate driver exports profile string names to userspace that
> > > are tied to specific EPP values.
> > >
> > > The balance_performance string (0x80) provides the best balance for
> > > efficiency versus power on most systems, but users can choose other
> > > strings to meet their needs as well.
> > >
> > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
> > > default performance balance_performance balance_power power
> > >
> > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
> > > balance_performance
> > >
> > > To enable the driver, add `amd_pstate=active` to the kernel command
> > > line; the kernel will then load the active-mode EPP driver.
> > >
> > > Signed-off-by: Perry Yuan <Perry.Yuan@....com>
> > > ---
> > >  drivers/cpufreq/amd-pstate.c | 631 ++++++++++++++++++++++++++++++++++-
> > >  include/linux/amd-pstate.h   |  35 ++
> > >  2 files changed, 660 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > > index c17bd845f5fc..0a521be1be8a 100644
> > > --- a/drivers/cpufreq/amd-pstate.c
> > > +++ b/drivers/cpufreq/amd-pstate.c
> > > @@ -37,6 +37,7 @@
> > >  #include <linux/uaccess.h>
> > >  #include <linux/static_call.h>
> > >  #include <linux/amd-pstate.h>
> > > +#include <linux/cpufreq_common.h>
> > >
> > >  #include <acpi/processor.h>
> > >  #include <acpi/cppc_acpi.h>
> > > @@ -59,9 +60,125 @@
> > >   * we disable it by default to go acpi-cpufreq on these processors and add a
> > >   * module parameter to be able to enable it manually for debugging.
> > >   */
> > > -static struct cpufreq_driver amd_pstate_driver;
> > > +static bool cppc_active;
> > >  static int cppc_load __initdata;
> > >
> > > +static struct cpufreq_driver *default_pstate_driver;
> > > +static struct amd_cpudata **all_cpu_data;
> > > +static struct amd_pstate_params global_params;
> > > +
> > > +static DEFINE_MUTEX(amd_pstate_limits_lock);
> > > +static DEFINE_MUTEX(amd_pstate_driver_lock);
> > > +
> > > +static bool cppc_boost __read_mostly;
> > > +
> > > +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> > > +{
> > > +	s16 epp;
> > > +	struct cppc_perf_caps perf_caps;
> > > +	int ret;
> > > +
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > +		if (!cppc_req_cached) {
> > > +			epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > > +					&cppc_req_cached);
> > > +			if (epp)
> > > +				return epp;
> > > +		}
> > > +		epp = (cppc_req_cached >> 24) & 0xFF;
> > > +	} else {
> > > +		ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> > > +		if (ret < 0) {
> > > +			pr_debug("Could not retrieve energy perf value (%d)\n", ret);
> > > +			return -EIO;
> > > +		}
> > > +		epp = (s16) perf_caps.energy_perf;
> > 
> > This should follow the driver's static_call structure to implement the
> > function. Please refer to amd_pstate_init_perf(); I think it could even
> > reuse init_perf to get the EPP cap value.
> 
> amd_pstate_init_perf() is only called during driver registration, whereas
> amd_pstate_get_epp() will be called frequently to update the EPP MSR value
> and the EPP min/max limits.
> So I suggest keeping amd_pstate_get_epp() to update the EPP-related values as it is.
> 
> The static_call method could cover both the MSR and shared-memory API
> calls, but amd_pstate_get_epp() is simple enough for now; there is no need
> to convert this EPP update function to a static call as well.

Static calls are used to avoid retpolines, not to keep the code simple.

https://thenewstack.io/linux-kernel-5-10-introduces-static-calls-to-prevent-speculative-execution-attacks/
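
For illustration only, here is a minimal sketch of the static_call pattern
the driver already uses for enable/init_perf/update_perf; the helper names
are hypothetical and cppc_get_epp_caps() is the helper added earlier in
this series:

	static s16 pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
	{
		/* MSR processors: EPP lives in bits 31:24 of MSR_AMD_CPPC_REQ */
		if (!cppc_req_cached) {
			int ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
						&cppc_req_cached);
			if (ret)
				return ret;
		}
		return (cppc_req_cached >> 24) & 0xff;
	}

	static s16 shmem_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
	{
		/* shared memory processors: go through the cppc_acpi layer */
		struct cppc_perf_caps perf_caps;
		int ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);

		return ret < 0 ? -EIO : (s16)perf_caps.energy_perf;
	}

	DEFINE_STATIC_CALL(amd_pstate_get_epp, pstate_get_epp);

	static inline s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cached)
	{
		return static_call(amd_pstate_get_epp)(cpudata, cached);
	}

	/* in amd_pstate_init(), next to the existing static_call_update()s: */
	static_call_update(amd_pstate_get_epp, shmem_get_epp);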

> Considering the tight schedule and merge window, I would like to keep the current way of updating EPP; otherwise the customer release schedule will be delayed.
> 

The mailing list is not the place to discuss internal schedules.

> Perry.
> 
> 
> > 
> > > +	}
> > > +
> > > +	return epp;
> > > +}
> > > +
> > > +static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
> > > +{
> > > +	s16 epp;
> > > +	int index = -EINVAL;
> > > +
> > > +	epp = amd_pstate_get_epp(cpudata, 0);
> > > +	if (epp < 0)
> > > +		return epp;
> > > +
> > > +	switch (epp) {
> > > +	case HWP_EPP_PERFORMANCE:
> > > +		index = EPP_INDEX_PERFORMANCE;
> > > +		break;
> > > +	case HWP_EPP_BALANCE_PERFORMANCE:
> > > +		index = EPP_INDEX_BALANCE_PERFORMANCE;
> > > +		break;
> > > +	case HWP_EPP_BALANCE_POWERSAVE:
> > > +		index = EPP_INDEX_BALANCE_POWERSAVE;
> > > +		break;
> > > +	case HWP_EPP_POWERSAVE:
> > > +		index = EPP_INDEX_POWERSAVE;
> > > +		break;
> > > +	default:
> > > +			break;
> > > +	}
> > > +
> > > +	return index;
> > > +}
> > > +
> > > +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
> > > +{
> > > +	int ret;
> > > +	struct cppc_perf_ctrls perf_ctrls;
> > > +
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > +		u64 value = READ_ONCE(cpudata->cppc_req_cached);
> > > +
> > > +		value &= ~GENMASK_ULL(31, 24);
> > > +		value |= (u64)epp << 24;
> > > +		WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > +
> > > +		ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> > > +		if (!ret)
> > > +			cpudata->epp_cached = epp;
> > > +	} else {
> > > +		perf_ctrls.energy_perf = epp;
> > > +		ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
> > 
> > Since energy_perf is one of the members of struct cppc_perf_ctrls, could
> > we use cppc_set_perf() as well?
> 
> cppc_set_epp_perf() handles the EPP value update correctly.
> cppc_set_perf() is used for desired-perf updates at a very high rate by governors such as schedutil,
> and it runs in two phases (Phase I and Phase II). If we implement the EPP value update in that
> function, I am concerned we could run into potential firmware or performance-regression risks.

I am fine with using the separate cppc_set_epp_perf().

> The release schedule and merge window for v6.2 are closing, and this change request came after six rounds of patch review.
> I am afraid we do not have enough time to mitigate the risk of this new code change.
> We can consider continuing to optimize this in a follow-up patch.
> 
> Perry.
> 
> > 
> > > +		if (ret) {
> > > +			pr_debug("failed to set energy perf value (%d)\n", ret);
> > > +			return ret;
> > > +		}
> > > +		cpudata->epp_cached = epp;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > 
> > Same as above: the helpers for the different CPPC processor types (MSR
> > or shared memory) should be implemented via static_call.
> > 
> > > +
> > > +static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
> > > +		int pref_index)
> > > +{
> > > +	int epp = -EINVAL;
> > > +	int ret;
> > > +
> > > +	if (!pref_index) {
> > > +		pr_debug("EPP pref_index is invalid\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (epp == -EINVAL)
> > > +		epp = epp_values[pref_index];
> > > +
> > > +	if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > +		pr_debug("EPP cannot be set under performance policy\n");
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	ret = amd_pstate_set_epp(cpudata, epp);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  static inline int pstate_enable(bool enable)
> > >  {
> > >  	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
> > > @@ -70,11 +187,21 @@ static inline int pstate_enable(bool enable)
> > >  static int cppc_enable(bool enable)
> > >  {
> > >  	int cpu, ret = 0;
> > > +	struct cppc_perf_ctrls perf_ctrls;
> > >
> > >  	for_each_present_cpu(cpu) {
> > >  		ret = cppc_set_enable(cpu, enable);
> > >  		if (ret)
> > >  			return ret;
> > > +
> > > +		/* Enable autonomous mode for EPP */
> > > +		if (!cppc_active) {
> > > +			/* Set desired perf as zero to allow EPP firmware control */
> > > +			perf_ctrls.desired_perf = 0;
> > > +			ret = cppc_set_perf(cpu, &perf_ctrls);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > >  	}
> > >
> > >  	return ret;
> > > @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> > >  		return;
> > >
> > >  	cpudata->boost_supported = true;
> > > -	amd_pstate_driver.boost_enabled = true;
> > > +	default_pstate_driver->boost_enabled = true;
> > >  }
> > >
> > >  static void amd_perf_ctl_reset(unsigned int cpu)
> > > @@ -592,10 +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
> > >  	return sprintf(&buf[0], "%u\n", perf);
> > >  }
> > >
> > > +static ssize_t show_energy_performance_available_preferences(
> > > +				struct cpufreq_policy *policy, char *buf)
> > > +{
> > > +	int i = 0;
> > > +	int offset = 0;
> > > +
> > > +	while (energy_perf_strings[i] != NULL)
> > > +		offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]);
> > > +
> > > +	sysfs_emit_at(buf, offset, "\n");
> > > +
> > > +	return offset;
> > > +}
> > > +
> > > +static ssize_t store_energy_performance_preference(
> > > +		struct cpufreq_policy *policy, const char *buf, size_t count)
> > > +{
> > > +	struct amd_cpudata *cpudata = policy->driver_data;
> > > +	char str_preference[21];
> > > +	ssize_t ret;
> > > +
> > > +	ret = sscanf(buf, "%20s", str_preference);
> > > +	if (ret != 1)
> > > +		return -EINVAL;
> > > +
> > > +	ret = match_string(energy_perf_strings, -1, str_preference);
> > > +	if (ret < 0)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&amd_pstate_limits_lock);
> > > +	ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> > > +	mutex_unlock(&amd_pstate_limits_lock);
> > > +
> > > +	return ret ?: count;
> > > +}
> > > +
> > > +static ssize_t show_energy_performance_preference(
> > > +				struct cpufreq_policy *policy, char *buf)
> > > +{
> > > +	struct amd_cpudata *cpudata = policy->driver_data;
> > > +	int preference;
> > > +
> > > +	preference = amd_pstate_get_energy_pref_index(cpudata);
> > > +	if (preference < 0)
> > > +		return preference;
> > > +
> > > +	return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
> > > +}
> > > +
> > >  cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > >  cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> > >
> > >  cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > > +cpufreq_freq_attr_rw(energy_performance_preference);
> > > +cpufreq_freq_attr_ro(energy_performance_available_preferences);
> > >
> > >  static struct freq_attr *amd_pstate_attr[] = {
> > >  	&amd_pstate_max_freq,
> > > @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> > >  	NULL,
> > >  };
> > >
> > > +static struct freq_attr *amd_pstate_epp_attr[] = {
> > > +	&amd_pstate_max_freq,
> > > +	&amd_pstate_lowest_nonlinear_freq,
> > > +	&amd_pstate_highest_perf,
> > > +	&energy_performance_preference,
> > > +	&energy_performance_available_preferences,
> > > +	NULL,
> > > +};
> > > +
> > > +static inline void update_boost_state(void)
> > > +{
> > > +	u64 misc_en;
> > > +	struct amd_cpudata *cpudata;
> > > +
> > > +	cpudata = all_cpu_data[0];
> > > +	rdmsrl(MSR_K7_HWCR, misc_en);
> > > +	global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
> > 
> > I won't need introduce the additional cppc_boost_disabled here. The
> > cpufreq_driver->boost_enabled and cpudata->boost_supported can manage
> > this function.
> 
> cppc_boost_disabled is used to mark whether PMFW core boost is disabled,
> e.g. when some other driver (such as a thermal or performance-limiting driver) disables core boost.

I didn't see any driver other than acpi-cpufreq that controls
MSR_K7_HWCR_CPB_DIS.

> We need to update the flag to let the driver know that boost is disabled.
> 
> * boost_supported is used to change the core frequency boost state.
> * The EPP driver does not use the cpufreq core boost sysfs node, so boost_enabled is not used here.

I would like to clarify again: the core boost state is for legacy ACPI
P-State and is configured via MSR_K7_HWCR, while CPPC maps the boost
frequency through the highest perf. Here it is only because of some
hardware/firmware issues or quirks that the legacy boost setting still
impacts the target frequency. So cppc_boost will confuse users about the
split in functionality between CPPC and ACPI P-State.

Can we enable the core boost configuration by default when amd-pstate is in
use?
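
A minimal sketch of what I mean, done once at driver init or in a
set_boost callback (illustrative only):

	u64 hwcr;
	int cpu;

	for_each_present_cpu(cpu) {
		/* Bit 25 of MSR_K7_HWCR is the CPB-disable bit read in
		 * update_boost_state() above; clear it so boost is on by
		 * default when amd-pstate drives the CPU.
		 */
		if (rdmsrl_on_cpu(cpu, MSR_K7_HWCR, &hwcr))
			continue;
		if (hwcr & BIT_ULL(25))
			wrmsrl_on_cpu(cpu, MSR_K7_HWCR, hwcr & ~BIT_ULL(25));
	}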

> 
> > 
> > I believe it is a firmware issue that the legacy ACPI boost state impacts
> > the CPPC frequency. Could you move this handling into the
> > cpufreq_driver->set_boost callback to enable the boost state by default?
> > 
> > > +}
> > > +
> > > +static bool amd_pstate_acpi_pm_profile_server(void)
> > > +{
> > > +	if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> > > +	    acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static int amd_pstate_init_cpu(unsigned int cpunum)
> > > +{
> > > +	struct amd_cpudata *cpudata;
> > > +
> > > +	cpudata = all_cpu_data[cpunum];
> > > +	if (!cpudata) {
> > > +		cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > > +		if (!cpudata)
> > > +			return -ENOMEM;
> > > +		WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> > > +
> > > +		cpudata->cpu = cpunum;
> > > +
> > > +		if (cppc_active) {
> > 
> > The name cppc_active is a bit confusing here: if we are running the
> > amd-pstate driver at all, CPPC must be active. I know you want to indicate
> > which driver mode is running, so please use an enumeration type to mark
> > the different modes, such as PASSIVE_MODE, ACTIVE_MODE, and GUIDED_MODE
> > (as Wyes proposed).
> 
> Aligned with Wyes; I added a new patch to support the enumerated working
> modes in V8, roughly as sketched below.
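> 
> A sketch of the enumeration (the final naming may differ, and GUIDED can
> be added when Wyes' series lands):
> 
> 	enum amd_pstate_mode {
> 		AMD_PSTATE_DISABLE = 0,
> 		AMD_PSTATE_PASSIVE,
> 		AMD_PSTATE_ACTIVE,
> 	};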
> 
> 
> > 
> > > +			if (amd_pstate_acpi_pm_profile_server())
> > > +				cppc_boost = true;
> > > +		}
> > > +
> > > +	}
> > > +	cpudata->epp_powersave = -EINVAL;
> > > +	cpudata->epp_policy = 0;
> > > +	pr_debug("controlling: cpu %d\n", cpunum);
> > > +	return 0;
> > > +}
> > > +
> > > +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy)
> > > +{
> > > +	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > > +	struct amd_cpudata *cpudata;
> > > +	struct device *dev;
> > > +	int rc;
> > > +	u64 value;
> > > +
> > > +	rc = amd_pstate_init_cpu(policy->cpu);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	cpudata = all_cpu_data[policy->cpu];
> > > +
> > > +	dev = get_cpu_device(policy->cpu);
> > > +	if (!dev)
> > > +		goto free_cpudata1;
> > > +
> > > +	rc = amd_pstate_init_perf(cpudata);
> > > +	if (rc)
> > > +		goto free_cpudata1;
> > > +
> > > +	min_freq = amd_get_min_freq(cpudata);
> > > +	max_freq = amd_get_max_freq(cpudata);
> > > +	nominal_freq = amd_get_nominal_freq(cpudata);
> > > +	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > > +	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > > +		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> > > +				min_freq, max_freq);
> > > +		ret = -EINVAL;
> > > +		goto free_cpudata1;
> > > +	}
> > > +
> > > +	policy->min = min_freq;
> > > +	policy->max = max_freq;
> > > +
> > > +	policy->cpuinfo.min_freq = min_freq;
> > > +	policy->cpuinfo.max_freq = max_freq;
> > > +	/* It will be updated by governor */
> > > +	policy->cur = policy->cpuinfo.min_freq;
> > > +
> > > +	/* Initial processor data capability frequencies */
> > > +	cpudata->max_freq = max_freq;
> > > +	cpudata->min_freq = min_freq;
> > > +	cpudata->nominal_freq = nominal_freq;
> > > +	cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > > +
> > > +	policy->driver_data = cpudata;
> > > +
> > > +	update_boost_state();
> > > +	cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> > > +
> > > +	policy->min = policy->cpuinfo.min_freq;
> > > +	policy->max = policy->cpuinfo.max_freq;
> > > +
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC))
> > > +		policy->fast_switch_possible = true;
> > 
> > Please move this line into the if-case below.
> 
> 
> Fixed in V8.
> 
> > 
> > > +
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > +		ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
> > > +		if (ret)
> > > +			return ret;
> > > +		WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > +
> > > +		ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
> > > +		if (ret)
> > > +			return ret;
> > > +		WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> > > +	}
> > > +	amd_pstate_boost_init(cpudata);
> > > +
> > > +	return 0;
> > > +
> > > +free_cpudata1:
> > > +	kfree(cpudata);
> > > +	return ret;
> > > +}
> > > +
> > > +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = __amd_pstate_cpu_init(policy);
> > 
> > I don't see any reason we need to define a __amd_pstate_cpu_init() here.
> > The Intel P-State driver's __intel_pstate_cpu_init() is used by both
> > intel_pstate_cpu_init() and intel_cpufreq_cpu_init().
> 
> Fixed in V8.
> 
> > 
> > > +	if (ret)
> > > +		return ret;
> > > +	/*
> > > +	 * Set the policy to powersave to provide a valid fallback value in case
> > > +	 * the default cpufreq governor is neither powersave nor performance.
> > > +	 */
> > > +	policy->policy = CPUFREQ_POLICY_POWERSAVE;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy)
> > > +{
> > > +	pr_debug("CPU %d exiting\n", policy->cpu);
> > > +	policy->fast_switch_possible = false;
> > > +	return 0;
> > > +}
> > > +
> > > +static void amd_pstate_update_max_freq(unsigned int cpu)
> > 
> > Why do you name this function "update max frequency"?
> > 
> > We don't have the distinct cpudata->pstate.max_freq and
> > cpudata->pstate.turbo_freq that the Intel P-State driver has.
> > 
> > I think in fact we don't update anything here.
> 
> When core frequency boost is disabled, this function updates the frequency limits.
> Currently the boost sysfs node is not added, so the max frequency is not changed.
> Could we keep the code for the upcoming patch that adds the sysfs node for boost control?
> It does no harm to the EPP driver.

Again, the boost frequency and state are for ACPI P-State; we should not
conflate them with CPPC.

I didn't see where amd-pstate updates the max/min frequency limits, so why
keep the redundant code here?

> 
> > 
> > > +{
> > > +	struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);
> > 
> > struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > 
> > > +
> > > +	if (!policy)
> > > +		return;
> > > +
> > > +	refresh_frequency_limits(policy);
> > > +	cpufreq_cpu_put(policy);
> > > +}
> > > +
> > > +static void amd_pstate_epp_update_limits(unsigned int cpu)
> > > +{
> > > +	mutex_lock(&amd_pstate_driver_lock);
> > > +	update_boost_state();
> > > +	if (global_params.cppc_boost_disabled) {
> > > +		for_each_possible_cpu(cpu)
> > > +			amd_pstate_update_max_freq(cpu);
> > 
> > This should do nothing in the amd-pstate.
> 
> Currently the boost sysfs node is not added, so the max frequency is not changed.
> Could we keep the code for the upcoming patch that adds the sysfs node for boost control?
> It does no harm to the EPP driver.
> 
> > 
> > > +	} else {
> > > +		cpufreq_update_policy(cpu);
> > > +	}
> > > +	mutex_unlock(&amd_pstate_driver_lock);
> > > +}
> > > +
> > > +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> > > +
> > > +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata)
> > > +{
> > > +	u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > +	u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > +	u32 max_limit = (hwp_req & 0xff);
> > > +	u32 min_limit = (hwp_req & 0xff00) >> 8;
> > 
> > We can use cpudata->max_perf and cpudata->min_perf directly.
> 
> The iowait boost code was removed in V8.
> 
> > 
> > > +	u32 boost_level1;
> > > +
> > > +	/* If max and min are equal or already at max, nothing to boost */
> > 
> > I believe the only case handled here is max_perf == min_perf, not
> > "already at max".
> 
> The iowait boost code was removed in V8.
> 
> > 
> > > +	if (max_limit == min_limit)
> > > +		return;
> > > +
> > > +	/* Set boost max and min to initial value */
> > > +	if (!cpudata->cppc_boost_min)
> > > +		cpudata->cppc_boost_min = min_limit;
> > > +
> > > +	boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
> > > +
> > > +	if (cpudata->cppc_boost_min < boost_level1)
> > > +		cpudata->cppc_boost_min = boost_level1;
> > > +	else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > +		cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
> > > +	else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > +		cpudata->cppc_boost_min = max_limit;
> > > +	else
> > > +		return;
> > 
> > Could you please elaborate on why you split the min_perf (cppc_boost_min)
> > you want to write into the cppc_req register into these different cases?
> > Why do we pick these levels, such as (min + nominal)/2 and around nominal?
> > How does this help optimize the final result? Since the autonomous mode is
> > handled by the SMU firmware, we need some data showing how this influences
> > the final result.
> > 
> 
> The iowait boost code was removed in V8.
> 
> > > +
> > > +	hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > +	hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> > > +	wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
> > 
> > Do we need an update for shared memory processors? In other words, EPP is
> > also supported on shared memory processors. Again, we should use static
> > calls to handle the MSR and cppc_acpi paths.
> > 
> > > +	cpudata->last_update = cpudata->sample.time;
> > > +}
> > > +
> > > +static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
> > > +{
> > > +	bool expired;
> > > +
> > > +	if (cpudata->cppc_boost_min) {
> > > +		expired = time_after64(cpudata->sample.time, cpudata->last_update +
> > > +					cppc_boost_hold_time_ns);
> > > +
> > > +		if (expired) {
> > > +			wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
> > > +			cpudata->cppc_boost_min = 0;
> > > +		}
> > > +	}
> > > +
> > > +	cpudata->last_update = cpudata->sample.time;
> > > +}
> > > +
> > > +static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
> > > +						      u64 time)
> > > +{
> > > +	cpudata->sample.time = time;
> > > +	if (smp_processor_id() != cpudata->cpu)
> > > +		return;
> > > +
> > > +	if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> > > +		bool do_io = false;
> > > +
> > > +		cpudata->sched_flags = 0;
> > > +		/*
> > > +		 * Set iowait_boost flag and update time. Since IO WAIT flag
> > > +		 * is set all the time, we can't just conclude that there is
> > > +		 * some IO bound activity is scheduled on this CPU with just
> > > +		 * one occurrence. If we receive at least two in two
> > > +		 * consecutive ticks, then we treat as boost candidate.
> > > +		 * This is leveraged from Intel Pstate driver.
> > 
> > I would like to know whether we can hit this case as well. If we can find
> > or create a use case that hits it on our platforms, I am fine with adding
> > it to our driver too. If not, I don't suggest we add it at this moment. I
> > hope we have verified each code path before adding it to the driver.
> 
> Sure, no problem. 
> The iowait boost code was removed in V8.
> 
> 
> > 
> > > +		 */
> > > +		if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
> > > +			do_io = true;
> > > +
> > > +		cpudata->last_io_update = time;
> > > +
> > > +		if (do_io)
> > > +			amd_pstate_boost_up(cpudata);
> > > +
> > > +	} else {
> > > +		amd_pstate_boost_down(cpudata);
> > > +	}
> > > +}
> > > +
> > > +static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
> > > +						u64 time, unsigned int flags)
> > > +{
> > > +	struct amd_cpudata *cpudata = container_of(data,
> > > +				struct amd_cpudata, update_util);
> > > +
> > > +	cpudata->sched_flags |= flags;
> > > +
> > > +	if (smp_processor_id() == cpudata->cpu)
> > > +		amd_pstate_boost_update_util(cpudata, time);
> > > +}
> > > +
> > > +static void amd_pstate_clear_update_util_hook(unsigned int cpu)
> > > +{
> > > +	struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > +
> > > +	if (!cpudata->update_util_set)
> > > +		return;
> > > +
> > > +	cpufreq_remove_update_util_hook(cpu);
> > > +	cpudata->update_util_set = false;
> > > +	synchronize_rcu();
> > > +}
> > > +
> > > +static void amd_pstate_set_update_util_hook(unsigned int cpu_num)
> > > +{
> > > +	struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> > > +
> > > +	if (!cppc_boost) {
> > > +		if (cpudata->update_util_set)
> > > +			amd_pstate_clear_update_util_hook(cpudata->cpu);
> > > +		return;
> > > +	}
> > > +
> > > +	if (cpudata->update_util_set)
> > > +		return;
> > > +
> > > +	cpudata->sample.time = 0;
> > > +	cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> > > +				     amd_pstate_cppc_update_hook);
> > > +	cpudata->update_util_set = true;
> > > +}
> > > +
> > > +static void amd_pstate_epp_init(unsigned int cpu)
> > > +{
> > > +	struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > +	u32 max_perf, min_perf;
> > > +	u64 value;
> > > +	s16 epp;
> > > +
> > > +	max_perf = READ_ONCE(cpudata->highest_perf);
> > > +	min_perf = READ_ONCE(cpudata->lowest_perf);
> > > +
> > > +	value = READ_ONCE(cpudata->cppc_req_cached);
> > > +
> > > +	if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > > +		min_perf = max_perf;
> > > +
> > > +	/* Initial min/max values for CPPC Performance Controls Register */
> > > +	value &= ~AMD_CPPC_MIN_PERF(~0L);
> > > +	value |= AMD_CPPC_MIN_PERF(min_perf);
> > > +
> > > +	value &= ~AMD_CPPC_MAX_PERF(~0L);
> > > +	value |= AMD_CPPC_MAX_PERF(max_perf);
> > > +
> > > +	/* CPPC EPP feature require to set zero to the desire perf bit */
> > > +	value &= ~AMD_CPPC_DES_PERF(~0L);
> > > +	value |= AMD_CPPC_DES_PERF(0);
> > > +
> > > +	if (cpudata->epp_policy == cpudata->policy)
> > > +		goto skip_epp;
> > > +
> > > +	cpudata->epp_policy = cpudata->policy;
> > > +
> > > +	if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > +		epp = amd_pstate_get_epp(cpudata, value);
> > > +		cpudata->epp_powersave = epp;
> > 
> > I didn't see why we need epp_powersave here. It is only initialized but
> > never used anywhere.
> 
> The epp_powersave variable was removed in V8.
> 
> 
> > 
> > > +		if (epp < 0)
> > > +			goto skip_epp;
> > > +		/* force the epp value to be zero for performance policy */
> > > +		epp = 0;
> > > +	} else {
> > > +		if (cpudata->epp_powersave < 0)
> > > +			goto skip_epp;
> > > +		/* Get BIOS pre-defined epp value */
> > > +		epp = amd_pstate_get_epp(cpudata, value);
> > > +		if (epp)
> > > +			goto skip_epp;
> > > +		epp = cpudata->epp_powersave;
> > > +	}
> > > +	/* Set initial EPP value */
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > +		value &= ~GENMASK_ULL(31, 24);
> > > +		value |= (u64)epp << 24;
> > > +	}
> > > +
> > > +skip_epp:
> > > +	WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > +	amd_pstate_set_epp(cpudata, epp);
> > > +}
> > > +
> > > +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata)
> > > +{
> > > +	u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > +	u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > +	u32 max_limit = (hwp_cap >> 24) & 0xff;
> > > +
> > > +	hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > +	hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> > > +	wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req);
> > > +}
> > > +
> > > +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
> > > +{
> > > +	struct amd_cpudata *cpudata;
> > > +
> > > +	if (!policy->cpuinfo.max_freq)
> > > +		return -ENODEV;
> > > +
> > > +	pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> > > +				policy->cpuinfo.max_freq, policy->max);
> > > +
> > > +	cpudata = all_cpu_data[policy->cpu];
> > > +	cpudata->policy = policy->policy;
> > > +
> > > +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > +		mutex_lock(&amd_pstate_limits_lock);
> > > +
> > > +		if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > +			amd_pstate_clear_update_util_hook(policy->cpu);
> > > +			amd_pstate_set_max_limits(cpudata);
> > > +		} else {
> > > +			amd_pstate_set_update_util_hook(policy->cpu);
> > > +		}
> > > +
> > > +		mutex_unlock(&amd_pstate_limits_lock);
> > > +	}
> > > +	amd_pstate_epp_init(policy->cpu);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > > +					   struct cpufreq_policy_data *policy)
> > > +{
> > > +	update_boost_state();
> > > +	cpufreq_verify_within_cpu_limits(policy);
> > > +}
> > > +
> > > +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
> > > +{
> > > +	amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> > > +	pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy->min);
> > > +	return 0;
> > > +}
> > 
> > amd_pstate_verify_cpu_policy() and amd_pstate_epp_verify_policy() can be
> > squashed into one function.
> 
> Fixed in V8. 
> 
> > 
> > > +
> > >  static struct cpufreq_driver amd_pstate_driver = {
> > >  	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> > >  	.verify		= amd_pstate_verify,
> > > @@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
> > >  	.attr		= amd_pstate_attr,
> > >  };
> > >
> > > +static struct cpufreq_driver amd_pstate_epp_driver = {
> > > +	.flags		= CPUFREQ_CONST_LOOPS,
> > > +	.verify		= amd_pstate_epp_verify_policy,
> > > +	.setpolicy	= amd_pstate_epp_set_policy,
> > > +	.init		= amd_pstate_epp_cpu_init,
> > > +	.exit		= amd_pstate_epp_cpu_exit,
> > > +	.update_limits	= amd_pstate_epp_update_limits,
> > > +	.name		= "amd_pstate_epp",
> > > +	.attr		= amd_pstate_epp_attr,
> > > +};
> > > +
> > >  static int __init amd_pstate_init(void)
> > >  {
> > > +	static struct amd_cpudata **cpudata;
> > >  	int ret;
> > >
> > >  	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > > @@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
> > >  	/* capability check */
> > >  	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > >  		pr_debug("AMD CPPC MSR based functionality is supported\n");
> > > -		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > > +		if (!cppc_active)
> > > +			default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
> > >  	} else {
> > >  		pr_debug("AMD CPPC shared memory based functionality is supported\n");
> > >  		static_call_update(amd_pstate_enable, cppc_enable);
> > > @@ -653,6 +1262,10 @@ static int __init amd_pstate_init(void)
> > >  		static_call_update(amd_pstate_update_perf, cppc_update_perf);
> > >  	}
> > >
> > > +	cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> > > +	if (!cpudata)
> > > +		return -ENOMEM;
> > > +	WRITE_ONCE(all_cpu_data, cpudata);
> > 
> > Why can't we use cpufreq_policy->driver_data to store the cpudata? The
> > cpudata is per-CPU and can easily be retrieved from the private data.
> 
> cpufreq_policy->driver_data can do that, but global data can have a better cache hit rate,
> especially on server clusters.
> So I would prefer to use the global array to store each CPU's data struct.
> Could we keep it for the EPP driver?

I don't follow what you mean by a "better cache hit rate, especially on
server clusters". In my view this is just a different design, not a
performance enhancement. Since we already store the data in
cpufreq_policy->driver_data, I don't see any visible improvement from
changing to this approach.
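
For reference, the per-CPU data is already reachable without a global
array; a minimal sketch:

	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	struct amd_cpudata *cpudata;

	if (!policy)
		return;

	cpudata = policy->driver_data;
	pr_debug("cpu%d epp_cached=%d\n", cpu, cpudata->epp_cached);
	cpufreq_cpu_put(policy);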

Thanks,
Ray
