lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 3 Oct 2023 11:20:46 +0200
From:   Pierre Gondois <pierre.gondois@....com>
To:     Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>,
        dietmar.eggemann@....com
Cc:     linux-kernel@...r.kernel.org, ionela.voinescu@....com,
        qperret@...gle.com, srikar@...ux.vnet.ibm.com,
        mgorman@...hsingularity.net, mingo@...nel.org, yu.c.chen@...el.com,
        tim.c.chen@...ux.intel.com, pauld@...hat.com, lukasz.luba@....com,
        linux-doc@...r.kernel.org, bsegall@...gle.com, mingo@...hat.com,
        peterz@...radead.org, vincent.guittot@...aro.org,
        vschneid@...hat.com
Subject: Re: [PATCH v5 2/2] sched/topology: change behaviour of sysctl
 sched_energy_aware based on the platform

Hello Shrikanth,
Some NITs about the commit message:

On 9/29/23 17:52, Shrikanth Hegde wrote:
> sysctl sched_energy_aware is available for the admin to disable/enable
> energy aware scheduling(EAS). EAS is enabled only if few conditions are
> met by the platform. They are, asymmetric CPU capacity, no SMT,
> schedutil CPUfreq governor, frequency invariant load tracking etc.
> A platform may boot without EAS capability, but could gain such
> capability at runtime For example, changing/registering the CPUfreq

Missing dot I think: 'runtime. For example,'

> governor to schedutil.
> 
> At present, though platform doesn't support EAS, this sysctl returns 1
> and it ends up calling build_perf_domains on write to 1 and
> NOP when writing to 0. That is confusing and un-necessary.

I'm not sure I fully understand the sentence:
- it sounds that the user is writing a value to either 1/0
   (I think the user is writing 1/0 to the sysctl)
- aren't the sched domain rebuilt even when writing 0 to the sysctl ?
   I'm not sure I understand to what the NOP is referring to exactly.

What about:
Platforms without EAS capability currently advertise this sysctl.
Its effects (i.e. rebuilding sched-domains) is unnecessary on
such platforms and its presence can be confusing.

> 
> Desired behavior would be to, have this sysctl to enable/disable the EAS

Unnecessary comma I think

> on supported platform. On Non supported platform write to the sysctl

Non supported  -> non-supported

> would return not supported error and read of the sysctl would return
> empty. So> sched_energy_aware returns empty - EAS is not possible at this moment
> This will include EAS capable platforms which have at least one EAS
> condition false during startup, e.g. using a Performance CPUfreq governor

Just a remark, using the performance governor is not exactly a condition
disabling EAS, it is more 'not using the schedutil CPUfreq governor'

> sched_energy_aware returns 0 - EAS is supported but disabled by admin.
> sched_energy_aware returns 1 - EAS is supported and enabled.
> 
> User can find out the reason why EAS is not possible by checking
> info messages. sched_is_eas_possible returns true if the platform
> can do EAS at this moment.
> 
> Depends on [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit
> to be applied first.

I think it's implied as the 2 patches are sent together.

Otherwise:
Tested-by: Pierre Gondois <pierre.gondois@....com>

> 
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
> ---
>   Documentation/admin-guide/sysctl/kernel.rst |   3 +-
>   kernel/sched/topology.c                     | 112 +++++++++++++-------
>   2 files changed, 76 insertions(+), 39 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index cf33de56da27..d89ac2bd8dc4 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1182,7 +1182,8 @@ automatically on platforms where it can run (that is,
>   platforms with asymmetric CPU topologies and having an Energy
>   Model available). If your platform happens to meet the
>   requirements for EAS but you do not want to use it, change
> -this value to 0.
> +this value to 0. On Non-EAS platforms, write operation fails and
> +read doesn't return anything.
> 
>   task_delayacct
>   ===============
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index e0b9920e7e3e..a654d0186ac0 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -212,6 +212,70 @@ static unsigned int sysctl_sched_energy_aware = 1;
>   static DEFINE_MUTEX(sched_energy_mutex);
>   static bool sched_energy_update;
> 
> +extern struct cpufreq_governor schedutil_gov;
> +static bool sched_is_eas_possible(const struct cpumask *cpu_mask)
> +{
> +	bool any_asym_capacity = false;
> +	struct cpufreq_policy *policy;
> +	struct cpufreq_governor *gov;
> +	int i;
> +
> +	/* EAS is enabled for asymmetric CPU capacity topologies. */
> +	for_each_cpu(i, cpu_mask) {
> +		if (per_cpu(sd_asym_cpucapacity, i)) {
> +			any_asym_capacity = true;
> +			break;
> +		}
> +	}
> +	if (!any_asym_capacity) {
> +		if (sched_debug()) {
> +			pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n",
> +				cpumask_pr_args(cpu_mask));
> +		}
> +		return false;
> +	}
> +
> +	/* EAS definitely does *not* handle SMT */
> +	if (sched_smt_active()) {
> +		if (sched_debug()) {
> +			pr_info("rd %*pbl: Checking EAS, SMT is not supported\n",
> +				cpumask_pr_args(cpu_mask));
> +		}
> +		return false;
> +	}
> +
> +	if (!arch_scale_freq_invariant()) {
> +		if (sched_debug()) {
> +			pr_info("rd %*pbl: Checking EAS: frequency-invariant load tracking not yet supported",
> +				cpumask_pr_args(cpu_mask));
> +		}
> +		return false;
> +	}
> +
> +	/* Do not attempt EAS if schedutil is not being used. */
> +	for_each_cpu(i, cpu_mask) {
> +		policy = cpufreq_cpu_get(i);
> +		if (!policy) {
> +			if (sched_debug()) {
> +				pr_info("rd %*pbl: Checking EAS, cpufreq policy not set for CPU: %d",
> +					cpumask_pr_args(cpu_mask), i);
> +			}
> +			return false;
> +		}
> +		gov = policy->governor;
> +		cpufreq_cpu_put(policy);
> +		if (gov != &schedutil_gov) {
> +			if (sched_debug()) {
> +				pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> +					cpumask_pr_args(cpu_mask));
> +			}
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
>   void rebuild_sched_domains_energy(void)
>   {
>   	mutex_lock(&sched_energy_mutex);
> @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>   		return -EPERM;
> 
>   	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +	if (!sched_is_eas_possible(cpu_active_mask)) {
> +		if (write) {
> +			return -EOPNOTSUPP;
> +		} else {
> +			*lenp = 0;
> +			return 0;
> +		}
> +	}
> +
>   	if (!ret && write) {
>   		state = static_branch_unlikely(&sched_energy_present);
>   		if (state != sysctl_sched_energy_aware)
> @@ -351,61 +424,24 @@ static void sched_energy_set(bool has_eas)
>    *    4. schedutil is driving the frequency of all CPUs of the rd;
>    *    5. frequency invariance support is present;
>    */
> -extern struct cpufreq_governor schedutil_gov;
>   static bool build_perf_domains(const struct cpumask *cpu_map)
>   {
>   	int i;
>   	struct perf_domain *pd = NULL, *tmp;
>   	int cpu = cpumask_first(cpu_map);
>   	struct root_domain *rd = cpu_rq(cpu)->rd;
> -	struct cpufreq_policy *policy;
> -	struct cpufreq_governor *gov;
> 
>   	if (!sysctl_sched_energy_aware)
>   		goto free;
> 
> -	/* EAS is enabled for asymmetric CPU capacity topologies. */
> -	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> -		if (sched_debug()) {
> -			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> -					cpumask_pr_args(cpu_map));
> -		}
> -		goto free;
> -	}
> -
> -	/* EAS definitely does *not* handle SMT */
> -	if (sched_smt_active()) {
> -		pr_warn("rd %*pbl: Disabling EAS, SMT is not supported\n",
> -			cpumask_pr_args(cpu_map));
> -		goto free;
> -	}
> -
> -	if (!arch_scale_freq_invariant()) {
> -		if (sched_debug()) {
> -			pr_warn("rd %*pbl: Disabling EAS: frequency-invariant load tracking not yet supported",
> -				cpumask_pr_args(cpu_map));
> -		}
> +	if (!sched_is_eas_possible(cpu_map))
>   		goto free;
> -	}
> 
>   	for_each_cpu(i, cpu_map) {
>   		/* Skip already covered CPUs. */
>   		if (find_pd(pd, i))
>   			continue;
> 
> -		/* Do not attempt EAS if schedutil is not being used. */
> -		policy = cpufreq_cpu_get(i);
> -		if (!policy)
> -			goto free;
> -		gov = policy->governor;
> -		cpufreq_cpu_put(policy);
> -		if (gov != &schedutil_gov) {
> -			if (rd->pd)
> -				pr_warn("rd %*pbl: Disabling EAS, schedutil is mandatory\n",
> -						cpumask_pr_args(cpu_map));
> -			goto free;
> -		}
> -
>   		/* Create the new pd and add it to the local list. */
>   		tmp = pd_init(i);
>   		if (!tmp)
> --
> 2.39.3
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ