lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <9b23259e62826ee8be14c6fe5dcb4bfad40d4bee.camel@linux.intel.com>
Date:   Tue, 05 Sep 2023 14:53:20 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Pierre Gondois <pierre.gondois@....com>,
        Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
Cc:     dietmar.eggemann@....com, vincent.guittot@...aro.org,
        peterz@...radead.org, mingo@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, ionela.voinescu@....com,
        quentin.perret@....com, srikar@...ux.vnet.ibm.com,
        mgorman@...hsingularity.net, mingo@...nel.org, yu.c.chen@...el.com
Subject: Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware
 depending on the architecture

On Tue, 2023-09-05 at 16:03 +0200, Pierre Gondois wrote:
> Hello Shrikanth,
> I tried the patch (on a platform using the cppc_cpufreq driver). The platform
> normally has EAS enabled, but the patch removed the sched_energy_aware sysctl.
> It seemed the following happened (in this order):
> 
> 1. sched_energy_aware_sysctl_init()
> Doesn't set sysctl_sched_energy_aware as cpufreq_freq_invariance isn't set
> and arch_scale_freq_invariant() returns false
> 
> 2. cpufreq_register_driver()
> Sets cpufreq_freq_invariance during cpufreq initialization, which later
> triggers sched_energy_set()
> 
> 3. sched_energy_set()
> Is called with has_eas=0 since build_perf_domains() doesn't see the platform
> as EAS compatible. Indeed sysctl_sched_energy_aware=0.
> So with sysctl_sched_energy_aware=0 and has_eas=0, sched_energy_aware sysctl
> is not enabled even though EAS should be possible.
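A minimal userspace sketch of the ordering above, assuming the rebuild in step 3 runs with sched_energy_update set (rebuild_sched_domains_energy() sets it around the rebuild). The struct and the step*() functions are hypothetical stand-ins that mirror kernel names; this is not kernel code. The point it shows is that the registration decision in step 1 is made once, before step 2 makes the platform capable, and v2's early exit in build_perf_domains() then keeps has_eas at 0.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the boot ordering described above.
 * Field and function names mirror the kernel ones, but this is not kernel code.
 */
struct boot_model {
	bool smt_active;            /* sched_smt_active() */
	bool asym_capacity;         /* per_cpu(sd_asym_cpucapacity, cpu) != NULL */
	bool freq_invariant;        /* arch_scale_freq_invariant() */
	unsigned int energy_aware;  /* sysctl_sched_energy_aware, 0 by default with v2 */
	bool sysctl_registered;     /* kernel.sched_energy_aware is visible */
	bool eas_running;           /* sched_energy_present */
};

/* Step 1: sched_energy_aware_sysctl_init() as modified by the v2 patch. */
static void step1_sysctl_init(struct boot_model *m)
{
	if (m->smt_active || !m->asym_capacity || !m->freq_invariant)
		return;
	m->sysctl_registered = true;
	m->energy_aware = 1;
}

/* Step 2: cpufreq_register_driver() makes frequency invariance available. */
static void step2_cpufreq_register(struct boot_model *m)
{
	m->freq_invariant = true;
}

/*
 * Step 3: build_perf_domains() followed by sched_energy_set().
 * energy_update models sched_energy_update (assumed true here).
 */
static void step3_rebuild(struct boot_model *m, bool energy_update)
{
	bool has_eas;

	if (!m->energy_aware && energy_update)
		has_eas = false;  /* v2's early "goto free" */
	else
		has_eas = !m->smt_active && m->asym_capacity && m->freq_invariant;

	if (has_eas && !m->eas_running) {
		m->eas_running = true;
		m->sysctl_registered = true;
		m->energy_aware = 1;
	}
}
```

Running step1, step2 and then step3 with energy_update = true on a model with only asym_capacity set at boot ends with an EAS-capable platform, but with EAS off and no sysctl registered.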
> 
> 
> On 9/1/23 08:52, Shrikanth Hegde wrote:
> > Currently sysctl_sched_energy_aware has no effect on some
> > architectures. IIUC it's meant to either force a rebuild of the perf
> > domains or clean up the perf domains, by writing 1 or 0 respectively.
> 
> There is a definition of the sysctl at:
> Documentation/admin-guide/sysctl/kernel.rst::sched_energy_aware
> 
> Also a personal comment about the commit message (FWIW), I think it should
> be a bit more impersonal and factual. The commit message seems to describe
> the code rather than the desired behaviour.

I also wonder if Shrikanth's description of the operations can be simplified.

In my mind, there are 3 variables describing the system:

1. sched_energy_capable : whether the system is EAS capable
2. sched_energy_aware   : whether the admin wants to enable EAS
3. sched_energy_status  : sched_energy_capable && sched_energy_aware

Whenever sched_energy_status changes, we should trigger a rebuild
of the sched domains. We should expose sched_energy_capable
to the user rather than removing sched_energy_aware when sched_energy_capable == 0.

If the user knows the value of sched_energy_capable, the user will know
whether setting sched_energy_aware will change the system's sched_energy_status.
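A minimal userspace sketch of the three-variable model, assuming sched_energy_status is derived on the fly rather than stored. The struct, eas_status() and eas_update() are hypothetical, and the rebuild counter only stands in for a call such as rebuild_sched_domains_energy().

```c
#include <stdbool.h>

/* Hypothetical model of the three variables above; not kernel code. */
struct eas_state {
	bool capable;  /* sched_energy_capable */
	bool aware;    /* sched_energy_aware: the admin's wish */
	int rebuilds;  /* number of (simulated) sched domain rebuilds */
};

/* sched_energy_status is derived, never stored. */
static bool eas_status(const struct eas_state *s)
{
	return s->capable && s->aware;
}

/* Change either input; rebuild only if the derived status flips. */
static void eas_update(struct eas_state *s, bool capable, bool aware)
{
	bool old = eas_status(s);

	s->capable = capable;
	s->aware = aware;
	if (eas_status(s) != old)
		s->rebuilds++;  /* stand-in for rebuild_sched_domains_energy() */
}
```

Toggling sched_energy_aware while sched_energy_capable is 0 never triggers a rebuild, which is exactly what exposing sched_energy_capable would let the user predict.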

For systems that can never support EAS,
we should simply set sched_energy_aware to 0 and disallow writes to it.

On systems where sched_energy_capable can change (e.g. by bringing SMT on/offline),
we should allow setting sched_energy_aware even when sched_energy_capable is 0.
Once sched_energy_capable becomes 1, EAS is enabled.
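A minimal userspace sketch of this write policy. The struct and functions are hypothetical, the choice of -EPERM for a rejected write is an assumption (no errno is specified above), and defaulting sched_energy_aware to 1 where EAS is possible follows the current initializer of sysctl_sched_energy_aware.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed write policy; not kernel code. */
struct eas_knob {
	bool never_capable;  /* platform can never run EAS */
	bool capable;        /* currently EAS capable (may change, e.g. SMT on/offline) */
	bool aware;          /* admin's setting */
};

static void eas_knob_init(struct eas_knob *k, bool never_capable, bool capable)
{
	k->never_capable = never_capable;
	k->capable = never_capable ? false : capable;
	/* Forced to 0 where EAS is impossible, default 1 elsewhere. */
	k->aware = !never_capable;
}

/* Stand-in for a sysctl write handler: 0 on success, negative errno on error. */
static int eas_write_aware(struct eas_knob *k, bool val)
{
	if (k->never_capable)
		return -EPERM;  /* assumption: errno choice is illustrative */
	k->aware = val;     /* accepted even while capable == 0 */
	return 0;
}

static void eas_set_capable(struct eas_knob *k, bool capable)
{
	if (!k->never_capable)
		k->capable = capable;
}

static bool eas_enabled(const struct eas_knob *k)
{
	return k->capable && k->aware;
}
```

The admin's intent is recorded even while the system is not capable, so EAS comes up as soon as capability appears, without a second write.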


Tim 
  
> 
> > 
> > Perf domains are not built when SMT is active, when the CPU topology is
> > not asymmetric, or when there is no frequency invariance. In such cases
> > EAS is not enabled and perf domains are not built, and changing the value
> > of sysctl_sched_energy_aware cannot force the perf domains to be built.
> > Hence remove this sysctl on platforms that don't support EAS. Some of
> > these conditions can change later, e.g. smt_active by offlining CPUs. In
> > those cases, if build_perf_domains() returns true, re-enable the sysctl.
> > 
> > Any time sysctl_sched_energy_aware is changed, sched_energy_update is set
> > while the perf domains are built. Use that to find out whether the change
> > comes from the sysctl or from a dynamic system change.
> > 
> > Taking the different cases:
> > Case 1. The system has EAS capability at boot, so the sysctl is set to 1
> > and perf domains are built if needed. When the sysctl is changed to 0,
> > sched_energy_update is true, so the perf domains are freed but the sysctl
> > is not removed. When the sysctl is later changed back to 1, the perf
> > domains are rebuilt. Since the sysctl is already registered, registration
> > is skipped.
> > 
> > Case 2. The system doesn't have EAS capability at boot, but becomes EAS
> > capable after a system change. sched_energy_update is false, so even
> > though the sysctl is 0, the kernel goes ahead and tries to enable EAS.
> > This is the current behaviour. If has_eas is true, the sysctl is
> > registered. After that, any sysctl change is the same as Case 1.
> > 
> > Case 3. The system stops being EAS capable due to a system change. Since
> > sched_energy_update is false and one of the checks fails,
> > build_perf_domains() returns false (has_eas), and since this is a dynamic
> > change, the sysctl is removed. Any further change that enables EAS is Case 2.
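A minimal userspace sketch of the three cases as the v2 patch handles them. The sim_* functions and struct eas_sim are hypothetical; platform_ok stands in for the SMT, asymmetric-capacity and frequency-invariance checks, and energy_update for sched_energy_update. The branches follow the diff below (the early exit in build_perf_domains(), the sysctl (un)registration in sched_energy_set(), and the prev_val check in the handler), minus locking and CONFIG_PROC_SYSCTL.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the v2 patch; not kernel code. */
struct eas_sim {
	bool present;            /* sched_energy_present */
	bool sysctl_registered;  /* sysctl_eas_header != NULL */
	unsigned int aware;      /* sysctl_sched_energy_aware */
};

/* build_perf_domains() with v2's early exit. */
static bool sim_build(const struct eas_sim *s, bool platform_ok, bool energy_update)
{
	if (!s->aware && energy_update)
		return false;
	return platform_ok;
}

/* sched_energy_set() as modified by v2. */
static void sim_energy_set(struct eas_sim *s, bool has_eas, bool energy_update)
{
	if (!has_eas && s->present) {
		s->present = false;
		/* Only a dynamic change (not a sysctl write) removes the sysctl. */
		if (s->sysctl_registered && !energy_update)
			s->sysctl_registered = false;
		s->aware = 0;
	} else if (has_eas && !s->present) {
		s->present = true;
		s->sysctl_registered = true;  /* no-op if already registered */
		s->aware = 1;
	}
}

static void sim_rebuild(struct eas_sim *s, bool platform_ok, bool energy_update)
{
	sim_energy_set(s, sim_build(s, platform_ok, energy_update), energy_update);
}

/* Write path of sched_energy_aware_handler() with v2's prev_val check. */
static void sim_sysctl_write(struct eas_sim *s, unsigned int val, bool platform_ok)
{
	unsigned int prev = s->aware;

	s->aware = val;
	if ((unsigned int)s->present != val && prev != val)
		sim_rebuild(s, platform_ok, true);
}
```

Case 1 corresponds to sim_sysctl_write() on a capable system, Case 2 to sim_rebuild(s, true, false) starting from an all-zero struct, and Case 3 to sim_rebuild(s, false, false) while EAS is running.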
> > 
> > Note: This hasn't been tested on a platform which supports EAS. It would
> > really help if the change could be verified on one. It has been tested
> > on power10, which doesn't support EAS; sysctl_sched_energy_aware is
> > removed with this patch.
> > 
> > Changes since v1:
> > Chen Yu pointed out that v1 did not destroy the perf domains when the
> > sysctl was changed on architectures where EAS is supported. This patch
> > addresses that.
> > [v1] Link: https://lore.kernel.org/lkml/20230829065040.920629-1-sshegde@linux.vnet.ibm.com/#t
> > 
> > Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
> > ---
> >   kernel/sched/topology.c | 45 +++++++++++++++++++++++++++++++++--------
> >   1 file changed, 37 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 05a5bc678c08..4d16269ac21a 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -208,7 +208,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> > 
> >   #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> >   DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > -static unsigned int sysctl_sched_energy_aware = 1;
> > +static unsigned int sysctl_sched_energy_aware;
> > +static struct ctl_table_header *sysctl_eas_header;
> 
> The variables around the presence/absence of EAS are:
> - sched_energy_present:
> EAS is up and running
> 
> - sysctl_sched_energy_aware:
> The user wants to use EAS (or not). This doesn't mean EAS can run on the
> platform.
> 
> - sched_energy_set/partition_sched_domains_locked's "has_eas":
> Local variable. Represents whether EAS can run on the platform.
> 
> IMO it would be simpler to (un)register the sched_energy_aware sysctl
> in partition_sched_domains_locked(), based on the value of "has_eas".
> This would keep all the logic as it is right now inside
> build_perf_domains(), and then advertise the sched_energy_aware sysctl
> if EAS can run on the platform.
> sched_energy_aware_sysctl_init() would then be deleted.
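A minimal userspace sketch of registering the knob from the domain rebuild instead of from an __init function. One assumption matters here: the sketch keeps the platform checks (platform_ok) separate from the sysctl value, because in the current build_perf_domains() a 0 in sysctl_sched_energy_aware also makes has_eas false, which would hide the knob right after the admin writes 0. The struct and partition_domains() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of (un)registering the sysctl at rebuild time; not kernel code. */
struct pd_model {
	unsigned int aware;      /* sysctl_sched_energy_aware, default 1 as before v2 */
	bool sysctl_registered;  /* kernel.sched_energy_aware is visible */
	bool eas_running;        /* sched_energy_present */
};

/*
 * Stand-in for partition_sched_domains_locked(). platform_ok is the result
 * of the SMT / asymmetric-capacity / frequency-invariance checks only.
 */
static void partition_domains(struct pd_model *m, bool platform_ok)
{
	/* Advertise the knob only where EAS can run at all. */
	m->sysctl_registered = platform_ok;
	/* EAS runs only if the platform allows it and the admin has not disabled it. */
	m->eas_running = platform_ok && m->aware;
}
```

Because the decision is made on every rebuild, a platform that only becomes capable once cpufreq registers (the cppc_cpufreq ordering above) gets the knob on the next rebuild in this model.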
> 
> 
> >   static DEFINE_MUTEX(sched_energy_mutex);
> >   static bool sched_energy_update;
> > 
> > @@ -226,6 +227,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   		void *buffer, size_t *lenp, loff_t *ppos)
> >   {
> >   	int ret, state;
> > +	int prev_val = sysctl_sched_energy_aware;
> > 
> >   	if (write && !capable(CAP_SYS_ADMIN))
> >   		return -EPERM;
> > @@ -233,8 +235,11 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >   	if (!ret && write) {
> >   		state = static_branch_unlikely(&sched_energy_present);
> > -		if (state != sysctl_sched_energy_aware)
> > +		if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> > +			if (sysctl_sched_energy_aware && !state)
> > +				pr_warn("Attempt to build energy domains when EAS is disabled\n");
> >   			rebuild_sched_domains_energy();
> > +		}
> >   	}
> > 
> >   	return ret;
> > @@ -255,7 +260,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
> > 
> >   static int __init sched_energy_aware_sysctl_init(void)
> >   {
> > -	register_sysctl_init("kernel", sched_energy_aware_sysctls);
> > +	int cpu = cpumask_first(cpu_active_mask);
> > +
> > +	if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> > +	    !arch_scale_freq_invariant())
> > +		return 0;
> > +
> > +	sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +	sysctl_sched_energy_aware = 1;
> >   	return 0;
> >   }
> > 
> > @@ -336,10 +348,28 @@ static void sched_energy_set(bool has_eas)
> >   		if (sched_debug())
> >   			pr_info("%s: stopping EAS\n", __func__);
> >   		static_branch_disable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		/*
> > +		 * if the architecture supports EAS and forcefully
> > +		 * perf domains are destroyed, there should be a sysctl
> > +		 * to enable it later. If this was due to dynamic system
> > +		 * change such as SMT<->NON_SMT then remove sysctl.
> > +		 */
> > +		if (sysctl_eas_header && !sched_energy_update) {
> > +			unregister_sysctl_table(sysctl_eas_header);
> > +			sysctl_eas_header = NULL;
> > +		}
> > +#endif
> > +		sysctl_sched_energy_aware = 0;
> >   	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> >   		if (sched_debug())
> >   			pr_info("%s: starting EAS\n", __func__);
> >   		static_branch_enable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		if (!sysctl_eas_header)
> > +			sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +#endif
> > +		sysctl_sched_energy_aware = 1;
> >   	}
> >   }
> > 
> > @@ -380,15 +410,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> >   	struct cpufreq_policy *policy;
> >   	struct cpufreq_governor *gov;
> > 
> > -	if (!sysctl_sched_energy_aware)
> > +	if (!sysctl_sched_energy_aware && sched_energy_update)
> >   		goto free;
> > 
> >   	/* EAS is enabled for asymmetric CPU capacity topologies. */
> >   	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> > -		if (sched_debug()) {
> > -			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> > -					cpumask_pr_args(cpu_map));
> > -		}
> > +		if (sched_debug())
> > +			pr_info("rd %*pbl: Disabling EAS,  CPUs do not have asymmetric capacities\n",
> > +				cpumask_pr_args(cpu_map));
> >   		goto free;
> >   	}
> > 
> > --
> > 2.31.1
> > 
> > 
> 
> Regards,
> Pierre
