lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230918125704.GA240118@lorien.usersys.redhat.com>
Date:   Mon, 18 Sep 2023 08:57:04 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Pierre Gondois <pierre.gondois@....com>
Cc:     Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>,
        dietmar.eggemann@....com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, ionela.voinescu@....com,
        quentin.perret@....com, srikar@...ux.vnet.ibm.com,
        mgorman@...hsingularity.net, mingo@...nel.org, yu.c.chen@...el.com,
        tim.c.chen@...ux.intel.com, mingo@...hat.com, peterz@...radead.org,
        vincent.guittot@...aro.org
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware
 depending on the architecture

On Mon, Sep 18, 2023 at 02:00:11PM +0200 Pierre Gondois wrote:
> Hello Shrikanth,
> 
> On 9/13/23 13:48, Shrikanth Hegde wrote:
> > sysctl_sched_energy_aware is available for the admin to disable/enable
> > energy aware scheduling(EAS). EAS is enabled only if few conditions are
> > met by the platform. They are, asymmetric CPU capacity, no SMT,
> > valid cpufreq policy, frequency invariant load tracking. It is possible
> > platform when booting may not have EAS capability, but can do that after.
> 
> I think:
> "A platform may not boot without EAS capability, but could gain such capability
> at runtime (and vice versa)."
>

s/not//

Otherwise it's a double negative and the sentence doesn't mean anything. Also
it sounds like a failure to boot, which is not the intent I think.


Cheers,
Phil

> > For example, changing/registering the cpufreq policy.
> > 
> > At present, though platform doesn't support EAS, this sysctl is still
> > present and it ends up calling rebuild of sched domain on write to 1 and
> > NOP when writing to 0. That is confusing and un-necessary.
> 
> Maybe:
> "Platforms without EAS capability still have this sysctl. Its effects
> are unnecessary on such platforms (rebuilding sched-domains) and its presence
> can be confusing."
> 
> > 
> > Desired behaviour can be, have the sysctl only when the platform can do
> > EAS. i.e when platform becomes capable enable the sysctl and when it can't
> > remove the sysctl. On Supported platform using this sysctl, admin should be
> > able to enable/disable EAS.
> 
> Maybe just:
> "Dynamically hide/show sysctl_sched_energy_aware by re-evaluating EAS capability
> conditions."
> 
> > 
> > Update the sysctl guide as well.
> > 
> > Different Scenarios:
> > Scenario 1: System while booting has EAS capability.
> > Expected: sysctl will be present and admin can enable/disable EAS by writing
> > 1 or 0 respectively. This operation shouldn't remove the sysctl specially when
> > disabling as sysctl would be needed to enable it later.
> > Scenario 2: System becomes capable of EAS later
> > Expected: At boot, sysctl will not be present. Once eas is enabled by passing
> > all the checks, perf domains will be built and sysctl will be enabled. Any
> > further change to sysctl should be treated same as Scenario 1.
> > Scenario 3: system becomes not capable of EAS.
> > Expected: Since EAS is going to be disabled now, remove the sysctl in this
> > scenario. If it becomes capable of EAS later again, that would be Scenario 2.
> 
> I don't think detailing all the possible scenarios above is necessary here.
> 
> > 
> > v2->v3:
> > Chen Yu and Pierre Gondois both pointed out that if platform becomes
> > capable of EAS later, this patch was not allowing that to happen.
> > Addressed that by using a variable to indicate the sysctl change
> > and re-worded the commit message with desired behaviour,
> > v1->v2:
> > Chen Yu had pointed out that this will not destroy the perf domains on
> > architectures where EAS is supported by changing the sysctl.
> > [v1] Link: https://lore.kernel.org/lkml/20230829065040.920629-1-sshegde@linux.vnet.ibm.com/
> > [v2] Link: https://lore.kernel.org/lkml/20230901065249.137242-1-sshegde@linux.vnet.ibm.com/
> > 
> > Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
> > ---
> >   Documentation/admin-guide/sysctl/kernel.rst |  3 +-
> >   kernel/sched/topology.c                     | 49 +++++++++++++++++----
> >   2 files changed, 43 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 3800fab1619b..455e12f1331b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -1134,7 +1134,8 @@ automatically on platforms where it can run (that is,
> >   platforms with asymmetric CPU topologies and having an Energy
> >   Model available). If your platform happens to meet the
> >   requirements for EAS but you do not want to use it, change
> > -this value to 0.
> > +this value to 0. If platform doesn't support EAS at this moment,
> > +this would be removed.
> 
> Maybe:
> "This file is only advertised if your platform meets EAS requirements."
> 
> (feel free to deny the rewording suggestions)
> 
> > 
> >   task_delayacct
> >   ===============
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 05a5bc678c08..57df938d5ec0 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -208,9 +208,11 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> > 
> >   #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> >   DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > -static unsigned int sysctl_sched_energy_aware = 1;
> > +static unsigned int sysctl_sched_energy_aware;
> > +static struct ctl_table_header *sysctl_eas_header;
> >   static DEFINE_MUTEX(sched_energy_mutex);
> >   static bool sched_energy_update;
> > +static bool is_sysctl_eas_changing;
> > 
> >   void rebuild_sched_domains_energy(void)
> >   {
> > @@ -226,6 +228,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   		void *buffer, size_t *lenp, loff_t *ppos)
> >   {
> >   	int ret, state;
> > +	int prev_val = sysctl_sched_energy_aware;
> > 
> >   	if (write && !capable(CAP_SYS_ADMIN))
> >   		return -EPERM;
> > @@ -233,8 +236,13 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >   	if (!ret && write) {
> >   		state = static_branch_unlikely(&sched_energy_present);
> > -		if (state != sysctl_sched_energy_aware)
> > +		if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> > +			is_sysctl_eas_changing = true;
> > +			if (sysctl_sched_energy_aware && !state)
> > +				pr_warn("Attempt to build energy domains when EAS is disabled\n");
> >   			rebuild_sched_domains_energy();
> > +			is_sysctl_eas_changing = false;
> > +		}
> >   	}
> > 
> >   	return ret;
> > @@ -255,7 +263,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
> > 
> >   static int __init sched_energy_aware_sysctl_init(void)
> >   {
> > -	register_sysctl_init("kernel", sched_energy_aware_sysctls);
> > +	int cpu = cpumask_first(cpu_active_mask);
> > +
> > +	if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> > +	    !arch_scale_freq_invariant())
> > +		return 0;
> > +
> > +	sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +	sysctl_sched_energy_aware = 1;
> >   	return 0;
> >   }
> > 
> > @@ -336,10 +351,29 @@ static void sched_energy_set(bool has_eas)
> >   		if (sched_debug())
> >   			pr_info("%s: stopping EAS\n", __func__);
> >   		static_branch_disable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		/*
> > +		 * if the architecture supports EAS and forcefully
> > +		 * perf domains are destroyed, there should be a sysctl
> > +		 * to enable it later. If this was due to dynamic system
> > +		 * change such as SMT<->NON_SMT then remove sysctl.
> > +		 */
> > +		if (sysctl_eas_header && !is_sysctl_eas_changing) {
> > +			unregister_sysctl_table(sysctl_eas_header);
> > +			sysctl_eas_header = NULL;
> > +			sysctl_sched_energy_aware = 0;
> > +		}
> > +#endif
> >   	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> >   		if (sched_debug())
> >   			pr_info("%s: starting EAS\n", __func__);
> >   		static_branch_enable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		if (!sysctl_eas_header) {
> > +			sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +			sysctl_sched_energy_aware = 1;
> > +		}
> > +#endif
> 
> With a kernel which doesn't have a running cpufreq driver, the following scenario fails
> for me:
> # insmod [cpufreq driver].ko
> # cat /proc/sys/kernel/sched_energy_aware
> 1
> # echo 0 > /proc/sys/kernel/sched_energy_aware
> # rmmod cpufreq driver].ko
> # cat /proc/sys/kernel/sched_energy_aware
> 0
> 
> sched_energy_aware should have been removed at that point. In sched_energy_set(),
> sysctl_eas_header sysctl should be unregistered if we go from the state where:
> - EAS is disabled, but possible
> to the state:
> - EAS is disabled, but not possible
> 
> I think the following logic should somehow be extracted in a separate function,
> named sched_energy_aware_possible() for instance (or other as you appreciate).
> The logic should be checked to register/unregister the sysctl.
> ---
> if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
>     !arch_scale_freq_invariant())
> ---
> 
> Also it seemed there was a miss in rebuilding the sched domains when
> a cpufreq driver is removed, but the issue described above is still appearing
> with the following patch applied:
> https://lore.kernel.org/all/20230918112937.493352-1-pierre.gondois@arm.com/
> 
> Regards,
> Pierre
> 
> >   	}
> >   }
> > 
> > @@ -380,15 +414,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> >   	struct cpufreq_policy *policy;
> >   	struct cpufreq_governor *gov;
> > 
> > -	if (!sysctl_sched_energy_aware)
> > +	if (!sysctl_sched_energy_aware && is_sysctl_eas_changing)
> >   		goto free;
> > 
> >   	/* EAS is enabled for asymmetric CPU capacity topologies. */
> >   	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> > -		if (sched_debug()) {
> > -			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> > -					cpumask_pr_args(cpu_map));
> > -		}
> > +		if (sched_debug())
> > +			pr_info("rd %*pbl: Disabling EAS,  CPUs do not have asymmetric capacities\n",
> > +				cpumask_pr_args(cpu_map));
> >   		goto free;
> >   	}
> > 
> > --
> > 2.31.1
> > 
> 

-- 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ