[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <63713cac-605a-c647-baf7-b172c02f8336@arm.com>
Date: Fri, 22 Sep 2023 16:44:47 +0200
From: Pierre Gondois <pierre.gondois@....com>
To: Valentin Schneider <vschneid@...hat.com>,
Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
Cc: dietmar.eggemann@....com, linux-kernel@...r.kernel.org,
ionela.voinescu@....com, quentin.perret@....com,
srikar@...ux.vnet.ibm.com, mgorman@...hsingularity.net,
mingo@...nel.org, yu.c.chen@...el.com, tim.c.chen@...ux.intel.com,
mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware
depending on the architecture
On 9/18/23 14:22, Valentin Schneider wrote:
> On 15/09/23 23:40, Shrikanth Hegde wrote:
>> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>>>> platform when booting may not have EAS capability, but can do that after.
>>>>>> For example, changing/registering the cpufreq policy.
>>>>>>
>>>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>>
>>>>>
>>>>
>>>> Hi Valentin, Thanks for taking a look at this patch.
>>>>
>>>>> But why would you write to it in the first place? Or do you mean to use
>>>>> this as an indicator for userspace that EAS is supported?
>>>>>
>>>>
>>>> Since this sysctl is present and its value being 1, it gives the
>>>> impression to the user that EAS is supported when it is not.
>>>> So its an attempt to correct that part.
>>>>
>>>
>>> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
>>> supported? And on top of it, prevent all writes when EAS isn't supported
>>> (perf domains cannot be built, so there would be no point in forcing a
>>> rebuild that will do nothing).
>>
>> Yes. That's another way. Thats what I had as possible approach in
>> https://lore.kernel.org/lkml/d2c945d6-c4f0-a096-0623-731b11484f51@linux.vnet.ibm.com/
>>
>
> Thanks for the link; and apologies for bringing up topics that have been
> discussed already.
>
>>
>>
>>>
>>> I can never remember how to properly use the sysctl API, so that's a very
>>> crude implementation, but something like so?
>>>
>>> ---
>>>
>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>> index 05a5bc678c089..dadfc5afc4121 100644
>>> --- a/kernel/sched/topology.c
>>> +++ b/kernel/sched/topology.c
>>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>> if (write && !capable(CAP_SYS_ADMIN))
>>> return -EPERM;
>>>
>>> + if (!sched_energy_enabled()) {
>>
>> Use of sched_energy_enabled won't work as Pierre has indicated.
>>
>> Instead this can be done by adding those checks in a helper function to
>> do similar checks as done build_perf_domains.
>>
>> I can send v4 with this approach if it makes more sense. Please let me know.
>>
>
> So what I'm thinking is the standard approach seems to be to keep the knobs
> visible, but change how reads/writes to them are handled.
>
> For instance, SMT support has
>
> /sys/devices/system/cpu/smt
> /control
> /active
>
> And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
> have:
>
> /control = notsupported
> /active = 0
Having such interface for EAS would be ideal no ?
/active:
would be the equivalent of the current sysctl_sched_energy_aware
/control:
would show whether CONFIG_SCHED_DEBUG was set and all the conditions
to have EAS enabled are satisfied.
Possible states for SMT:
---
static const char *smt_states[] = {
[CPU_SMT_ENABLED] = "on", // EAS possible and running
[CPU_SMT_DISABLED] = "off", // EAS possible and not running
[CPU_SMT_FORCE_DISABLED] = "forceoff", // not applicable for EAS
[CPU_SMT_NOT_SUPPORTED] = "notsupported", // system with smt or not asymmetric or no freq invariance
[CPU_SMT_NOT_IMPLEMENTED] = "notimplemented", // CONFIG_SCHED_DEBUG=n
};
---
>
> So IMO it would make sense to keep sched_energy_aware around, but make it
> read 0 and prevent writes for systems that have the software support
> compiled but don't have the actual hardware support.
>
> In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
> though that's obvious enough with CONFIG_SCHED_DEBUG=y.
>
Powered by blists - more mailing lists