linux-kernel - Re: [PATCH v4] sched/topology: change behaviour of sysctl sched_energy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8ac1576c-909b-ec6b-930d-0683ca288bf9@arm.com>
Date:   Wed, 27 Sep 2023 15:14:21 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>, mingo@...hat.com,
        peterz@...radead.org, vincent.guittot@...aro.org,
        vschneid@...hat.com
Cc:     linux-kernel@...r.kernel.org, ionela.voinescu@....com,
        quentin.perret@....com, srikar@...ux.vnet.ibm.com,
        mgorman@...hsingularity.net, mingo@...nel.org,
        pierre.gondois@....com, yu.c.chen@...el.com,
        tim.c.chen@...ux.intel.com, pauld@...hat.com
Subject: Re: [PATCH v4] sched/topology: change behaviour of sysctl
 sched_energy_aware based on the platform

Ah, BTW s/quentin.perret@....com/qperret@...gle.com

On 27/09/2023 10:14, Shrikanth Hegde wrote:
> 
> 
> On 9/27/23 2:59 AM, Dietmar Eggemann wrote:
>> On 26/09/2023 12:00, Shrikanth Hegde wrote:

[...]

>>> At present, though platform doesn't support EAS, this sysctl returns 1
>>> and it ends up calling rebuild of sched domain on write to 1 and
>>
>> sched domains are not rebuild in this case, i.e.
>> partition_sched_domains_locked() does not call detach_destroy_domains()
>> or build_sched_domains(). Only build_perf_domains() is called which
>> bails out if !sysctl_sched_energy_aware or one of the EAS conditions is
>> not true.
>>
> 
> ok. that's because it goes to match1 and match2 right?

yes.

[...]

>>> @@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>>  		return -EPERM;
>>>
>>>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>> +	if (!sched_is_eas_possible(cpu_active_mask)) {
>>
>> This is using `cpu_active_mask` so EAS can only be disabled system-wide.
>>
>> So I experimented with an exclusive cpuset setup creating a symmetric
>> (cs1) and an asymmetric (cs2) island on my Juno with its cpumask = [l B
>> B l l l] (l - little CPU, B - Big CPU).
>>
>> root@...o:~# cd /sys/fs/cgroup/cpuset
>> root@...o:/sys/fs/cgroup/cpuset# mkdir cs1
>> root@...o:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive
>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems
>> root@...o:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus
>> root@...o:/sys/fs/cgroup/cpuset# mkdir cs2
>> root@...o:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive
>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems
>> root@...o:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus
>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance
>>
>> [ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>> cpus=0,3-5 nr_pstate=5 }
>>
>> root@...o:~# echo 0 > /proc/sys/kernel/sched_energy_aware
>>
>> log messages:
>> ...
>> [ 3143.538583] rd 4-5: Disabling EAS
>> [ 3143.549569] rd 0-3: Disabling EAS
>> [ 3143.560681] sched_energy_set: stopping EAS
>> ...
>>
>> root@...o:~# echo 1 > /proc/sys/kernel/sched_energy_aware
>>
>> log messages:
>> ...
>> [ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>> cpus=0,3-5 nr_pstate=5 }
>> [ 3223.293409] sched_energy_set: starting EAS
>>
>> Seems still to work correctly.
> 
> I see that can be a issue. using first cpu here check to asymmetric cpu capacity. 
> It would have worked here, since the first group had asymmetry. ( l B B l ). 
> It wouldn't have worked if the first group had like ( l l ) and second group has ( l B B l )

Yeah, that's true.

  sched_is_eas_possible(const struct cpumask *cpu_mask)

    !per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask));

cpusets cs1=[0,5] and cs2=[1-4] as such an example.


> Instead of cpu_active_mask, I can make use of ndoms_cur and doms_cur[i] logic to 
> traverse through possible doms, and if none of the domains have asymmetric cpu capacity
> return false.  Does that look right?

  rebuild_sched_domains()

    rebuild_sched_domains_locked()

      ndoms = generate_sched_domains(&doms, &attr)

You would need generate_sched_domains() in sched_energy_aware_handler()?

>> [...]
>>
>>> @@ -458,6 +487,8 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
>>>  	return !!pd;
>>>
>>>  free:
>>> +	if (sched_debug())
>>> +		pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));
>>
>> Shouldn't this be used in condition `if (!sysctl_sched_energy_aware)`?
>> Otherwise we have 2 warnings when the other conditions which leads to
>> `goto free` aren't met.
> Since sched_energy_set has the info messages about start and stop of EAS, maybe 
> this debug is not needed. Will remove it. Doing it only `if (!sysctl_sched_energy_aware)`

OK.

> also doesn't seem correct, as calling free in this function would free the perf_domains.  

But !sched_is_eas_possible() also does `goto free` and in there we we
emit pr_info's indicating why EAS isn't possible right now.

When issuing a:

# echo 0 > /proc/sys/kernel/sched_energy_aware

we would see in the logs:

...
[  416.325324] rd 0-5: sysctl_sched_energy_aware is N   <-- (*)
[  416.337844] sched_energy_set: stopping EAS
...

but maybe (*) is not necessary ...