lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46353945-fced-9b69-b334-3b7ae50c957c@linux.vnet.ibm.com>
Date:   Wed, 27 Sep 2023 22:38:08 +0530
From:   Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     linux-kernel@...r.kernel.org, ionela.voinescu@....com,
        srikar@...ux.vnet.ibm.com, mgorman@...hsingularity.net,
        mingo@...nel.org, pierre.gondois@....com, yu.c.chen@...el.com,
        tim.c.chen@...ux.intel.com, pauld@...hat.com, mingo@...hat.com,
        peterz@...radead.org, vincent.guittot@...aro.org,
        vschneid@...hat.com, qperret@...gle.com
Subject: Re: [PATCH v4] sched/topology: change behaviour of sysctl
 sched_energy_aware based on the platform



On 9/27/23 6:44 PM, Dietmar Eggemann wrote:
> Ah, BTW s/quentin.perret@....com/qperret@...gle.com
> 
> On 27/09/2023 10:14, Shrikanth Hegde wrote:
>>
>>
>> On 9/27/23 2:59 AM, Dietmar Eggemann wrote:
>>> On 26/09/2023 12:00, Shrikanth Hegde wrote:
> 
> [...]
> 
>>>> At present, though platform doesn't support EAS, this sysctl returns 1
>>>> and it ends up calling rebuild of sched domain on write to 1 and
>>>
>>> sched domains are not rebuild in this case, i.e.
>>> partition_sched_domains_locked() does not call detach_destroy_domains()
>>> or build_sched_domains(). Only build_perf_domains() is called which
>>> bails out if !sysctl_sched_energy_aware or one of the EAS conditions is
>>> not true.
>>>
>>
>> ok. that's because it goes to match1 and match2 right?
> 
> yes.
> 
> [...]
> 
>>>> @@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>>>  		return -EPERM;
>>>>
>>>>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>> +	if (!sched_is_eas_possible(cpu_active_mask)) {
>>>
>>> This is using `cpu_active_mask` so EAS can only be disabled system-wide.
>>>
>>> So I experimented with an exclusive cpuset setup creating a symmetric
>>> (cs1) and an asymmetric (cs2) island on my Juno with its cpumask = [l B
>>> B l l l] (l - little CPU, B - Big CPU).
>>>
>>> root@...o:~# cd /sys/fs/cgroup/cpuset
>>> root@...o:/sys/fs/cgroup/cpuset# mkdir cs1
>>> root@...o:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive
>>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems
>>> root@...o:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus
>>> root@...o:/sys/fs/cgroup/cpuset# mkdir cs2
>>> root@...o:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive
>>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems
>>> root@...o:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus
>>> root@...o:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance
>>>
>>> [ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>>> cpus=0,3-5 nr_pstate=5 }
>>>
>>> root@...o:~# echo 0 > /proc/sys/kernel/sched_energy_aware
>>>
>>> log messages:
>>> ...
>>> [ 3143.538583] rd 4-5: Disabling EAS
>>> [ 3143.549569] rd 0-3: Disabling EAS
>>> [ 3143.560681] sched_energy_set: stopping EAS
>>> ...
>>>
>>> root@...o:~# echo 1 > /proc/sys/kernel/sched_energy_aware
>>>
>>> log messages:
>>> ...
>>> [ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>>> cpus=0,3-5 nr_pstate=5 }
>>> [ 3223.293409] sched_energy_set: starting EAS
>>>
>>> Seems still to work correctly.
>>
>> I see that can be a issue. using first cpu here check to asymmetric cpu capacity. 
>> It would have worked here, since the first group had asymmetry. ( l B B l ). 
>> It wouldn't have worked if the first group had like ( l l ) and second group has ( l B B l )
> 
> Yeah, that's true.
> 
>   sched_is_eas_possible(const struct cpumask *cpu_mask)
> 
>     !per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask));
> 
> cpusets cs1=[0,5] and cs2=[1-4] as such an example.
> 

right. 

> 
>> Instead of cpu_active_mask, I can make use of ndoms_cur and doms_cur[i] logic to 
>> traverse through possible doms, and if none of the domains have asymmetric cpu capacity
>> return false.  Does that look right?
> 
>   rebuild_sched_domains()
> 
>     rebuild_sched_domains_locked()
> 
>       ndoms = generate_sched_domains(&doms, &attr)
> 
> You would need generate_sched_domains() in sched_energy_aware_handler()?
> 

clearly I didnt think through this. ndoms_cur and doms_cur are updated at the end. 
So If EAS is possible at boot, this would fail. 


What  sched_is_eas_possible needs is if there is asymmetric cpu topology. 
Simpler loop of all CPU's and checking if there any of them have sd_asym_cpucapacity might do,
though it goes through all CPU's, Since these functions are not in hot path
So it should not affect any performance. Something like below would work?

	bool any_asym_capacity = false;

        /* EAS is enabled for asymmetric CPU capacity topologies. */
        for_each_cpu(i, cpu_mask) {
                if (per_cpu(sd_asym_cpucapacity, i)) {
                        any_asym_capacity = true;
                        break;
                }
        }
        if (!any_asym_capacity) {
                if (sched_debug()) {
                        pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n",
                                cpumask_pr_args(cpu_mask));
                }
                return false;
        }



>>> [...]
>>>
>>>> @@ -458,6 +487,8 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
>>>>  	return !!pd;
>>>>
>>>>  free:
>>>> +	if (sched_debug())
>>>> +		pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));
>>>
>>> Shouldn't this be used in condition `if (!sysctl_sched_energy_aware)`?
>>> Otherwise we have 2 warnings when the other conditions which leads to
>>> `goto free` aren't met.
>> Since sched_energy_set has the info messages about start and stop of EAS, maybe 
>> this debug is not needed. Will remove it. Doing it only `if (!sysctl_sched_energy_aware)`
> 
> OK.
> 
>> also doesn't seem correct, as calling free in this function would free the perf_domains.  
> 
> But !sched_is_eas_possible() also does `goto free` and in there we we
> emit pr_info's indicating why EAS isn't possible right now.
> 
> When issuing a:
> 
> # echo 0 > /proc/sys/kernel/sched_energy_aware
> 
> we would see in the logs:
> 
> ...
> [  416.325324] rd 0-5: sysctl_sched_energy_aware is N   <-- (*)
> [  416.337844] sched_energy_set: stopping EAS
> ...
> 
> but maybe (*) is not necessary ...

ok. I think we can add similar info message in if(!sysctl_sched_energy_aware) like below?
Would that be enough? 

        if (!sysctl_sched_energy_aware) {
                if (sched_debug()) {
                        pr_info("rd %*pbl: Checking EAS, sysclt sched_energy_aware set to N\n",
                                cpumask_pr_args(cpu_map));
                }
                goto free;
        }

and remove the below one.
        if (sched_debug())
                pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));


So there will be these "Checking EAS" and if possible, "starting EAS" or "stopping EAS" message. 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ