linux-kernel - Re: [RFC PATCH] sched/fair: Skip idle CPU search on busy system


Message-ID: <8ca8f35c-a3af-c663-e254-fd325a7d39ca@linux.vnet.ibm.com>
Date:   Thu, 27 Jul 2023 20:34:53 +0530
From:   Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     peterz@...radead.org, vincent.guittot@...aro.org,
        srikar@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
        mingo@...hat.com, dietmar.eggemann@....com, mgorman@...e.de,
        rui.zhang@...el.com, tim.c.chen@...el.com
Subject: Re: [RFC PATCH] sched/fair: Skip idle CPU search on busy system



On 7/27/23 12:55 PM, Chen Yu wrote:
> On 2023-07-26 at 15:06:12 +0530, Shrikanth Hegde wrote:
>> When the system is fully busy, there will not be any idle CPUs.
>> In that case, load_balance will be called mainly with CPU_NOT_IDLE
>> type. should_we_balance currently checks for an idle CPU, if
>> one exists. When the system is 100% busy, there will not be an idle CPU and
>> these idle_cpu checks can be skipped. This would avoid fetching those rq
>> structures.
>>
> 
> Yes, I guess this could help reduce the cost if the sched group
> has many CPUs.

Thank you for the review, Chen Yu.

> 
>> This is a minor optimization for a specific case of 100% utilization.
>>
>> .....
>> Coming to the current implementation: it is a very basic approach to the
>> issue. It may not be the best way to do this. It works only in
>> case of CONFIG_NO_HZ_COMMON. nohz.nr_cpus is globally available info which
>> tracks idle CPUs. AFAIU there isn't any other. If there is such info, we
>> can use that instead. nohz.nr_cpus is atomic, which might be costly too.
>>
>> An alternative would be to add a new attribute to sched_domain and update
>> it in the cpu idle entry/exit path per CPU. The advantage is that the check
>> can be per env->sd instead of global. Slightly complicated, but maybe better.
>> There could be other advantages at wakeup, such as limiting the scan.
>>
> 
> When checking the code, I found that there is a per-domain nr_busy_cpus.
> However, that variable is only for the LLC domain. Maybe extending sd_share
> to domains under NUMA is applicable, IMO.

True, I did see that. Doing this at every level when there are a large number
of CPUs will likely need a lock when updating the sd_share, and that would
become a bottleneck as well. Since sd_share never makes sense for NUMA,
this would need different code paths for NUMA and non-NUMA, even though the
main benefit for this corner case would be in NUMA, as there would be a large
number of CPUs there.

I will keep that in mind and try to work something out along those lines.
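
To make the trade-off above concrete, here is a rough user-space sketch of
a per-level idle counter. It is not kernel code: struct toy_domain and the
toy_* helpers are hypothetical names, and C11 atomics stand in for whatever
primitive a real implementation would pick. It assumes each level simply
counts the idle CPUs it spans.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one sched_domain level; not the kernel struct. */
struct toy_domain {
	struct toy_domain *parent;	/* next wider level, NULL at the top */
	atomic_int nr_idle;		/* idle CPUs spanned by this level */
};

static void toy_domain_init(struct toy_domain *sd, struct toy_domain *parent)
{
	sd->parent = parent;
	atomic_init(&sd->nr_idle, 0);
}

/* Idle entry: every level that spans this CPU gets one shared update. */
static void toy_cpu_enter_idle(struct toy_domain *lowest)
{
	for (struct toy_domain *sd = lowest; sd != NULL; sd = sd->parent)
		atomic_fetch_add(&sd->nr_idle, 1);
}

/* Idle exit: undo the updates made on idle entry. */
static void toy_cpu_exit_idle(struct toy_domain *lowest)
{
	for (struct toy_domain *sd = lowest; sd != NULL; sd = sd->parent) {
		int prev = atomic_fetch_sub(&sd->nr_idle, 1);

		assert(prev > 0);	/* exit must pair with an earlier entry */
		(void)prev;
	}
}

/* Per-domain check, as opposed to one global counter. */
static bool toy_domain_fully_busy(struct toy_domain *sd)
{
	return atomic_load(&sd->nr_idle) == 0;
}
```

Each idle transition writes one shared counter per level, so the widest
(NUMA) level is touched by every CPU in the system. That is the contention
concern mentioned above.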

> 
> thanks,
> Chenyu
> 
>> Your feedback would really help. Does this optimization make sense?
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
>> ---
>>  kernel/sched/fair.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 373ff5f55884..903d59b5290c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10713,6 +10713,12 @@ static int should_we_balance(struct lb_env *env)
>>  		return 1;
>>  	}
>>
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +	/* If the system is fully busy, its better to skip the idle checks */
>> +	if (env->idle == CPU_NOT_IDLE && atomic_read(&nohz.nr_cpus) == 0)
>> +		return group_balance_cpu(sg) == env->dst_cpu;
>> +#endif
>> +
>>  	/* Try to find first idle CPU */
>>  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
>>  		if (!idle_cpu(cpu))
>> --
>> 2.31.1
>>
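
For anyone following the thread, the effect of the hunk above can be
modelled in user space roughly as below. This is only a simplified sketch:
struct toy_env and toy_should_we_balance are hypothetical, the group balance
CPU is assumed to be the first CPU of the group (index 0), nr_idle_global
stands in for atomic_read(&nohz.nr_cpus), and rq_lookups merely counts how
many per-CPU idle checks were made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct lb_env. */
struct toy_env {
	const bool *cpu_idle;	/* cpu_idle[i] is true when CPU i is idle */
	size_t nr_cpus;		/* CPUs in the group, lowest id first */
	size_t dst_cpu;		/* CPU asking whether it should balance */
	bool dst_is_idle;	/* false models env->idle == CPU_NOT_IDLE */
	size_t nr_idle_global;	/* models the global count of idle CPUs */
	size_t rq_lookups;	/* number of per-CPU idle checks performed */
};

static bool toy_should_we_balance(struct toy_env *env)
{
	assert(env->dst_cpu < env->nr_cpus);

	/* Shortcut: no idle CPU anywhere, so skip the per-CPU scan. */
	if (!env->dst_is_idle && env->nr_idle_global == 0)
		return env->dst_cpu == 0;

	/* Otherwise look for the first idle CPU in the group. */
	for (size_t cpu = 0; cpu < env->nr_cpus; cpu++) {
		env->rq_lookups++;
		if (!env->cpu_idle[cpu])
			continue;
		return cpu == env->dst_cpu;
	}

	/* No idle CPU found: only the group balance CPU balances. */
	return env->dst_cpu == 0;
}
```

On a fully busy group both paths return the same answer. The shortcut only
avoids the scan, and the saving grows with the number of CPUs in the group.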