linux-kernel - Re: [PATCH 3/3] sched/fair: Ensure select housekeeping cpus in task_numa_find

Message-ID: <b8f5837a-2112-4bca-b99c-98ca41d3ec66@amd.com>
Date: Fri, 27 Dec 2024 10:10:49 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chuyi Zhou <zhouchuyi@...edance.com>, <mingo@...hat.com>,
	<peterz@...radead.org>, <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
	<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
	<vschneid@...hat.com>
CC: <chengming.zhou@...ux.dev>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] sched/fair: Ensure select housekeeping cpus in
 task_numa_find_cpu

Hello Chuyi,

On 12/23/2024 6:28 PM, Chuyi Zhou wrote:
> 
> 
> On 2024/12/18 14:21, K Prateek Nayak wrote:
>> Hello Chuyi,
>>
>> On 12/16/2024 5:53 PM, Chuyi Zhou wrote:
>>> [..snip..]
>>> @@ -2081,6 +2081,12 @@ numa_type numa_classify(unsigned int imbalance_pct,
>>>       return node_fully_busy;
>>>   }
>>> +static inline bool numa_migrate_test_cpu(struct task_struct *p, int cpu)
>>> +{
>>> +    return cpumask_test_cpu(cpu, p->cpus_ptr) &&
>>> +            housekeeping_cpu(cpu, HK_TYPE_DOMAIN);
>>> +}
>>> +
>>>   #ifdef CONFIG_SCHED_SMT
>>>   /* Forward declarations of select_idle_sibling helpers */
>>>   static inline bool test_idle_cores(int cpu);
>>> @@ -2168,7 +2174,7 @@ static void task_numa_assign(struct task_numa_env *env,
>>>           /* Find alternative idle CPU. */
>>>           for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start + 1) {
>>
>> Can we just do:
>>
>>      for_each_cpu_and(cpu, cpumask_of_node(env->dst_nid), housekeeping_cpumask(HK_TYPE_DOMAIN)) {
>>          ...
>>      }
>>
>> and avoid adding numa_migrate_test_cpu(). Thoughts?
> 
> Makes sense, but there doesn't currently seem to be an API like for_each_cpu_wrap_and().
> 
> Do you think the following is better?
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 855df103f4dd..4792ef672738 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2167,9 +2167,9 @@ static void task_numa_assign(struct task_numa_env *env,
>                  int start = env->dst_cpu;
> 
>                  /* Find alternative idle CPU. */
> -               for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start + 1) {
> +               for_each_cpu_and(cpu, cpumask_of_node(env->dst_nid), housekeeping_cpumask(HK_TYPE_DOMAIN)) {
>                          if (cpu == env->best_cpu || !idle_cpu(cpu) ||

"start" is set to "env->dst_cpu", which is already taken care of here
by the first comparison.

> -                           !cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
> +                               cpu == start || !cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
>                                  continue;
>                          }
> 

I think for_each_cpu_wrap() was used to reduce contention on the xchg
operation below. If we want to keep that property, perhaps we can have a
per-cpu temporary mask (like load_balance_mask) and break this into
cpumask_and() + for_each_cpu_wrap() steps. I'm not sure if any of the
existing masks (load_balance_mask, select_rq_mask,
should_we_balance_tmpmask) can be safely reused. Otherwise, perhaps we
can make a case for for_each_cpu_and_wrap() with this use case.
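
To illustrate the cpumask_and() + for_each_cpu_wrap() split, here is a
rough userspace model, not kernel code. The mask_t type, the NR_CPUS
value of 64, and the and_then_wrap() helper are all hypothetical
stand-ins; a real patch would use struct cpumask and a per-cpu scratch
mask instead of a local variable.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef uint64_t mask_t;

/*
 * Intersect the node mask with the housekeeping mask into a temporary
 * (the cpumask_and() step), then visit the set bits beginning at
 * "start" and wrapping around (the for_each_cpu_wrap() step).
 * Callers that begin at different CPUs probe candidates in different
 * orders, which is what spreads out the xchg attempts.
 *
 * Returns the number of CPUs written to out[].
 */
static int and_then_wrap(mask_t node, mask_t hk, int start, int out[NR_CPUS])
{
	mask_t tmp = node & hk;	/* models cpumask_and(tmp, node, hk) */
	int n = 0;

	assert(start >= 0);

	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = (start + i) % NR_CPUS;	/* wrap past the last CPU */

		if (tmp & ((mask_t)1 << cpu))
			out[n++] = cpu;
	}

	return n;
}
```

In task_numa_assign() the equivalent of "start" would be
env->dst_cpu + 1, and the loop body would keep the existing
best_cpu / idle_cpu() / cpus_ptr checks before attempting the xchg.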

> 
> Thanks.
> 
> 
>>
>>>               if (cpu == env->best_cpu || !idle_cpu(cpu) ||
>>> -                !cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
>>> +                !numa_migrate_test_cpu(env->p, cpu)) {
>>>                   continue;
>>>               }
>>> @@ -2480,7 +2486,7 @@ static void task_numa_find_cpu(struct task_numa_env *env,
>>>       for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
>>
>> Same modifications can be made for this outer loop.
>>
> 

-- 
Thanks and Regards,
Prateek