Date:   Tue,  7 Feb 2023 02:36:36 -0800
From:   Sun Shouxin <sunshouxin@...natelecom.cn>
To:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com
Cc:     linux-kernel@...r.kernel.org, huyd12@...natelecom.cn,
        sunshouxin@...natelecom.cn
Subject: [PATCH] sched: sd_llc_id initialized

In my tests I use isolcpus to isolate specific CPUs, and I noticed
inconsistent behavior when changing a task's CPU affinity.

For example, the NUMA topology is as follows,
NUMA node0 CPU(s):               0-15,32-47
NUMA node1 CPU(s):               16-31,48-63

and the 'isolcpus' is as follows,
isolcpus=14,15,30,31,46,47,62,63

A task initially running on a non-isolated core on NUMA node0 is bound
to an isolated core on NUMA node1. When its CPU affinity is then changed
to all cores, the task can be scheduled back to a non-isolated core on
NUMA node0.

1.taskset -pc 0-13 3512  (task running on core 1)
2.taskset -pc 63 3512    (task running on isolated core 63)
3.taskset -pc 0-63 3512  (task running on core 1)

In another case, a task initially running on a non-isolated core on
NUMA node1 is bound to an isolated core on NUMA node1. When its CPU
affinity is then changed to all cores, the task is never migrated and
keeps running on the isolated core.

1.taskset -pc 16-29 3512 (task running on core 17)
2.taskset -pc 63 3512    (task running on isolated core 63)
3.taskset -pc 0-63 3512  (task still running on core 63,
                          never scheduled out)
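
For reference, the two sequences above could also be driven from C
instead of taskset. This is only a minimal sketch, assuming Linux with
glibc; cpu_range() and set_affinity_range() are hypothetical helper
names and are not part of this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: build a CPU set covering first..last inclusive. */
static void cpu_range(cpu_set_t *set, int first, int last)
{
	assert(first >= 0 && first <= last && last < CPU_SETSIZE);

	CPU_ZERO(set);
	for (int cpu = first; cpu <= last; cpu++)
		CPU_SET(cpu, set);
}

/*
 * Hypothetical helper: rough equivalent of "taskset -pc first-last pid".
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_affinity_range(pid_t pid, int first, int last)
{
	cpu_set_t set;

	cpu_range(&set, first, last);
	return sched_setaffinity(pid, sizeof(set), &set);
}
```

The first sequence would then be set_affinity_range(3512, 0, 13),
set_affinity_range(3512, 63, 63) and set_affinity_range(3512, 0, 63),
checking the CPU the task runs on after each step, for example with
"ps -o psr -p 3512".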

The root cause is that sd_llc_id is never initialized for isolated CPUs,
so it keeps the default value 0, and cpus_share_cache() returns wrong
results on this path:
  select_task_rq_fair()
        select_idle_sibling()
                cpus_share_cache()
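
To illustrate why the default of 0 matters, here is a small userspace
model of the comparison done by cpus_share_cache(). It is only a sketch
under stated assumptions: the sd_llc_id array stands in for the per-CPU
variable, is_isolated() and model_topology() are hypothetical helpers,
and it assumes one LLC per NUMA node whose id is the lowest CPU number
in that node (0 and 16 for the topology above).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 64

/* Userspace stand-in for the per-CPU sd_llc_id variable. */
static int sd_llc_id[NR_CPUS];

/* Two CPUs are treated as cache siblings when their LLC ids match. */
static bool cpus_share_cache(int this_cpu, int that_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	assert(that_cpu >= 0 && that_cpu < NR_CPUS);

	if (this_cpu == that_cpu)
		return true;
	return sd_llc_id[this_cpu] == sd_llc_id[that_cpu];
}

/* Hypothetical helper: isolcpus=14,15,30,31,46,47,62,63 from the example. */
static bool is_isolated(int cpu)
{
	return (cpu % 16) >= 14;
}

/*
 * Hypothetical helper: fill the table for the example topology.
 * Assumption: node0 (CPUs 0-15,32-47) has LLC id 0 and node1
 * (CPUs 16-31,48-63) has LLC id 16. Isolated CPUs are never attached
 * to a domain, so they keep whatever default the variable started with.
 */
static void model_topology(int isolated_default)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (is_isolated(cpu))
			sd_llc_id[cpu] = isolated_default;
		else
			sd_llc_id[cpu] = (cpu % 32 < 16) ? 0 : 16;
	}
}
```

In this model, after model_topology(0), cpus_share_cache(63, 1) is true
while cpus_share_cache(63, 17) is false, which is consistent with the
two cases above: isolated CPU 63 looks like a cache sibling of the NUMA0
cores only. After model_topology(-1), both comparisons are false.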

Suggested-by: Hu Yadi <huyd12@...natelecom.cn>
Signed-off-by: Sun Shouxin <sunshouxin@...natelecom.cn>
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..89e98d410a8f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -663,7 +663,7 @@ static void destroy_sched_domains(struct sched_domain *sd)
  */
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
-DEFINE_PER_CPU(int, sd_llc_id);
+DEFINE_PER_CPU(int, sd_llc_id) = -1;
 DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
-- 
2.27.0
