linux-kernel - Re: [RFC PATCH v4 08/28] sched: Set up LLC indexing

Message-ID: <9c332c5b-83d9-465c-b02f-6648af9a9fae@os.amperecomputing.com>
Date: Mon, 29 Sep 2025 18:43:27 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Libo Chen <libo.chen@...cle.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 Hillf Danton <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
 Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
 Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
 Len Brown <len.brown@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>,
 Aubrey Li <aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>,
 Chen Yu <yu.chen.surf@...il.com>, linux-kernel@...r.kernel.org,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 "Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [RFC PATCH v4 08/28] sched: Set up LLC indexing

On 9/26/2025 9:51 PM, Chen, Yu C wrote:
> Hi Adam,
> 
> On 9/26/2025 2:14 PM, Adam Li wrote:
>> Hi Chen Yu,
>>
>> I tested the patch set on AmpereOne CPU with 192 cores.
>> With a certain firmware setting, each core has its own L1/L2 cache.
>> But *no* cores share LLC (L3). So *no* schedule domain
>> has flag 'SD_SHARE_LLC'.
>>
> 
> Good catch! And many thanks for your detailed testing and
> analysis.
> 
> Is this issue triggered with CONFIG_SCHED_CLUSTER disabled?
> 

Yes. With CONFIG_SCHED_CLUSTER enabled, this issue is not triggered:
the maximum sd_llc_idx stays below MAX_LLC (64) because we have
24 (192/8) cluster domains.
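
To make the arithmetic concrete, here is a minimal sketch of the index
count in both configurations. It is a simplified model, not the patch
set's code: nr_llc_indices() and fits_in_max_llc() are hypothetical
helpers, and the only values taken from this thread are MAX_LLC (64),
192 cores, and 8 cores per cluster.

```c
#include <stdbool.h>

#define MAX_LLC 64	/* limit quoted in this thread */

/*
 * Hypothetical helper: number of distinct LLC indices needed when every
 * group of cpus_per_llc CPUs gets its own index (rounded up).
 */
static unsigned int nr_llc_indices(unsigned int nr_cpus,
				   unsigned int cpus_per_llc)
{
	return (nr_cpus + cpus_per_llc - 1) / cpus_per_llc;
}

/* Simplified check: does the index count stay within MAX_LLC? */
static bool fits_in_max_llc(unsigned int nr_cpus, unsigned int cpus_per_llc)
{
	return nr_llc_indices(nr_cpus, cpus_per_llc) <= MAX_LLC;
}
```

With 8 cores per cluster, nr_llc_indices(192, 8) is 24 and fits. With
one index per core, as in the failing configuration, 192 indices are
needed and the limit is exceeded.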

>> With this topology:
>> per_cpu(sd_llc_id, cpu) is actually the cpu id (0-191).
>>
>> And kernel bug will be triggered at:
>> 'BUG_ON(idx > MAX_LLC)'
>>
> 
> Yes, the sd_llc_idx thing is a bit tricky - we want to use it to
> index into the static array struct sg_lb_stat.nr_pref_llc, and
> we have to limit its range. A better approach would be to
> dynamically allocate the buffer, so we could get rid of the
> 'idx > MAX_LLC' check, but that might complicate the code.
> 
>> Please see details below.
>>
>> The bug will disappear if setting 'MAX_LLC' to 192.
>> But I think we might disable CAS(cache aware scheduling)
>> if no domain has 'SD_SHARE_LLC'.
>>
> 
> I agree with you. Simply disabling cache-aware scheduling
> if there is no SD_SHARE_LLC would be simpler.
> 
>> On 8/9/2025 1:03 PM, Chen Yu wrote:
>> A draft patch like the one below can fix the kernel BUG:
>> 1) Do not call update_llc_idx() if domain has no SD_SHARE_LLC
>> 2) Disable CAS if domain has no SD_SHARE_LLC
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 8483c02b4d28..cde9b6cdb1de 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -704,7 +704,8 @@ static void update_top_cache_domain(int cpu)
>>          per_cpu(sd_llc_size, cpu) = size;
>>          per_cpu(sd_llc_id, cpu) = id;
>>          rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
>> -       update_llc_idx(cpu);
>> +       if (sd)
>> +               update_llc_idx(cpu);
>>
> 
> OK, that makes sense.
> 
>>          sd = lowest_flag_domain(cpu, SD_CLUSTER);
>>          if (sd)
>> @@ -2476,6 +2477,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>          int i, ret = -ENOMEM;
>>          bool has_asym = false;
>>          bool has_cluster = false;
>> +       bool has_llc = false;
>>          bool llc_has_parent_sd = false;
>>          unsigned int multi_llcs_node = 1;
>>
>> @@ -2621,6 +2623,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>
>>                  if (lowest_flag_domain(i, SD_CLUSTER))
>>                          has_cluster = true;
>> +
>> +               if (highest_flag_domain(i, SD_SHARE_LLC))
>> +                       has_llc = true;
>>          }
>>          rcu_read_unlock();
>>
>> @@ -2631,7 +2636,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>                  static_branch_inc_cpuslocked(&sched_cluster_active);
>>
>>   #ifdef CONFIG_SCHED_CACHE
>> -       if (llc_has_parent_sd && multi_llcs_node && !sched_asym_cpucap_active())
>> +       if (has_llc && llc_has_parent_sd && multi_llcs_node &&
> 
> multi_llcs_node will be false if there is no SD_SHARE_LLC domain on the
> platform, so I suppose we don’t have to introduce has_llc?
> multi_llcs is set to true iff there is more than one SD_SHARE_LLC domain
> under its SD_SHARE_LLC parent domain.
> 

If there is *no* SD_SHARE_LLC domain, my test shows that 'multi_llcs_node' is still 1 (true).

It looks like this is because the default value of 'multi_llcs_node' is 1.

build_sched_domains():
	unsigned int multi_llcs_node = 1;

And this condition is always false since we have no SD_SHARE_LLC domain,
therefore 'multi_llcs_node' will not be changed:

                        if (!(sd->flags & SD_SHARE_LLC) && child &&
                            (child->flags & SD_SHARE_LLC))
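
The effect can be reproduced with a small stand-alone model. This is
only a sketch under assumptions: struct toy_domain, walk_domains(), the
SD_SHARE_LLC bit value, and the 'computed' argument are hypothetical
stand-ins. Only the default value and the condition are taken from the
code quoted above.

```c
#include <stddef.h>

#define SD_SHARE_LLC 0x1	/* placeholder bit, not the kernel's value */

/* Hypothetical, stripped-down stand-in for struct sched_domain. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *child;
	struct toy_domain *parent;
};

/*
 * Walk the hierarchy upwards from the lowest domain. 'computed' stands
 * in for whatever value the real code stores when the condition holds.
 */
static unsigned int walk_domains(struct toy_domain *lowest,
				 unsigned int computed)
{
	unsigned int multi_llcs_node = 1;	/* same default as above */
	struct toy_domain *sd;

	for (sd = lowest; sd; sd = sd->parent) {
		struct toy_domain *child = sd->child;

		/* Same condition as in the snippet above. */
		if (!(sd->flags & SD_SHARE_LLC) && child &&
		    (child->flags & SD_SHARE_LLC))
			multi_llcs_node = computed;
	}

	return multi_llcs_node;
}
```

When no domain in the chain has SD_SHARE_LLC set, the branch is never
taken and walk_domains() returns the default 1 whatever 'computed' is.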

Thanks,
-adam