Message-ID: <b4be0d16-8f2e-4184-9854-a5d1e6415373@intel.com>
Date: Wed, 24 Dec 2025 17:46:12 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>
CC: Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
	Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	Hillf Danton <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
	Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
	Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
	Vern Hao <haoxing990@...il.com>, Len Brown <len.brown@...el.com>,
	Aubrey Li <aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>,
	Chen Yu <yu.chen.surf@...il.com>, Ingo Molnar <mingo@...hat.com>,
	Adam Li <adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>,
	Tim Chen <tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Gautham R. Shenoy <gautham.shenoy@....com>,
	Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

On 12/24/2025 4:19 PM, K Prateek Nayak wrote:
> Hello Chenyu,
> 
> On 12/24/2025 12:38 PM, Chen, Yu C wrote:
>> Hello Prateek,
>>
>> On 12/23/2025 1:31 PM, K Prateek Nayak wrote:
>>> Hello Tim, Chenyu,
>>>
>>> On 12/4/2025 4:37 AM, Tim Chen wrote:

[snip]

>> I'm OK with replacing the domain-based cpumask with the topology_level
>> mask; I'm just wondering whether re-using the llc_id would increase
>> the risk of a race condition - a CPU may have different llc_ids
>> before/after going offline/online. Can we assign/reserve a "static"
>> llc_id for each CPU, whether it is online or offline? That way we
>> don't need to worry about data synchronization when using llc_id().
>> For example, I can think of adjusting the data in the per-CPU
>> nr_pref_llc[max_llcs] on every CPU whenever a CPU goes
>> offline/online.
> 
> So I was thinking of expanding the rq->nr_pref_llc[] if the
> max_llc increases but leaving it as is if the number of LLCs
> decreases. That way we don't have to worry about dereferencing
> past the array boundary.
> 

Sure, we can do it this way.
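Something along these lines, I suppose (just a rough sketch; the helper
and field names below are made up for illustration, not from the actual
patch):

/*
 * Grow-only resize of the per-rq array: it is only ever enlarged when
 * max_llcs grows, so a stale llc_id can never index past the end.
 */
static int grow_nr_pref_llc(struct rq *rq, int new_max_llcs)
{
	int *new_arr, *old_arr = rq->nr_pref_llc;

	if (new_max_llcs <= rq->nr_pref_llc_len)
		return 0;			/* never shrink */

	new_arr = kcalloc(new_max_llcs, sizeof(*new_arr), GFP_KERNEL);
	if (!new_arr)
		return -ENOMEM;

	memcpy(new_arr, old_arr, rq->nr_pref_llc_len * sizeof(*new_arr));

	/*
	 * Publishing new_arr and freeing old_arr still has to be
	 * serialized against readers - which is what the wrapper you
	 * suggest below takes care of.
	 */
	rq->nr_pref_llc = new_arr;
	rq->nr_pref_llc_len = new_max_llcs;
	kfree(old_arr);
	return 0;
}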

> We can also have a wrapper like:
> 
>      struct nr_llc_stats {
>          int		nr_llcs;
>          struct rcu_head rcu;
>          int 		*nr_pref_llc;
>      }
> 
> And re-allocate and attach it in rq_attach_root() during sd
> rebuild. That way, RCU read-side can always grab a reference to
> it, enqueue / dequeue don't need to care since it cannot change
> under rq_lock, and partition can use call_rcu() to free the old
> ones up.
> 

OK, I'll go in this direction (Peter also suggested something like this
for the domain).
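Just to make sure we are talking about the same thing, something roughly
like this (field/function names and placement are only illustrative):

/* struct rq would carry: struct nr_llc_stats __rcu *llc_stats; */
struct nr_llc_stats {
	int		nr_llcs;
	struct rcu_head	rcu;
	int		*nr_pref_llc;	/* array of nr_llcs counters */
};

static void nr_llc_stats_free_rcu(struct rcu_head *head)
{
	struct nr_llc_stats *old = container_of(head, struct nr_llc_stats, rcu);

	kfree(old->nr_pref_llc);
	kfree(old);
}

/*
 * Swap in a freshly sized stats object during sd rebuild (e.g. from
 * rq_attach_root()); the old one is freed after a grace period.
 * The caller serializes updates, hence the plain 'true' lockdep
 * condition in this sketch.
 */
static void rq_replace_llc_stats(struct rq *rq, struct nr_llc_stats *new)
{
	struct nr_llc_stats *old;

	old = rcu_replace_pointer(rq->llc_stats, new, true);
	if (old)
		call_rcu(&old->rcu, nr_llc_stats_free_rcu);
}

and readers outside rq_lock (e.g. load-balance paths) would do:

	rcu_read_lock();
	stats = rcu_dereference(rq->llc_stats);
	if (stats && llc_id < stats->nr_llcs)
		nr = stats->nr_pref_llc[llc_id];
	rcu_read_unlock();

while enqueue/dequeue keep using it directly, since it cannot change
under rq_lock, as you said.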

>>
>>>            cpuset_update_active_cpus();
>>>        } else {

[snip]

>>> AFAICT, "sd_llc_id" isn't compared across different partitions so having
>>> the CPUs that are actually associated with same physical LLC but across
>>> different partitions sharing the same "sd_llc_id" shouldn't be a problem.
>>>
>>> Thoughts?
>>>
>>
>> This means cpus_share_resources(int this_cpu, int that_cpu)

Actually, I meant to say cpus_share_cache() here.

>> should be invoked only when this_cpu and that_cpu belong to the same
>> partition. That way we do not alter the semantics of
>> cpus_share_resources(). We can audit the places where
>> cpus_share_resources() is used.
> 
> The only case I can think of is a task that wakes up after
> partitioning, and its wake cpu from a different partition is
> mistaken to share the LLC with the current CPU - but the task
> cannot actually run on that old CPU, and it'll have to take the
> select_fallback_rq() path if prev_cpu was selected during
> wake_affine().
> 

OK, makes sense.
Actually, prev_cpu might not even be chosen by select_task_rq_fair() ->
select_idle_sibling(), because the select_idle_sibling() fast path is
expected to be triggered only when prev_cpu and the current cpu are in
the same domain, per the check in select_task_rq_fair():

	cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))
		sd = NULL; /* wake affine */

If the current cpu and prev_cpu are in different partitions, they are
not in the same domains.
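For reference, the hunk I mean in select_task_rq_fair() is roughly the
following (paraphrased from memory, not an exact quote of the current
tree):

	for_each_domain(cpu, tmp) {
		/*
		 * If both 'cpu' and 'prev_cpu' are part of this domain,
		 * cpu is a valid SD_WAKE_AFFINE target.
		 */
		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			if (cpu != prev_cpu)
				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);

			sd = NULL; /* Prefer wake_affine over balance flags */
			break;
		}
		...
	}

When the waking CPU and prev_cpu sit in different partitions, prev_cpu
never shows up in sched_domain_span(tmp) of the waking CPU, so this
wake-affine branch is not taken.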

> I don't think it will be such a common occurrence as to cause an
> issue, and even without that, wake_affine() could still pick the
> prev_cpu if the current CPU is busy, or via wake_affine_weight().
> 

I realized that sched_cache has added cpus_share_cache() calls in
several places, most of which are related to load balancing; that
should not be a problem if the llc_id is shared among partitions.
I'll double check.

thanks,
Chenyu
