lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c9a12d71-faea-4de5-a1b3-f603cf153f91@intel.com>
Date: Tue, 16 Dec 2025 13:31:54 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
	<peterz@...radead.org>
CC: Ingo Molnar <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>, "Dietmar
 Eggemann" <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
 Schneider" <vschneid@...hat.com>, Madadi Vineeth Reddy
	<vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, Shrikanth Hegde
	<sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen
	<cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern Hao
	<vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>, Len Brown
	<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
	<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
	<adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

On 12/16/2025 4:49 AM, Tim Chen wrote:
> On Tue, 2025-12-09 at 12:58 +0100, Peter Zijlstra wrote:
>> On Wed, Dec 03, 2025 at 03:07:23PM -0800, Tim Chen wrote:
>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 710ed9943d27..0a3918269906 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1210,10 +1210,17 @@ __read_mostly unsigned int llc_imb_pct            = 20;
>>>   
>>>   static int llc_id(int cpu)
>>>   {
>>> +	int llc;
>>> +
>>>   	if (cpu < 0)
>>>   		return -1;
>>>   
>>> +	llc = per_cpu(sd_llc_id, cpu);
>>> +	/* avoid race with cpu hotplug */
>>> +	if (unlikely(llc >= max_llcs))
>>> +		return -1;
>>> +
>>> +	return llc;
>>>   }
>>>   
>>>   void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
>>
>>> @@ -668,6 +670,55 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>>>   DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
>>>   DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
>>>   
>>> +/*
>>> + * Assign continuous llc id for the CPU, and return
>>> + * the assigned llc id.
>>> + */
>>> +static int update_llc_id(struct sched_domain *sd,
>>> +			 int cpu)
>>> +{
>>> +	int id = per_cpu(sd_llc_id, cpu), i;
>>> +
>>> +	if (id >= 0)
>>> +		return id;
>>> +
>>> +	if (sd) {
>>> +		/* Look for any assigned id and reuse it.*/
>>> +		for_each_cpu(i, sched_domain_span(sd)) {
>>> +			id = per_cpu(sd_llc_id, i);
>>> +
>>> +			if (id >= 0) {
>>> +				per_cpu(sd_llc_id, cpu) = id;
>>> +				return id;
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	/*
>>> +	 * When 1. there is no id assigned to this LLC domain,
>>> +	 * or 2. the sd is NULL, we reach here.
>>> +	 * Consider the following scenario,
>>> +	 * CPU0~CPU95 are in the node0, CPU96~CPU191 are
>>> +	 * in the node1. During bootup, maxcpus=96 is
>>> +	 * appended.
>>> +	 * case 1: When running cpu_attach_domain(CPU24)
>>> +	 * during boot up, CPU24 is the first CPU in its
>>> +	 * non-NULL LLC domain. However,
>>> +	 * its corresponding llc id has not been assigned yet.
>>> +	 *
>>> +	 * case 2: After boot up, the CPU100 is brought up
>>> +	 * via sysfs manually. As a result, CPU100 has only a
>>> +	 * Numa domain attached, because CPU100 is the only CPU
>>> +	 * of a sched domain, all its bottom domains are degenerated.
>>> +	 * The LLC domain pointer sd is NULL for CPU100.
>>> +	 *
>>> +	 * For both cases, we want to increase the number of LLCs.
>>> +	 */
>>> +	per_cpu(sd_llc_id, cpu) = max_llcs++;
>>> +
>>> +	return per_cpu(sd_llc_id, cpu);
>>> +}
>>
>> I'm not sure I follow. So partition_sched_domains() first calls
>> detach_destroy_domains() on the old set, and then build_sched_domains()
>> on the new set.
>>
>> Do detach_destroy_domain() will do:
>>
>>    cpu_attach_domain(NULL,..);
>>
>> That is, it will explicitly attach the NULL sched_domain to a CPU. At
>> which point I feel update_llc_id() should be returning -1, no?
>>
>> Then later, build_sched_domains() will set a !NULL sched_domain, at
>> which point update_llc_id() can set a real value.
>>
>> This should then also get rid of that weird max_llcs check in llc_id(),
>> right?

The check for max_llcs was intended to prevent out-of-bounds access
to rq->nr_pref_llc[] at multiple points in the code.
Since dst_llc = llc_id(env->dst_cpu); — and while the LLC ID for the
  CPU is updated in update_llc_id(), this update occurs before we reallocate
  the nr_pref_llc buffer — dst_llc may end up exceeding the bounds of the
original nr_pref_llc buffer.

For this reason, we added a check if (dst_llc > max_llc) in llc_id()
when attempting to access rq->nr_pref_llc[dst_llc].

However, I agree that the max_llc check seems to not properly integrated
into  the current patch: it should instead be placed in the 7th patch, as
this would better illustrate the rationale for the max_llc check here:
sched/cache: Introduce per runqueue task LLC preference counter

In the 7th patch, we actually increment new_max_llcs rather than
max_llcs — meaning max_llcs always represents the "old" number of LLCs.
As a result, there is a race window between extending the rq->nr_pref_llc
buffer and updating max_llcs.


@@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
  	 *
  	 * For both cases, we want to increase the number of LLCs.
  	 */
-	per_cpu(sd_llc_id, cpu) = max_llcs++;
+	per_cpu(sd_llc_id, cpu) = new_max_llcs++;

  	return per_cpu(sd_llc_id, cpu);
  }


> Thanks for pointing this out.  Yes, we should take care of the
> attachment of NULL sd. Will update the code accordingly.
> 

My understanding is that, if the sd is NULL, it is either because invoked
by detach_destroy_domain() for the old set, or by case 2 mentioned in 
above comments:
Say, CPU0-CPU95 are online during bootup, the boot command line is 
maxcpus=96.
Later after bootup, the user wants to bring up CPU100, the LLC domain for
CPU100 is NULL in this case(due to sd generation), and a new LLC should be
detected.

That is to say, when we reach update_llc_id(), there could be 2 reasons
for NULL sd. For the detach_destroy_domain() case, update_llc_id()
should return a valid id without increasing the max_llcs, because of
     if (id >= 0)
         return id;
And for the latter, the max_llcs should be increased.
Let me double check on this.

thanks,
Chenyu


> Tim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ