linux-kernel - Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

Message-ID: <fd777222-dbb1-4152-990d-5dad9f7dfffb@intel.com>
Date: Wed, 17 Dec 2025 13:25:24 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
	<peterz@...radead.org>
CC: Ingo Molnar <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>, "Dietmar
 Eggemann" <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
 Schneider" <vschneid@...hat.com>, Madadi Vineeth Reddy
	<vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, Shrikanth Hegde
	<sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen
	<cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern Hao
	<vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>, Len Brown
	<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
	<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
	<adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

On 12/17/2025 3:53 AM, Tim Chen wrote:
> On Tue, 2025-12-16 at 13:31 +0800, Chen, Yu C wrote:
>> On 12/16/2025 4:49 AM, Tim Chen wrote:
>>> On Tue, 2025-12-09 at 12:58 +0100, Peter Zijlstra wrote:
>>>> On Wed, Dec 03, 2025 at 03:07:23PM -0800, Tim Chen wrote:
>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 710ed9943d27..0a3918269906 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -1210,10 +1210,17 @@ __read_mostly unsigned int llc_imb_pct            = 20;
>>>>>    
>>>>>    static int llc_id(int cpu)
>>>>>    {
>>>>> +	int llc;
>>>>> +
>>>>>    	if (cpu < 0)
>>>>>    		return -1;
>>>>>    
>>>>> +	llc = per_cpu(sd_llc_id, cpu);
>>>>> +	/* avoid race with cpu hotplug */
>>>>> +	if (unlikely(llc >= max_llcs))
>>>>> +		return -1;
>>>>> +
>>>>> +	return llc;
>>>>>    }
>>>>>    
>>>>>    void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
>>>>
>>>>> @@ -668,6 +670,55 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>>>>>    DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
>>>>>    DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
>>>>>    
>>>>> +/*
>>>>> + * Assign continuous llc id for the CPU, and return
>>>>> + * the assigned llc id.
>>>>> + */
>>>>> +static int update_llc_id(struct sched_domain *sd,
>>>>> +			 int cpu)
>>>>> +{
>>>>> +	int id = per_cpu(sd_llc_id, cpu), i;
>>>>> +
>>>>> +	if (id >= 0)
>>>>> +		return id;
>>>>> +
>>>>> +	if (sd) {
>>>>> +		/* Look for any assigned id and reuse it.*/
>>>>> +		for_each_cpu(i, sched_domain_span(sd)) {
>>>>> +			id = per_cpu(sd_llc_id, i);
>>>>> +
>>>>> +			if (id >= 0) {
>>>>> +				per_cpu(sd_llc_id, cpu) = id;
>>>>> +				return id;
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * When 1. there is no id assigned to this LLC domain,
>>>>> +	 * or 2. the sd is NULL, we reach here.
>>>>> +	 * Consider the following scenario,
>>>>> +	 * CPU0~CPU95 are in the node0, CPU96~CPU191 are
>>>>> +	 * in the node1. During bootup, maxcpus=96 is
>>>>> +	 * appended.
>>>>> +	 * case 1: When running cpu_attach_domain(CPU24)
>>>>> +	 * during boot up, CPU24 is the first CPU in its
>>>>> +	 * non-NULL LLC domain. However,
>>>>> +	 * its corresponding llc id has not been assigned yet.
>>>>> +	 *
>>>>> +	 * case 2: After boot up, the CPU100 is brought up
>>>>> +	 * via sysfs manually. As a result, CPU100 has only a
>>>>> +	 * Numa domain attached, because CPU100 is the only CPU
>>>>> +	 * of a sched domain, all its bottom domains are degenerated.
>>>>> +	 * The LLC domain pointer sd is NULL for CPU100.
>>>>> +	 *
>>>>> +	 * For both cases, we want to increase the number of LLCs.
>>>>> +	 */
>>>>> +	per_cpu(sd_llc_id, cpu) = max_llcs++;
>>>>> +
>>>>> +	return per_cpu(sd_llc_id, cpu);
>>>>> +}
>>>>
>>>> I'm not sure I follow. So partition_sched_domains() first calls
>>>> detach_destroy_domains() on the old set, and then build_sched_domains()
>>>> on the new set.
>>>>
>>>> So detach_destroy_domains() will do:
>>>>
>>>>     cpu_attach_domain(NULL,..);
>>>>
>>>> That is, it will explicitly attach the NULL sched_domain to a CPU. At
>>>> which point I feel update_llc_id() should be returning -1, no?
>>>>
>>>> Then later, build_sched_domains() will set a !NULL sched_domain, at
>>>> which point update_llc_id() can set a real value.
>>>>
>>>> This should then also get rid of that weird max_llcs check in llc_id(),
>>>> right?
>>
>> The check against max_llcs was intended to prevent out-of-bounds access
>> to rq->nr_pref_llc[] at multiple points in the code.
>> Consider dst_llc = llc_id(env->dst_cpu): the LLC id of a CPU is updated
>> in update_llc_id(), but this update happens before the nr_pref_llc
>> buffer is reallocated, so dst_llc may exceed the bounds of the
>> original nr_pref_llc buffer.
>>
>> For this reason, we added the check (llc >= max_llcs) in llc_id(),
>> which guards the later access to rq->nr_pref_llc[dst_llc].
>>
>> However, I agree that the max_llcs check is not properly integrated
>> into the current patch. It should instead be placed in the 7th patch,
>> "sched/cache: Introduce per runqueue task LLC preference counter",
>> which would better illustrate the rationale for the check.
>>
>> In the 7th patch, we actually increment new_max_llcs rather than
>> max_llcs — meaning max_llcs always represents the "old" number of LLCs.
>> As a result, there is a race window between extending the rq->nr_pref_llc
>> buffer and updating max_llcs.
>>
>>
>> @@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
>>    	 *
>>    	 * For both cases, we want to increase the number of LLCs.
>>    	 */
>> -	per_cpu(sd_llc_id, cpu) = max_llcs++;
>> +	per_cpu(sd_llc_id, cpu) = new_max_llcs++;
>>
>>    	return per_cpu(sd_llc_id, cpu);
>>    }
>>
>>
>>> Thanks for pointing this out.  Yes, we should take care of the
>>> attachment of NULL sd. Will update the code accordingly.
>>>
>>
>> My understanding is that, if sd is NULL, update_llc_id() was invoked
>> either by detach_destroy_domains() for the old set, or in case 2
>> described in the comments above:
>> Say CPU0-CPU95 are online during boot because the boot command line
>> contains maxcpus=96.
>> Later, after boot, the user brings up CPU100. The LLC domain for
>> CPU100 is NULL in this case (due to sd degeneration), and a new LLC
>> should be detected.
>>
>> That is to say, when we reach update_llc_id(), there could be two
>> reasons for a NULL sd. In the detach_destroy_domains() case,
>> update_llc_id() should return the existing valid id without increasing
>> max_llcs, because of:
>>       if (id >= 0)
>>           return id;
>> In the latter case, max_llcs should be increased.
>> Let me double check on this.
> 
> The issue is that we could offline all CPUs in an LLC and online them
> later. In the current code, we will assign their ids all to -1.

I suppose we don't reset the ids in the current implementation; only
the first scan of LLCs resets (initializes) the ids to -1, in
build_sched_domains()?
         if (!max_llcs) { /* max_llcs is initialized to 0 during boot */
                 for_each_possible_cpu(i)
                         per_cpu(sd_llc_id, i) = -1;
         }

> So when the CPUs are attached again, we'll be assigning a new LLC. I think
> the proper thing to do is not to reassign the llc id of an offlined CPU
> (the case where sd == NULL) and to keep the originally assigned llc id.
> Then we should be okay and not increase max_llcs.
> 

This is what the current implementation already does: we don't assign
new ids to CPUs that already have an id, whether they are offline or online.

thanks,
Chenyu