[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd777222-dbb1-4152-990d-5dad9f7dfffb@intel.com>
Date: Wed, 17 Dec 2025 13:25:24 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
<peterz@...radead.org>
CC: Ingo Molnar <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>, Vincent Guittot
<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>, "Dietmar
Eggemann" <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
Schneider" <vschneid@...hat.com>, Madadi Vineeth Reddy
<vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, Shrikanth Hegde
<sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen
<cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern Hao
<vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>, Len Brown
<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
<adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>, Tim Chen
<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous
On 12/17/2025 3:53 AM, Tim Chen wrote:
> On Tue, 2025-12-16 at 13:31 +0800, Chen, Yu C wrote:
>> On 12/16/2025 4:49 AM, Tim Chen wrote:
>>> On Tue, 2025-12-09 at 12:58 +0100, Peter Zijlstra wrote:
>>>> On Wed, Dec 03, 2025 at 03:07:23PM -0800, Tim Chen wrote:
>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 710ed9943d27..0a3918269906 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -1210,10 +1210,17 @@ __read_mostly unsigned int llc_imb_pct = 20;
>>>>>
>>>>> static int llc_id(int cpu)
>>>>> {
>>>>> + int llc;
>>>>> +
>>>>> if (cpu < 0)
>>>>> return -1;
>>>>>
>>>>> + llc = per_cpu(sd_llc_id, cpu);
>>>>> + /* avoid race with cpu hotplug */
>>>>> + if (unlikely(llc >= max_llcs))
>>>>> + return -1;
>>>>> +
>>>>> + return llc;
>>>>> }
>>>>>
>>>>> void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
>>>>
>>>>> @@ -668,6 +670,55 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>>>>> DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
>>>>> DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
>>>>>
>>>>> +/*
>>>>> + * Assign continuous llc id for the CPU, and return
>>>>> + * the assigned llc id.
>>>>> + */
>>>>> +static int update_llc_id(struct sched_domain *sd,
>>>>> + int cpu)
>>>>> +{
>>>>> + int id = per_cpu(sd_llc_id, cpu), i;
>>>>> +
>>>>> + if (id >= 0)
>>>>> + return id;
>>>>> +
>>>>> + if (sd) {
>>>>> + /* Look for any assigned id and reuse it.*/
>>>>> + for_each_cpu(i, sched_domain_span(sd)) {
>>>>> + id = per_cpu(sd_llc_id, i);
>>>>> +
>>>>> + if (id >= 0) {
>>>>> + per_cpu(sd_llc_id, cpu) = id;
>>>>> + return id;
>>>>> + }
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * When 1. there is no id assigned to this LLC domain,
>>>>> + * or 2. the sd is NULL, we reach here.
>>>>> + * Consider the following scenario,
>>>>> + * CPU0~CPU95 are in the node0, CPU96~CPU191 are
>>>>> + * in the node1. During bootup, maxcpus=96 is
>>>>> + * appended.
>>>>> + * case 1: When running cpu_attach_domain(CPU24)
>>>>> + * during boot up, CPU24 is the first CPU in its
>>>>> + * non-NULL LLC domain. However,
>>>>> + * its corresponding llc id has not been assigned yet.
>>>>> + *
>>>>> + * case 2: After boot up, the CPU100 is brought up
>>>>> + * via sysfs manually. As a result, CPU100 has only a
>>>>> + * Numa domain attached, because CPU100 is the only CPU
>>>>> + * of a sched domain, all its bottom domains are degenerated.
>>>>> + * The LLC domain pointer sd is NULL for CPU100.
>>>>> + *
>>>>> + * For both cases, we want to increase the number of LLCs.
>>>>> + */
>>>>> + per_cpu(sd_llc_id, cpu) = max_llcs++;
>>>>> +
>>>>> + return per_cpu(sd_llc_id, cpu);
>>>>> +}
>>>>
>>>> I'm not sure I follow. So partition_sched_domains() first calls
>>>> detach_destroy_domains() on the old set, and then build_sched_domains()
>>>> on the new set.
>>>>
>>>> Do detach_destroy_domain() will do:
>>>>
>>>> cpu_attach_domain(NULL,..);
>>>>
>>>> That is, it will explicitly attach the NULL sched_domain to a CPU. At
>>>> which point I feel update_llc_id() should be returning -1, no?
>>>>
>>>> Then later, build_sched_domains() will set a !NULL sched_domain, at
>>>> which point update_llc_id() can set a real value.
>>>>
>>>> This should then also get rid of that weird max_llcs check in llc_id(),
>>>> right?
>>
>> The check for max_llcs was intended to prevent out-of-bounds access
>> to rq->nr_pref_llc[] at multiple points in the code.
>> Since dst_llc = llc_id(env->dst_cpu); — and while the LLC ID for the
>> CPU is updated in update_llc_id(), this update occurs before we reallocate
>> the nr_pref_llc buffer — dst_llc may end up exceeding the bounds of the
>> original nr_pref_llc buffer.
>>
>> For this reason, we added a check if (dst_llc > max_llc) in llc_id()
>> when attempting to access rq->nr_pref_llc[dst_llc].
>>
>> However, I agree that the max_llc check seems to not properly integrated
>> into the current patch: it should instead be placed in the 7th patch, as
>> this would better illustrate the rationale for the max_llc check here:
>> sched/cache: Introduce per runqueue task LLC preference counter
>>
>> In the 7th patch, we actually increment new_max_llcs rather than
>> max_llcs — meaning max_llcs always represents the "old" number of LLCs.
>> As a result, there is a race window between extending the rq->nr_pref_llc
>> buffer and updating max_llcs.
>>
>>
>> @@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
>> *
>> * For both cases, we want to increase the number of LLCs.
>> */
>> - per_cpu(sd_llc_id, cpu) = max_llcs++;
>> + per_cpu(sd_llc_id, cpu) = new_max_llcs++;
>>
>> return per_cpu(sd_llc_id, cpu);
>> }
>>
>>
>>> Thanks for pointing this out. Yes, we should take care of the
>>> attachment of NULL sd. Will update the code accordingly.
>>>
>>
>> My understanding is that, if the sd is NULL, it is either because invoked
>> by detach_destroy_domain() for the old set, or by case 2 mentioned in
>> above comments:
>> Say, CPU0-CPU95 are online during bootup, the boot command line is
>> maxcpus=96.
>> Later after bootup, the user wants to bring up CPU100, the LLC domain for
>> CPU100 is NULL in this case(due to sd generation), and a new LLC should be
>> detected.
>>
>> That is to say, when we reach update_llc_id(), there could be 2 reasons
>> for NULL sd. For the detach_destroy_domain() case, update_llc_id()
>> should return a valid id without increasing the max_llcs, because of
>> if (id >= 0)
>> return id;
>> And for the latter, the max_llcs should be increased.
>> Let me double check on this.
>
> The issue is we could offline all CPUs in a LLC and online them later.
> In the current code, we will assign their ids all to -1.
I suppose we don't reset the ids in current implementation, only
the first scan of LLCs will reset/initialize the ids to -1 in
build_sched_domains()?
if (!max_llcs) { //max_llcs is initialized to 0 during bootup
for_each_possible_cpu(i)
per_cpu(sd_llc_id, i) = -1;
}
> So on attach
> of CPUs again, we'll be assigning a new LLC. I think the proper thing
> to do is not to assign llc id of the offlined cpu (the case where sd == NULL)
> and keep the original llc id assigned. Then we should be okay and not
> increase max_llcs.
>
This is the current implementation because we don't assign new ids to
CPUs that already have an id(no matter it is offline/online).
thanks,
Chenyu
Powered by blists - more mailing lists