Message-ID: <cbacc6df3b197566e763739d96508aaabf01bfc0.camel@linux.intel.com>
Date: Wed, 10 Dec 2025 10:49:14 -0800
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Hillf Danton
<hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu
<jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan
<tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Vern Hao
<haoxing990@...il.com>, Len Brown <len.brown@...el.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, Chen Yu <yu.c.chen@...el.com>, Adam Li
<adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>, Tim
Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 07/23] sched/cache: Introduce per runqueue task LLC
preference counter
On Wed, 2025-12-10 at 13:51 +0100, Peter Zijlstra wrote:
> On Wed, Dec 03, 2025 at 03:07:26PM -0800, Tim Chen wrote:
>
> > +static int resize_llc_pref(void)
> > +{
> > + unsigned int *__percpu *tmp_llc_pref;
> > + int i, ret = 0;
> > +
> > + if (new_max_llcs <= max_llcs)
> > + return 0;
> > +
> > + /*
> > + * Allocate temp percpu pointer for old llc_pref,
> > + * which will be released after switching to the
> > + * new buffer.
> > + */
> > + tmp_llc_pref = alloc_percpu_noprof(unsigned int *);
> > + if (!tmp_llc_pref)
> > + return -ENOMEM;
> > +
> > + for_each_present_cpu(i)
> > + *per_cpu_ptr(tmp_llc_pref, i) = NULL;
> > +
> > + /*
> > + * Resize the per rq nr_pref_llc buffer and
> > + * switch to this new buffer.
> > + */
> > + for_each_present_cpu(i) {
> > + struct rq_flags rf;
> > + unsigned int *new;
> > + struct rq *rq;
> > +
> > + rq = cpu_rq(i);
> > + new = alloc_new_pref_llcs(rq->nr_pref_llc, per_cpu_ptr(tmp_llc_pref, i));
> > + if (!new) {
> > + ret = -ENOMEM;
> > +
> > + goto release_old;
> > + }
> > +
> > + /*
> > + * Locking rq ensures that rq->nr_pref_llc values
> > + * don't change with new task enqueue/dequeue
> > + * when we repopulate the newly enlarged array.
> > + */
> > + rq_lock_irqsave(rq, &rf);
> > + populate_new_pref_llcs(rq->nr_pref_llc, new);
> > + rq->nr_pref_llc = new;
> > + rq_unlock_irqrestore(rq, &rf);
> > + }
> > +
> > +release_old:
> > + /*
> > + * Load balance is done under rcu_lock.
> > + * Wait for load balance before and during resizing to
> > + * be done. They may refer to old nr_pref_llc[]
> > + * that hasn't been resized.
> > + */
> > + synchronize_rcu();
> > + for_each_present_cpu(i)
> > + kfree(*per_cpu_ptr(tmp_llc_pref, i));
> > +
> > + free_percpu(tmp_llc_pref);
> > +
> > + /* succeed and update */
> > + if (!ret)
> > + max_llcs = new_max_llcs;
> > +
> > + return ret;
> > +}
>
> Would it perhaps be easier to stick this thing in rq->sd rather than in
> rq->nr_pref_llc. That way it automagically switches with the 'new'
> domain. And then, with a bit of care, a single load-balance pass should
> see a consistent view (there should not be reloads of rq->sd -- which
> will be a bit of an audit I suppose).
We need nr_pref_llc information at the runqueue level because the load balancer
must identify which specific rq has the largest number of tasks that
prefer a given destination LLC. If we move the counter to the LLC’s sd
level, we would only know the aggregate number of tasks in the entire LLC
that prefer that destination—not which rq they reside on. Without per-rq
counts, we would not be able to select the correct source rq to pull tasks from.
The only way this could work at the LLC-sd level is if all CPUs within
the LLC shared a single runqueue, which is not the case today.
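To make the usage concrete, the per-rq counters are what allow a selection
like the sketch below (simplified, and find_src_rq_for_llc() is a made-up
helper name just for illustration, not something in the series):

	/*
	 * Hypothetical, simplified helper: among the candidate source CPUs,
	 * pick the rq with the largest count of tasks that prefer the
	 * destination LLC (dst_llc).
	 */
	static struct rq *find_src_rq_for_llc(const struct cpumask *src_cpus,
					      int dst_llc)
	{
		struct rq *busiest = NULL;
		unsigned int max_pref = 0;
		int cpu;

		for_each_cpu(cpu, src_cpus) {
			struct rq *rq = cpu_rq(cpu);
			unsigned int pref = READ_ONCE(rq->nr_pref_llc[dst_llc]);

			if (pref > max_pref) {
				max_pref = pref;
				busiest = rq;
			}
		}

		return busiest;
	}

An aggregate count at the LLC's sd level could tell us that such tasks exist
somewhere in the LLC, but not which rq to pull them from.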
Let me know if I've understood your comments correctly.
Tim