Message-ID: <20251210165119.GY3707891@noisy.programming.kicks-ass.net>
Date: Wed, 10 Dec 2025 17:51:19 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Chen Yu <yu.c.chen@...el.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>,
Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>,
Vern Hao <vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 17/23] sched/cache: Record the number of active
threads per process for cache-aware scheduling

On Wed, Dec 03, 2025 at 03:07:36PM -0800, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@...el.com>
>
> A performance regression was observed by Prateek when running hackbench
> with many threads per process (high fd count). To avoid this, processes
> with a large number of active threads are excluded from cache-aware
> scheduling.
>
> With sched_cache enabled, record the number of active threads in each
> process during the periodic task_cache_work(). While iterating over
> CPUs, if the currently running task belongs to the same process as the
> task that launched task_cache_work(), increment the active thread count.
>
> This number will be used by subsequent patch to inhibit cache aware
> load balance.
>
> Suggested-by: K Prateek Nayak <kprateek.nayak@....com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
>
> Notes:
> v1->v2: No change.
>
> include/linux/mm_types.h | 1 +
> kernel/sched/fair.c | 11 +++++++++--
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 1ea16ef90566..04743983de4d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1043,6 +1043,7 @@ struct mm_struct {
> raw_spinlock_t mm_sched_lock;
> unsigned long mm_sched_epoch;
> int mm_sched_cpu;
> + u64 nr_running_avg ____cacheline_aligned_in_smp;

This is unlikely to do what you hope it does. It will place this
variable on a new cacheline, but it will not ensure this variable is the
only one in that line. Notably pgtables_bytes (the next field in this
structure) will share the line.

It might all be less dodgy if you stick these here fields in their own
structure, a little like mm_mm_cid or so.

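To make the layout point concrete, here is a small user-space sketch, not kernel code. It assumes a 64-byte line and uses __attribute__((aligned(64))) as a stand-in for ____cacheline_aligned_in_smp; the struct names, the trimmed-down field lists and the helper functions are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE 64	/* assumed cacheline size for this sketch */

/* As in the patch: only the member itself carries the alignment. */
struct mm_inline {
	int mm_sched_cpu;
	uint64_t nr_running_avg __attribute__((aligned(LINE)));
	long pgtables_bytes;	/* stand-in for the next mm_struct field */
};

/*
 * Hypothetical grouping: the alignment is on the type, so sizeof() is
 * padded up to a full line and nothing else can land in it.
 */
struct sched_cache_stats {
	uint64_t nr_running_avg;
} __attribute__((aligned(LINE)));

struct mm_grouped {
	int mm_sched_cpu;
	struct sched_cache_stats stats;
	long pgtables_bytes;
};

/* True when members a and b of type start in the same LINE-sized block. */
#define SAME_LINE(type, a, b) \
	(offsetof(type, a) / LINE == offsetof(type, b) / LINE)

/* Both variants start the hot field on a line boundary... */
static_assert(offsetof(struct mm_inline, nr_running_avg) % LINE == 0,
	      "member alignment starts a new line");
static_assert(offsetof(struct mm_grouped, stats) % LINE == 0,
	      "struct alignment starts a new line");

/* ...but only the grouped variant keeps the neighbour out of it. */
int inline_shares_line(void)
{
	return SAME_LINE(struct mm_inline, nr_running_avg, pgtables_bytes);
}

int grouped_shares_line(void)
{
	return SAME_LINE(struct mm_grouped, stats, pgtables_bytes);
}
```

With the member-only attribute the neighbouring field starts 8 bytes into the same line; with the wrapper struct it is pushed out to the next one.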
> #endif
>
> #ifdef CONFIG_MMU
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 580a967efdac..2f38ad82688f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1421,11 +1421,11 @@ static void task_tick_cache(struct rq *rq, struct task_struct *p)
>
> static void __no_profile task_cache_work(struct callback_head *work)
> {
> - struct task_struct *p = current;
> + struct task_struct *p = current, *cur;
> struct mm_struct *mm = p->mm;
> unsigned long m_a_occ = 0;
> unsigned long curr_m_a_occ = 0;
> - int cpu, m_a_cpu = -1;
> + int cpu, m_a_cpu = -1, nr_running = 0;
> cpumask_var_t cpus;
>
> WARN_ON_ONCE(work != &p->cache_work);
> @@ -1458,6 +1458,12 @@ static void __no_profile task_cache_work(struct callback_head *work)
> m_occ = occ;
> m_cpu = i;
> }

	guard(rcu)();

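That is, a scope-based guard instead of the explicit rcu_read_lock()/rcu_read_unlock() pair in this hunk. For anyone unfamiliar with the pattern, below is a toy user-space sketch of the mechanism, built on the compiler's cleanup attribute (GCC/Clang). Everything prefixed toy_/TOY_ and count_matches() are invented for illustration; the "lock" is just a nesting counter and has nothing to do with real RCU.

```c
#include <assert.h>

/* Toy stand-ins for rcu_read_lock()/rcu_read_unlock(): track nesting only. */
static int toy_rcu_depth;

static void toy_rcu_read_lock(void)
{
	toy_rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
	toy_rcu_depth--;
}

/* Called by the compiler when the guard variable goes out of scope. */
static void toy_rcu_guard_exit(int *unused)
{
	(void)unused;
	toy_rcu_read_unlock();
}

/* Hypothetical macro, loosely modelled on what guard(rcu)() gives you. */
#define TOY_GUARD_RCU() \
	int toy_guard __attribute__((cleanup(toy_rcu_guard_exit), unused)) = \
		(toy_rcu_read_lock(), 0)

/* Counts entries equal to key; the read side spans one loop iteration. */
int count_matches(const int *vals, int n, int key)
{
	int i, hits = 0;

	for (i = 0; i < n; i++) {
		TOY_GUARD_RCU();

		if (vals[i] != key)
			continue;	/* the guard still unlocks on this path */
		hits++;
	}

	return hits;
}
```

The point of the pattern is that early exits (continue, break, return) cannot leave the read-side section open, which is easy to get wrong with a hand-written lock/unlock pair.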
> + rcu_read_lock();
> + cur = rcu_dereference(cpu_rq(i)->curr);
> + if (cur && !(cur->flags & (PF_EXITING | PF_KTHREAD)) &&
> + cur->mm == mm)
> + nr_running++;
> + rcu_read_unlock();
> }
>
> /*
> @@ -1501,6 +1507,7 @@ static void __no_profile task_cache_work(struct callback_head *work)
> mm->mm_sched_cpu = m_a_cpu;
> }
>
> + update_avg(&mm->nr_running_avg, nr_running);
> free_cpumask_var(cpus);
> }

It's a wee bit weird to introduce nr_running_avg without its user. Makes
it hard to see what's what.