Message-ID: <c16481a7-20f1-44b8-981c-fd31cb331cbf@efficios.com>
Date: Mon, 2 Dec 2024 09:34:11 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Gabriele Monaco <gmonaco@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Cc: paulmck <paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>, Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, "rcu@...r.kernel.org"
<rcu@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched: Move task_mm_cid_work to RCU callback
+= CC RCU maintainers, reviewers and list.
+= RSEQ maintainers.
On 2024-12-02 09:07, Gabriele Monaco wrote:
> Currently, the task_mm_cid_work function is called in a task work
> triggered by a scheduler tick. This can delay the execution of the
> task for the entire duration of the function.
>
> This patch runs the task_mm_cid_work in the RCU callback thread rather
> than in the task context before returning to userspace.
>
> The main advantage of this change is that the function can be offloaded
> to a different CPU and even preempted by RT tasks.
>
> On a busy system, this may mean the function gets called less often, but
> the current behaviour already doesn't provide guarantees.

I've used the same task work pattern as NUMA here. What makes it
OK for NUMA and not for mm_cid?

I wonder why we'd want to piggy-back on call_rcu here when
this has nothing to do with RCU. There is likely a characteristic
of the call_rcu worker threads that we want to import into
task_tick_mm_cid(); alternatively, we could change task_work.c to add
a new flag that says the work can be dispatched to any CPU.
>
> Signed-off-by: Gabriele Monaco <gmonaco@...hat.com>
> ---
> include/linux/sched.h | 1 -
> kernel/sched/core.c | 17 ++++++-----------
> 2 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d380bffee2ef..5d141c310917 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1374,7 +1374,6 @@ struct task_struct {
> int last_mm_cid; /* Most recent cid in mm */
> int migrate_from_cpu;
> int mm_cid_active; /* Whether cid bitmap is active */
> - struct callback_head cid_work;
> #endif
>
> struct tlbflush_unmap_batch tlb_ubc;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 57b50b5952fa..0fc1a972fd4f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10520,17 +10520,15 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
> sched_mm_cid_remote_clear(mm, pcpu_cid, cpu);
> }
>
> -static void task_mm_cid_work(struct callback_head *work)
> +static void task_mm_cid_work(struct rcu_head *rhp)
> {
> unsigned long now = jiffies, old_scan, next_scan;
> - struct task_struct *t = current;
> + struct task_struct *t = container_of(rhp, struct task_struct, rcu);
> struct cpumask *cidmask;
> struct mm_struct *mm;
> int weight, cpu;
>
> - SCHED_WARN_ON(t != container_of(work, struct task_struct, cid_work));
> -
> - work->next = work; /* Prevent double-add */
> + rhp->next = rhp; /* Prevent double-add */
> if (t->flags & PF_EXITING)
> return;
> mm = t->mm;
> @@ -10574,23 +10572,20 @@ void init_sched_mm_cid(struct task_struct *t)
> if (mm_users == 1)
> mm->mm_cid_next_scan = jiffies + msecs_to_jiffies(MM_CID_SCAN_DELAY);
> }
> - t->cid_work.next = &t->cid_work; /* Protect against double add */
> - init_task_work(&t->cid_work, task_mm_cid_work);
> }
>
> void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
> {
> - struct callback_head *work = &curr->cid_work;
> + struct rcu_head *rhp = &curr->rcu;
Why is it OK to reuse the task_struct rcu field? Where else is it
used, and is there a risk of it being inserted twice?
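To make the double-add question concrete, here is a userspace sketch of the self-pointer guard that this patch keeps relying on. It is not kernel code, and every name in it (`struct cb_head`, `struct cb_queue`, `queue_add()`, `queue_run()`, `scan_work()`, `tick_maybe_queue()`) is hypothetical. In this model, `next == self` means "idle", anything else means "queued", and the callback re-arms the guard before doing its work. Two things follow from it. First, the guard only works if the field is initialized to point at itself. The patch drops `t->cid_work.next = &t->cid_work` from init_sched_mm_cid(), and the hunks quoted above add nothing equivalent for `t->rcu`. Second, any other path that queues the same field without going through the gate corrupts the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct callback_head / struct rcu_head:
 * an intrusive list link plus the function to invoke later. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

/* Hypothetical single-threaded deferred-callback queue. */
struct cb_queue {
	struct cb_head *head;
};

/* Link h in front of the current head, similar in spirit to how
 * task_work_add() links a callback_head. No double-add check here. */
void queue_add(struct cb_queue *q, struct cb_head *h,
	       void (*func)(struct cb_head *))
{
	h->func = func;
	h->next = q->head;
	q->head = h;
}

/* Detach the whole list and invoke each callback; returns how many ran.
 * next is read before func() because func() may re-arm h->next. */
int queue_run(struct cb_queue *q)
{
	struct cb_head *h = q->head;
	int n = 0;

	q->head = NULL;
	while (h) {
		struct cb_head *next = h->next;

		/* A node linked twice onto the same list points at itself. */
		assert(next != h && "node linked twice: self-loop");
		h->func(h);
		h = next;
		n++;
	}
	return n;
}

/* Counts how many times the deferred work actually ran. */
int work_runs;

/* The deferred work: re-arm the guard first (next == self means
 * "idle, may be queued again"), mirroring rhp->next = rhp in the patch. */
void scan_work(struct cb_head *h)
{
	h->next = h;
	work_runs++;
}

/* Tick-side gate: only queue when the guard says "idle"; returns 1 if
 * queued. A zero-initialized head (next == NULL) is never queued. */
int tick_maybe_queue(struct cb_queue *q, struct cb_head *h)
{
	if (h->next != h)
		return 0;	/* already queued, or never initialized */
	queue_add(q, h, scan_work);
	return 1;
}
```

With this model, a zero-initialized head is refused by `tick_maybe_queue()`. After setting `h.next = &h`, the first call queues the head, and a second call before `queue_run()` is refused. Once the callback has re-armed the guard, the head can be queued again. Calling `queue_add()` directly on a head that is already queued, as a second user of the same field would, leaves `h.next == &h` with the queue head pointing at it. The guard then reads "idle" while the list is a self-loop.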
Thanks,
Mathieu
> unsigned long now = jiffies;
>
> if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
> - work->next != work)
> + rhp->next != rhp)
> return;
> if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
> return;
>
> - /* No page allocation under rq lock */
> - task_work_add(curr, work, TWA_RESUME | TWAF_NO_ALLOC);
> + call_rcu(rhp, task_mm_cid_work);
> }
>
> void sched_mm_cid_exit_signals(struct task_struct *t)
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com