Message-ID: <71f31ba8b26d25e3295a2e99803a11eca4e94788.camel@redhat.com>
Date: Mon, 02 Dec 2024 16:21:28 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Ingo Molnar
 <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
 <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
 <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 linux-kernel@...r.kernel.org
Cc: paulmck <paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>,
 Neeraj Upadhyay <neeraj.upadhyay@...nel.org>, Joel Fernandes
 <joel@...lfernandes.org>, Josh Triplett <josh@...htriplett.org>, Boqun
 Feng <boqun.feng@...il.com>, Uladzislau Rezki <urezki@...il.com>, Lai
 Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
 rcu@...r.kernel.org
Subject: Re: [PATCH 2/2] sched: Move task_mm_cid_work to RCU callback

> I've used the same task work pattern as NUMA here. What makes it
> OK for NUMA and not for mm_cid ?
> 

I didn't investigate the behaviour of the NUMA work, but my rough guess
is that its cost wouldn't even be visible in an isolated environment
(i.e. no migrations); it also doesn't seem to scale linearly with the
number of cores.

Your approach (and the NUMA one) isn't wrong, in my opinion; it just
doesn't necessarily need to run in that context.

In an environment with isolated CPUs we want the lowest latency
possible, and doing that kind of work right before switching to
userspace imposes a latency that could just as well be paid elsewhere,
even on another core, since we are doing remote accesses anyway.
We are talking about 35us on a rather big system; not many applications
are sensitive to that kind of latency, but the ones pinned to isolated
CPUs typically are.

> I wonder why we'd want to piggy-back on call_rcu here when
> this has nothing to do with RCU. There is likely a characteristic
> of the call_rcu worker threads that we want to import into
> task_tick_mm_cid(), or change task_work.c to add a new flag
> that says the work can be dispatched to any CPU.
> 

Alright, taking the RCU path was probably a bit lazy. Another thought I
had was to run the scan in a workqueue, perhaps tied to the mm rather
than to the task struct itself. I'm also not entirely sure that
triggering it from the scheduler tick is the best approach, since when
the work actually runs doesn't seem very predictable, but I didn't
really get the full requirements, so a discussion on this can surely
help.
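For illustration, something along these lines is what I have in mind
(a rough, untested sketch; the cid_work field and mm_cid_work_fn are
made-up names, only mm_cid_next_scan exists today):

/* hypothetical: new field in struct mm_struct */
struct work_struct cid_work;

static void mm_cid_work_fn(struct work_struct *work)
{
	struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);

	/* do the same per-CPU cid compaction task_mm_cid_work does today */
}

void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
{
	struct mm_struct *mm = curr->mm;

	if (!mm || time_before(jiffies, READ_ONCE(mm->mm_cid_next_scan)))
		return;
	/* unbound wq: the scan can run on any housekeeping CPU */
	queue_work(system_unbound_wq, &mm->cid_work);
}

A nice side effect is that queue_work() is a no-op while the item is
still pending, so the enqueue is naturally idempotent.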
 
> >   void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
> >   {
> > - struct callback_head *work = &curr->cid_work;
> > + struct rcu_head *rhp = &curr->rcu;
> 
> Why is it OK to re-use the task struct rcu field ? Where else is it
> used, and is there a risk of being inserted twice ?
> 

The same approach is used in
https://elixir.bootlin.com/linux/v6.12/source/include/linux/sched/task.h#L169
and there too it was probably chosen for its simplicity; it isn't
necessarily the best approach.
There is indeed a risk of messing things up (e.g. the same rcu_head
being inserted twice); again, this was the lazy path, and a more robust
approach would probably work better.
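If we did stick with call_rcu(), one option (purely illustrative; the
field, flag and callback names below are made up) would be a dedicated
rcu_head plus a pending bit, leaving curr->rcu alone and making the
enqueue idempotent:

/* hypothetical: dedicated head and flag, e.g. in struct mm_struct */
struct rcu_head cid_rhp;
unsigned long cid_flags;
#define MM_CID_SCAN_PENDING	0

static void mm_cid_scan_cb(struct rcu_head *rhp)
{
	struct mm_struct *mm = container_of(rhp, struct mm_struct, cid_rhp);

	/* ... do the scan ... */
	clear_bit(MM_CID_SCAN_PENDING, &mm->cid_flags);
}

	/* in task_tick_mm_cid(): queue at most one scan at a time */
	if (!test_and_set_bit(MM_CID_SCAN_PENDING, &mm->cid_flags))
		call_rcu(&mm->cid_rhp, mm_cid_scan_cb);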

Thanks for your comments.
Gabriele

