Message-ID: <2a0d52a5-5c28-498a-8df7-789f020e36ed@paulmck-laptop>
Date: Fri, 27 Oct 2023 14:23:56 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Uladzislau Rezki <urezki@...il.com>, rcu <rcu@...r.kernel.org>,
Zqiang <qiang.zhang1211@...il.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics
On Fri, Oct 27, 2023 at 09:20:26PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 27, 2023 at 04:40:48PM +0200, Frederic Weisbecker wrote:
>
> > + /* Has the task been seen voluntarily sleeping? */
> > + if (!READ_ONCE(t->on_rq))
> > + return false;
>
> > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
>
> AFAICT this ->on_rq usage is outside of scheduler locks and that
> READ_ONCE isn't going to help much.
>
> Obviously a pre-existing issue, and I suppose all it cares about is
> seeing a 0 or not, irrespective of the races, but urgh..
The trick is that RCU Tasks only needs to spot a task voluntarily blocked
once at any point in the grace period. The beginning and end of the
grace-period process have full barriers, so if this code sees t->on_rq
equal to zero, we know that the task was voluntarily blocked at some
point during the grace period, as required.
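
The argument above can be modeled in user space. What follows is a minimal sketch, not the kernel's implementation: it assumes C11 atomics stand in for READ_ONCE() and the grace-period full barriers, and `struct task_model`, `scan_holdouts()` and the `holdout` array are hypothetical names invented for illustration. It shows only that a single lockless observation of on_rq == 0, bracketed by full barriers, is enough to stop waiting on a task.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct task_model {
	atomic_int on_rq;	/* 0 models "voluntarily blocked" */
};

/*
 * Scan tasks[0..n) once without taking any lock.  Clear holdout[i] for
 * every task observed with on_rq == 0, and return the number of tasks
 * that still need to be waited on.
 */
static size_t scan_holdouts(struct task_model *tasks, bool *holdout, size_t n)
{
	size_t remaining = 0;

	/* Models the full barrier at the start of grace-period processing. */
	atomic_thread_fence(memory_order_seq_cst);

	for (size_t i = 0; i < n; i++) {
		if (!holdout[i])
			continue;	/* already seen blocked by an earlier scan */

		/* Relaxed load, modeling READ_ONCE(t->on_rq) outside any lock. */
		if (atomic_load_explicit(&tasks[i].on_rq,
					 memory_order_relaxed) == 0)
			holdout[i] = false;	/* seen blocked once: enough */
		else
			remaining++;
	}

	/* Models the full barrier at the end of grace-period processing. */
	atomic_thread_fence(memory_order_seq_cst);

	return remaining;
}
```

Note that a task which goes back on the runqueue after having been observed blocked is not re-added to the holdouts: one sighting at any point in the grace period suffices.
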
In theory, we could acquire a scheduler lock, but in practice this would
cause CPU-latency problems at a certain set of large datacenters, and
for once, not the datacenters operated by my employer.
In theory, we could make separate lists of tasks that we need to wait on,
thus avoiding the need to scan the full task list, but in practice this
would require a synchronized linked-list operation on every voluntary
context switch, both in and out.
In theory, the task list could be sharded, so that it could be scanned
incrementally, but in practice, this is a bit non-trivial. Though this
particular use case doesn't care about new tasks, so it could live with
something simpler than would be required for certain types of signal
delivery.
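
For illustration only, an incremental scan over a sharded list might look roughly like the sketch below. Everything in it is hypothetical (`NR_SHARDS`, `struct task_node`, `struct sharded_list`, `shard_add()`, `scan_one_shard()`), it is single-threaded, and it deliberately leaves out the synchronization against task creation and exit, which is the non-trivial part.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_SHARDS 4	/* hypothetical shard count */

/* Hypothetical per-task state, reduced to what the scan needs. */
struct task_node {
	int on_rq;		/* 0 models "voluntarily blocked" */
	bool holdout;		/* still being waited on */
	struct task_node *next;
};

struct sharded_list {
	struct task_node *shard[NR_SHARDS];
	size_t cursor;		/* next shard to scan */
};

/* Add a task to the shard selected by its (hypothetical) numeric id. */
static void shard_add(struct sharded_list *sl, struct task_node *t,
		      unsigned int id)
{
	size_t s = id % NR_SHARDS;

	t->next = sl->shard[s];
	sl->shard[s] = t;
}

/*
 * Scan only the shard at the cursor, then advance the cursor.  Returns
 * the number of holdouts remaining in the shard that was scanned.
 */
static size_t scan_one_shard(struct sharded_list *sl)
{
	size_t remaining = 0;
	struct task_node *t;

	for (t = sl->shard[sl->cursor]; t; t = t->next) {
		if (!t->holdout)
			continue;
		if (t->on_rq == 0)
			t->holdout = false;
		else
			remaining++;
	}
	sl->cursor = (sl->cursor + 1) % NR_SHARDS;
	return remaining;
}
```

Each call touches one shard, so the work done in a single pass is bounded by the longest shard rather than by the whole task list.
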
In theory, we could place rcu_segcblist-like mid pointers into the
task list, so that scans could restart from any mid pointer. Care is
required because the mid pointers would likely need to be recycled as
new tasks are added. Plus care is needed because it has been a good
long time since I have looked at the code managing the task list,
and I am probably woefully out of date on how it all works.
So, is there a better way?

							Thanx, Paul