[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20211019150718.GA61531@fuller.cnet>
Date: Tue, 19 Oct 2021 12:07:18 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Nitesh Lal <nilal@...hat.com>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
Christoph Lameter <cl@...ux.com>,
Juri Lelli <juri.lelli@...hat.com>,
Alex Belits <abelits@...its.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [patch v4 1/8] add basic task isolation prctl interface
On Wed, Oct 13, 2021 at 05:01:29PM +0200, Peter Zijlstra wrote:
> That's absolutely terrible :/ you're adding extra unconditinal atomics
> to the entry/exit path instead of using the ones that are already there.
> That's no good.
>
> Also, you're very much not dealing with that race either.
Again, because we haven't seen any use for the notification signal.
But anyway, about the general race:
CPU0 CPU1
sys_prctl()
<kernel entry>
if (target_cpu->isolated)
// checks CPU0, not userspace, queues IPI
mark task 'task isolated'
<kernel exit>
arch_send_call_function_ipi_mask()
task is interrupted
----
A possible solution would be to synchronize_rcu (or _expedited if
necessary):
CPU0 CPU1
sys_prctl()
<kernel entry>
rcu_read_lock()
if (target_cpu->isolated) {
// checks CPU0, not userspace, queues IPI
cpu0->isolated = true
synchronize_rcu/synchronize_rcu_expedited
arch_send_call_function_ipi_mask()
}
rcu_read_unlock()
<kernel exit>
----
But that cpu0->isolated variable is not usable for the TLB flush
stuff, for example.
But regarding the question
"the inherently racy nature of some of the don't disturb me stuff":
i think it depends on the case (as in what piece of kernel code).
For example, for the TLB flush case the atomic cmpxchg loop gets rid of the race.
But the above RCU protection might be sufficient for other cases.
And about the notification to userspace, can't see a reason why it can't
be performed in asynchronous fashion (say via eventfd to a manager
thread rather than isolated threads themselves).
> Also, I think you're broken vs instrumentation, all of this needs to
> happen super early on entry, possibly while still on the entry stack,
> otherwise the missed TLBi might be handled too late and we just used a
> stale TLB entry.
Alright, right after switch to kernel CR3, right before switch to user CR3
(or as early/late as possible).
Powered by blists - more mailing lists