lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20211019150718.GA61531@fuller.cnet>
Date:   Tue, 19 Oct 2021 12:07:18 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Nitesh Lal <nilal@...hat.com>,
        Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Alex Belits <abelits@...its.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [patch v4 1/8] add basic task isolation prctl interface

On Wed, Oct 13, 2021 at 05:01:29PM +0200, Peter Zijlstra wrote:
> That's absolutely terrible :/ you're adding extra unconditinal atomics
> to the entry/exit path instead of using the ones that are already there.
> That's no good.
> 
> Also, you're very much not dealing with that race either.

Again, because we haven't seen any use for the notification signal.

But anyway, about the general race:

CPU0                                    CPU1

sys_prctl()
<kernel entry>
                                        if (target_cpu->isolated)
                                          // checks CPU0, not userspace, queues IPI
mark task 'task isolated'
<kernel exit>

		                         arch_send_call_function_ipi_mask()

task is interrupted

----

A possible solution would be to synchronize_rcu (or _expedited if
necessary):


CPU0                                    	CPU1

sys_prctl()
<kernel entry>
						rcu_read_lock()
                                        	if (target_cpu->isolated) {
                                          		// checks CPU0, not userspace, queues IPI
cpu0->isolated = true
synchronize_rcu/synchronize_rcu_expedited

		                         		arch_send_call_function_ipi_mask()
						}
						rcu_read_unlock()

<kernel exit>

----

But that cpu0->isolated variable is not usable for the TLB flush
stuff, for example.

But regarding the question
"the inherently racy nature of some of the don't disturb me stuff":
i think it depends on the case (as in what piece of kernel code).
For example, for the TLB flush case the atomic cmpxchg loop gets rid of the race.

But the above RCU protection might be sufficient for other cases.

And about the notification to userspace, can't see a reason why it can't 
be performed in asynchronous fashion (say via eventfd to a manager
thread rather than isolated threads themselves).

> Also, I think you're broken vs instrumentation, all of this needs to
> happen super early on entry, possibly while still on the entry stack,
> otherwise the missed TLBi might be handled too late and we just used a
> stale TLB entry.

Alright, right after switch to kernel CR3, right before switch to user CR3 
(or as early/late as possible).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ