linux-kernel - Re: [RFC PATCH v7 1/7] Restartable sequences system call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1918884375.7403.1470850424697.JavaMail.zimbra@efficios.com>
Date:	Wed, 10 Aug 2016 17:33:44 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-api <linux-api@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
	Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call

----- On Aug 9, 2016, at 12:13 PM, Boqun Feng boqun.feng@...il.com wrote:

<snip>

> 
> However, I'm thinking maybe we can use some tricks to avoid unnecessary
> aborts-on-preemption.
> 
> First of all, I notice we haven't make any constraint on what kind of
> memory objects could be "protected" by rseq critical sections yet. And I
> think this is something we should decide before adding this feature into
> kernel.
> 
> We can do some optimization if we have some constraints. For example, if
> the memory objects inside the rseq critical sections could only be
> modified by userspace programs, we therefore don't need to abort
> immediately when userspace task -> kernel task context switch.

The rseq_owner per-cpu variable and rseq_cpu field in task_struct you
propose below would indeed take care of this scenario.

> 
> Further more, if the memory objects inside the rseq critical sections
> could only be modified by userspace programs that have registered their
> rseq structures, we don't need to abort immediately between the context
> switches between two rseq-unregistered tasks or one rseq-registered
> task and one rseq-unregistered task.
> 
> Instead, we do tricks as follow:
> 
> defining a percpu pointer in kernel:
> 
> DEFINE_PER_CPU(struct task_struct *, rseq_owner);
> 
> and a cpu field in struct task_struct:
> 
>	struct task_struct {
>	...
>	#ifdef CONFIG_RSEQ
>		struct rseq __user *rseq;
>		uint32_t rseq_event_counter;
>		int rseq_cpu;
>	#endif
>	...
>	};
> 
> (task_struct::rseq_cpu should be initialized as -1.)
> 
> each time at sched out(in rseq_sched_out()), we do something like:
> 
>	if (prev->rseq) {
>		raw_cpu_write(rseq_owner, prev);
>		prev->rseq_cpu = smp_processor_id();
>	}
> 
> each time sched in(in rseq_handle_notify_resume()), we do something
> like:
> 
>	if (current->rseq &&
>	    (this_cpu_read(rseq_owner) != current ||
>	     current->rseq_cpu != smp_processor_id()))
>		__rseq_handle_notify_resume(regs);
> 
> (Also need to modify rseq_signal_deliver() to call
> __rseq_handle_notify_resume() directly).
> 
> 
> I think this could save some unnecessary aborts-on-preemption, however,
> TBH, I'm too sleepy to verify every corner case. Will recheck this
> tomorrow.

This adds extra fields to the task struct, per-cpu rseq_owner pointers,
and hooks into sched_in which are not needed otherwise, all this to
eliminate unneeded abort-on-preemption.

If we look at the single-stepping use-case, this means that gdb would
only be able to single-step applications as long as neither itself, nor
any of its libraries, use rseq. This seems to be quite fragile. I prefer
requiring rseq users to implement a fallback to locking which progresses
in every situation rather than adding complexity and overhead trying
lessen the odds of triggering the restart.

Simply lessening the odds of triggering the restart without a design that
ensures progress even in restart cases seems to make the lack-of-progress
problem just harder to debug when it will surface in real life.

Thanks,

Mathieu

> 
> Regards,
> Boqun

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com