lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1918884375.7403.1470850424697.JavaMail.zimbra@efficios.com>
Date:	Wed, 10 Aug 2016 17:33:44 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-api <linux-api@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
	Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call

----- On Aug 9, 2016, at 12:13 PM, Boqun Feng boqun.feng@...il.com wrote:

<snip>

> 
> However, I'm thinking maybe we can use some tricks to avoid unnecessary
> aborts-on-preemption.
> 
> First of all, I notice we haven't make any constraint on what kind of
> memory objects could be "protected" by rseq critical sections yet. And I
> think this is something we should decide before adding this feature into
> kernel.
> 
> We can do some optimization if we have some constraints. For example, if
> the memory objects inside the rseq critical sections could only be
> modified by userspace programs, we therefore don't need to abort
> immediately when userspace task -> kernel task context switch.

The rseq_owner per-cpu variable and rseq_cpu field in task_struct you
propose below would indeed take care of this scenario.

> 
> Further more, if the memory objects inside the rseq critical sections
> could only be modified by userspace programs that have registered their
> rseq structures, we don't need to abort immediately between the context
> switches between two rseq-unregistered tasks or one rseq-registered
> task and one rseq-unregistered task.
> 
> Instead, we do tricks as follow:
> 
> defining a percpu pointer in kernel:
> 
> DEFINE_PER_CPU(struct task_struct *, rseq_owner);
> 
> and a cpu field in struct task_struct:
> 
>	struct task_struct {
>	...
>	#ifdef CONFIG_RSEQ
>		struct rseq __user *rseq;
>		uint32_t rseq_event_counter;
>		int rseq_cpu;
>	#endif
>	...
>	};
> 
> (task_struct::rseq_cpu should be initialized as -1.)
> 
> each time at sched out(in rseq_sched_out()), we do something like:
> 
>	if (prev->rseq) {
>		raw_cpu_write(rseq_owner, prev);
>		prev->rseq_cpu = smp_processor_id();
>	}
> 
> each time sched in(in rseq_handle_notify_resume()), we do something
> like:
> 
>	if (current->rseq &&
>	    (this_cpu_read(rseq_owner) != current ||
>	     current->rseq_cpu != smp_processor_id()))
>		__rseq_handle_notify_resume(regs);
> 
> (Also need to modify rseq_signal_deliver() to call
> __rseq_handle_notify_resume() directly).
> 
> 
> I think this could save some unnecessary aborts-on-preemption, however,
> TBH, I'm too sleepy to verify every corner case. Will recheck this
> tomorrow.

This adds extra fields to the task struct, per-cpu rseq_owner pointers,
and hooks into sched_in which are not needed otherwise, all this to
eliminate unneeded abort-on-preemption.

If we look at the single-stepping use-case, this means that gdb would
only be able to single-step applications as long as neither itself, nor
any of its libraries, use rseq. This seems to be quite fragile. I prefer
requiring rseq users to implement a fallback to locking which progresses
in every situation rather than adding complexity and overhead trying
lessen the odds of triggering the restart.

Simply lessening the odds of triggering the restart without a design that
ensures progress even in restart cases seems to make the lack-of-progress
problem just harder to debug when it will surface in real life.

Thanks,

Mathieu

> 
> Regards,
> Boqun

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ