linux-kernel - Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160407201156.GC3448@twins.programming.kicks-ass.net>
Date:	Thu, 7 Apr 2016 22:11:56 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...hat.com>,
	Paul Turner <commonly@...il.com>,
	Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
	Dave Watson <davejwatson@...com>,
	Josh Triplett <josh@...htriplett.org>,
	Linux API <linux-api@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andrew Hunter <ahh@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu
 critical sections

On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> More concretely, this looks like (using totally arbitrary register
> assingments -- probably far from ideal, especially given how GCC's
> constraints work):
> 
> enter the critical section:
> 1:
> movq %[cpu], %%r12
> movq {address of counter for our cpu}, %%r13
> movq {some fresh value}, (%%r13)
> cmpq %[cpu], %%r12
> jne 1b
> 
> ... do whatever setup or computation is needed...
> 
> movq $%l[failed], %%rcx
> movq $1f, %[commit_instr]
> cmpq {whatever counter we chose}, (%%r13)
> jne %l[failed]
> cmpq %[cpu], %%r12
> jne %l[failed]
> 
> <-- a signal in here that conflicts with us would clobber (%%r13), and
> the kernel would notice and send us to the failed label
> 
> movq %[to_write], (%[target])
> 1: movq $0, %[commit_instr]

And the kernel, for every thread that has had the syscall called and a
thingy registered, needs to (at preempt/signal-setup):

	if (get_user(post_commit_ip, current->post_commit_ip))
		return -EFAULT;

	if (likely(!post_commit_ip))
		return 0;

	if (regs->ip >= post_commit_ip)
		return 0;

	if (get_user(seq, (u32 __user *)regs->r13))
		return -EFAULT;

	if (regs->$(which one holds our chosen seq?) == seq) {
		/* nothing changed, do not cancel, proceed to commit. */
		return 0;
	}

	if (put_user(0UL, current->post_commit_ip))
		return -EFAULT;

	regs->ip = regs->rcx;


> In contrast to Paul's scheme, this has two additional (highly
> predictable) branches and requires generation of a seqcount in
> userspace.  In its favor, though, it doesnt need preemption hooks,

Without preemption hooks, how would one thread preempting another at the
above <-- clobber anything and cause the commit to fail?

> it's inherently debuggable, 

It is more debuggable, agreed.

> and it allows multiple independent
> rseq-protected things to coexist without forcing each other to abort.

And the kernel only needs to load the second cacheline if it lands in
the middle of a finish block, which should be manageable overhead I
suppose.

But the userspace chunk is lots slower as it needs to always touch
multiple lines, since the @cpu, @seq and @post_commit_ip all live in
separate lines (although I suppose @cpu and @post_commit_ip could live
in the same).

The finish thing needs 3 registers for:

 - fail ip
 - seq pointer
 - seq value

Which I suppose is possible even on register constrained architectures
like i386.