linux-kernel - Re: [RFC PATCH v7 1/7] Restartable sequences system call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrVLi4Wdca7r_3xVCOJENFi9U7aPa_Ru303muRbEttufRA@mail.gmail.com>
Date:	Wed, 10 Aug 2016 01:13:42 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linux-api <linux-api@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
	Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call

On Wed, Aug 3, 2016 at 10:03 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@...il.com> wrote:
>> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
>>> >> We really care about preemption here. Every migration implies a
>>> >> preemption from a user-space perspective. If we would only care
>>> >> about keeping the CPU id up-to-date, hooking into migration would be
>>> >> enough. But since we want atomicity guarantees for restartable
>>> >> sequences, we need to hook into preemption.
>>> >
>>> >> It allows user-space to perform update operations on per-cpu data without
>>> >> requiring heavy-weight atomic operations.
>>> >
>>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
>>> >
>>> > It is however on PPC and possibly other architectures, so in name of
>>> > simplicity supporting only the one variant makes sense.
>>> >
>>>
>>> I wouldn't want to depend on CMPXCHG.  But imagine we had primitives
>>> that were narrower than the full abort-on-preemption primitive.
>>> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
>>> *aptr != aval).  We could do things like:
>>>
>>> expected_cpu = cpu;
>>> aval = NULL;  // disarm for now
>>> begin();
>>> aval = event_count[cpu] + 1;
>>> event_count[cpu] = aval;
>>> event_count[cpu]++;
>>
>> This line is redundant, right? Because it will guarantee a failure even
>> in no-contention cases.
>>
>>>
>>> ... compute something ...
>>>
>>> // arm the rest of it
>>> aptr = &event_count[cpu];
>>> if (*aptr != aval)
>>>   goto fail;
>>>
>>> *thing_im_writing = value_i_computed;
>>> end();
>>>
>>> The idea here is that we don't rely on the scheduler to increment the
>>> event count at all, which means that we get to determine the scope of
>>> what kinds of access conflicts we care about ourselves.
>>>
>>
>> If we increase the event count in userspace, how could we prevent two
>> userspace threads from racing on the event_count[cpu] field? For
>> example:
>>
>>         CPU 0
>>         ================
>>         {event_count[0] is initially 0}
>>
>>         [Thread 1]
>>         begin();
>>         aval = event_count[cpu] + 1; // 1
>>
>>         (preempted)
>>         [Thread 2]
>>         begin();
>>         aval = event_count[cpu] + 1; // 1, too
>>         event_count[cpu] = aval; // event_count[0] is 1
>>
>
> You're right :(  This would work with an xadd instruction, but that's
> very slow and doesn't exist on most architectures.  It could also work
> if we did:

Thinking about this slightly more, maybe it does work.  We could use
basically the same mechanism to allow the kernel to restart if the
specific sequence:

aval = event_count[cpu] + 1
event_count[cpu] = avall

gets preempted by setting aptr = &event_count[cpu] and aval to
event_count[cpu], like this (although I might have screwed up any
number of small details):

aptr = &event_count[cpu];
barrier();
aval = event_count[cpu];
barrier();
tmp = aval + 1;
event_count[cpu] = tmp;
/* preemption here will cause an unnecessary retry, but that's okay */
aval = tmp;

--Andy