linux-kernel - Re: [RFC PATCH v7 1/7] Restartable sequences system call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 3 Aug 2016 22:03:32 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linux-api <linux-api@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
	Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call

On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@...il.com> wrote:
> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
>> >> We really care about preemption here. Every migration implies a
>> >> preemption from a user-space perspective. If we would only care
>> >> about keeping the CPU id up-to-date, hooking into migration would be
>> >> enough. But since we want atomicity guarantees for restartable
>> >> sequences, we need to hook into preemption.
>> >
>> >> It allows user-space to perform update operations on per-cpu data without
>> >> requiring heavy-weight atomic operations.
>> >
>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
>> >
>> > It is however on PPC and possibly other architectures, so in name of
>> > simplicity supporting only the one variant makes sense.
>> >
>>
>> I wouldn't want to depend on CMPXCHG.  But imagine we had primitives
>> that were narrower than the full abort-on-preemption primitive.
>> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
>> *aptr != aval).  We could do things like:
>>
>> expected_cpu = cpu;
>> aval = NULL;  // disarm for now
>> begin();
>> aval = event_count[cpu] + 1;
>> event_count[cpu] = aval;
>> event_count[cpu]++;
>
> This line is redundant, right? Because it will guarantee a failure even
> in no-contention cases.
>
>>
>> ... compute something ...
>>
>> // arm the rest of it
>> aptr = &event_count[cpu];
>> if (*aptr != aval)
>>   goto fail;
>>
>> *thing_im_writing = value_i_computed;
>> end();
>>
>> The idea here is that we don't rely on the scheduler to increment the
>> event count at all, which means that we get to determine the scope of
>> what kinds of access conflicts we care about ourselves.
>>
>
> If we increase the event count in userspace, how could we prevent two
> userspace threads from racing on the event_count[cpu] field? For
> example:
>
>         CPU 0
>         ================
>         {event_count[0] is initially 0}
>
>         [Thread 1]
>         begin();
>         aval = event_count[cpu] + 1; // 1
>
>         (preempted)
>         [Thread 2]
>         begin();
>         aval = event_count[cpu] + 1; // 1, too
>         event_count[cpu] = aval; // event_count[0] is 1
>

You're right :(  This would work with an xadd instruction, but that's
very slow and doesn't exist on most architectures.  It could also work
if we did:

aval = some_tls_value++;

where some_tls_value is set up such that no two threads could ever end
up with the same values (using high bits as thread ids, perhaps), but
that's messy.  Maybe my idea is no good.