[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrVLi4Wdca7r_3xVCOJENFi9U7aPa_Ru303muRbEttufRA@mail.gmail.com>
Date: Wed, 10 Aug 2016 01:13:42 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Russell King <linux@....linux.org.uk>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-api <linux-api@...r.kernel.org>,
Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
Andi Kleen <andi@...stfloor.org>,
Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Josh Triplett <josh@...htriplett.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call
On Wed, Aug 3, 2016 at 10:03 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@...il.com> wrote:
>> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
>>> >> We really care about preemption here. Every migration implies a
>>> >> preemption from a user-space perspective. If we would only care
>>> >> about keeping the CPU id up-to-date, hooking into migration would be
>>> >> enough. But since we want atomicity guarantees for restartable
>>> >> sequences, we need to hook into preemption.
>>> >
>>> >> It allows user-space to perform update operations on per-cpu data without
>>> >> requiring heavy-weight atomic operations.
>>> >
>>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
>>> >
>>> > It is however on PPC and possibly other architectures, so in name of
>>> > simplicity supporting only the one variant makes sense.
>>> >
>>>
>>> I wouldn't want to depend on CMPXCHG. But imagine we had primitives
>>> that were narrower than the full abort-on-preemption primitive.
>>> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
>>> *aptr != aval). We could do things like:
>>>
>>> expected_cpu = cpu;
>>> aval = NULL; // disarm for now
>>> begin();
>>> aval = event_count[cpu] + 1;
>>> event_count[cpu] = aval;
>>> event_count[cpu]++;
>>
>> This line is redundant, right? Because it will guarantee a failure even
>> in no-contention cases.
>>
>>>
>>> ... compute something ...
>>>
>>> // arm the rest of it
>>> aptr = &event_count[cpu];
>>> if (*aptr != aval)
>>> goto fail;
>>>
>>> *thing_im_writing = value_i_computed;
>>> end();
>>>
>>> The idea here is that we don't rely on the scheduler to increment the
>>> event count at all, which means that we get to determine the scope of
>>> what kinds of access conflicts we care about ourselves.
>>>
>>
>> If we increase the event count in userspace, how could we prevent two
>> userspace threads from racing on the event_count[cpu] field? For
>> example:
>>
>> CPU 0
>> ================
>> {event_count[0] is initially 0}
>>
>> [Thread 1]
>> begin();
>> aval = event_count[cpu] + 1; // 1
>>
>> (preempted)
>> [Thread 2]
>> begin();
>> aval = event_count[cpu] + 1; // 1, too
>> event_count[cpu] = aval; // event_count[0] is 1
>>
>
> You're right :( This would work with an xadd instruction, but that's
> very slow and doesn't exist on most architectures. It could also work
> if we did:
Thinking about this slightly more, maybe it does work. We could use
basically the same mechanism to allow the kernel to restart if the
specific sequence:
aval = event_count[cpu] + 1
event_count[cpu] = avall
gets preempted by setting aptr = &event_count[cpu] and aval to
event_count[cpu], like this (although I might have screwed up any
number of small details):
aptr = &event_count[cpu];
barrier();
aval = event_count[cpu];
barrier();
tmp = aval + 1;
event_count[cpu] = tmp;
/* preemption here will cause an unnecessary retry, but that's okay */
aval = tmp;
--Andy
Powered by blists - more mailing lists