linux-kernel - Re: [RFC PATCH v2] membarrier: expedited private command

Message-ID: <1551913097.355.1501529479848.JavaMail.zimbra@efficios.com>
Date:   Mon, 31 Jul 2017 19:31:19 +0000 (UTC)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Andrew Hunter <ahh@...gle.com>,
        maged michael <maged.michael@...il.com>,
        gromer <gromer@...gle.com>, Avi Kivity <avi@...lladb.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        Dave Watson <davejwatson@...com>
Subject: Re: [RFC PATCH v2] membarrier: expedited private command

----- On Jul 28, 2017, at 9:58 PM, Nicholas Piggin npiggin@...il.com wrote:

> On Fri, 28 Jul 2017 17:06:53 +0000 (UTC)
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
>> ----- On Jul 28, 2017, at 12:46 PM, Peter Zijlstra peterz@...radead.org wrote:
>> 
>> > On Fri, Jul 28, 2017 at 03:38:15PM +0000, Mathieu Desnoyers wrote:
>> >> > Which only leaves PPC stranded.. but the 'good' news is that mpe says
>> >> > they'll probably need a barrier in switch_mm() in any case.
>> >> 
>> >> As I pointed out in my other email, I plan to do this:
>> >> 
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -2636,6 +2636,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>> >>         vtime_task_switch(prev);
>> >>         perf_event_task_sched_in(prev, current);
>> > 
>> > Here would place it _inside_ the rq->lock, which seems to make more
>> > sense given the purpose of the barrier, but either way works given its
>> > definition.
>> 
>> Given its naming "...after_unlock_lock", I thought it would be clearer to put
>> it after the unlock. Anyway, this barrier does not seem to be used to ensure
>> the release barrier per se (unlock already has release semantic), but rather
>> ensures a full memory barrier wrt memory accesses that are synchronized by
>> means other than this lock.
>> 
>> >   
>> >>         finish_lock_switch(rq, prev);
>> > 
>> > You could put the whole thing inside IS_ENABLED(CONFIG_SYSMEMBARRIER) or
>> > something.
>> 
>> I'm tempted to wait until we hear from powerpc maintainers, so we learn
>> whether they deeply care about this extra barrier in finish_task_switch()
>> before making it conditional on CONFIG_MEMBARRIER.
>> 
>> Having a guaranteed barrier after context switch on all architectures may
>> have other uses.
> 
> I haven't had time to read the thread and understand exactly why you need
> this extra barrier, I'll do it next week. Thanks for cc'ing us on it.
> 
> A smp_mb is pretty expensive on powerpc CPUs. Removing the sync from
> switch_to increased thread switch performance by 2-3%. Putting it in
> switch_mm may be a little less painful, but still we have to weigh it
> against the benefit of this new functionality. Would that be a net win
> for the average end-user? Seems unlikely.
> 
> But we also don't want to lose sys_membarrier completely. Would it be too
> painful to make MEMBARRIER_CMD_PRIVATE_EXPEDITED return an error, or make it
> fall back to a slower case if we decide not to implement it?

An expedited membarrier is needed to implement synchronization schemes such
as hazard pointers, RCU, and garbage collectors in user-space. Take hazard
pointers as an example: if the memory free is done by the same thread that
does the retire, the slowdown introduced by a non-expedited membarrier is not
acceptable at all. In that case, only an expedited membarrier keeps the
slowdown acceptable. The user's current alternative is to rely on
undocumented side-effects of mprotect() to achieve the same result. This
happens to work on some architectures, and may break in the future.

If users do not have an expedited membarrier on a given architecture, and are
told that mprotect() does not provide the barrier guarantees they are looking
for, then they would have to add heavy-weight memory barriers to many
user-space fast-paths on those specific architectures, assuming they are
willing to go through that trouble.
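
To make the trade-off concrete, here is a rough user-space sketch of a
hazard-pointer scheme that uses the expedited private command when the kernel
offers it, and otherwise pays for a full fence on the reader fast-path. It is
an illustration under stated assumptions, not the implementation of any
existing library: the hp_* helpers, the MB_* names and the single hazard slot
are hypothetical, and the separate registration command is taken from the
interface as it appears in <linux/membarrier.h> since Linux 4.14, not from
the RFC discussed in this thread.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command values as in <linux/membarrier.h> (Linux 4.14+). They are spelled
 * out under hypothetical MB_ names so the sketch builds with older headers. */
enum {
    MB_QUERY                      = 0,
    MB_PRIVATE_EXPEDITED          = 1 << 3,
    MB_REGISTER_PRIVATE_EXPEDITED = 1 << 4,
};

static bool use_membarrier;          /* chosen once, before threads start */
static _Atomic(void *) hazard_slot;  /* a single hazard pointer, for brevity */

static long sys_membarrier_cmd(int cmd)
{
    return syscall(SYS_membarrier, cmd, 0, 0);
}

/* Call once at startup. Returns true if the asymmetric scheme is usable;
 * false means readers fall back to a full fence on their fast-path. */
static bool hp_init(void)
{
    long mask = sys_membarrier_cmd(MB_QUERY);

    use_membarrier = mask > 0 &&
        (mask & MB_PRIVATE_EXPEDITED) &&
        (mask & MB_REGISTER_PRIVATE_EXPEDITED) &&
        sys_membarrier_cmd(MB_REGISTER_PRIVATE_EXPEDITED) == 0;
    return use_membarrier;
}

/* Reader fast-path: publish the pointer, then re-check the source. */
static void *hp_protect(_Atomic(void *) *src)
{
    void *p;

    do {
        p = atomic_load_explicit(src, memory_order_acquire);
        atomic_store_explicit(&hazard_slot, p, memory_order_relaxed);
        if (use_membarrier)
            atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
        else
            atomic_thread_fence(memory_order_seq_cst); /* heavy-weight fallback */
    } while (p != atomic_load_explicit(src, memory_order_acquire));
    return p;
}

static void hp_clear(void)
{
    atomic_store_explicit(&hazard_slot, NULL, memory_order_release);
}

/* Reclaimer slow-path: the caller has already unlinked p. Returns true if no
 * reader has published p, i.e. it may be freed. */
static bool hp_can_free(void *p)
{
    if (use_membarrier) {
        if (sys_membarrier_cmd(MB_PRIVATE_EXPEDITED) != 0)
            return false;            /* barrier failed: do not free */
    } else {
        atomic_thread_fence(memory_order_seq_cst);
    }
    return atomic_load_explicit(&hazard_slot, memory_order_acquire) != p;
}
```

A program would call hp_init() once before creating threads, wrap each read
of a shared pointer in hp_protect()/hp_clear(), and call hp_can_free() after
unlinking an object. The cost moves from every reader to the comparatively
rare reclaimer when the expedited command is available; without it, every
hp_protect() call carries the full fence.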

I understand that the 2-3% overhead when switching between threads is a big
deal. Do you have numbers on the overhead added by a memory barrier in
switch_mm()? I suspect that switching between processes (including the
cost of the ensuing cache line and TLB misses) is considerably more
expensive in the first place.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com