lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190904182607.GG17205@worktop.programming.kicks-ass.net>
Date:   Wed, 4 Sep 2019 20:26:07 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        paulmck <paulmck@...ux.ibm.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Russell King, ARM Linux" <linux@...linux.org.uk>,
        Chris Metcalf <cmetcalf@...hip.com>,
        Chris Lameter <cl@...ux.com>, Kirill Tkhai <tkhai@...dex.ru>,
        Mike Galbraith <efault@....de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state
 racy load

On Wed, Sep 04, 2019 at 01:12:53PM -0400, Mathieu Desnoyers wrote:
> ----- On Sep 4, 2019, at 12:09 PM, Peter Zijlstra peterz@...radead.org wrote:
> 
> > On Wed, Sep 04, 2019 at 11:19:00AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds torvalds@...ux-foundation.org
> >> wrote:
> > 
> >> > I wonder if the easiest model might be to just use a percpu variable
> >> > instead for the membarrier stuff? It's not like it has to be in
> >> > 'struct task_struct' at all, I think. We only care about the current
> >> > runqueues, and those are percpu anyway.
> >> 
> >> One issue here is that membarrier iterates over all runqueues without
> >> grabbing any runqueue lock. If we copy that state from mm to rq on
> >> sched switch prepare, we would need to ensure we have the proper
> >> memory barriers between:
> >> 
> >> prior user-space memory accesses  /  setting the runqueue membarrier state
> >> 
> >> and
> >> 
> >> setting the runqueue membarrier state / following user-space memory accesses
> >> 
> >> Copying the membarrier state into the task struct leverages the fact that
> >> we have documented and guaranteed those barriers around the rq->curr update
> >> in the scheduler.
> > 
> > Should be the same as the barriers we already rely on for rq->curr, no?
> > That is, if we put this before switch_mm() then we have
> > smp_mb__after_spinlock() and switch_mm() itself.
> 
> Yes, I think we can piggy-back on the already documented barriers documented around
> rq->curr store.
> 
> > Also, if we place mm->membarrier_state in the same cacheline as mm->pgd
> > (which switch_mm() is bound to load) then we should be fine, I think.
> 
> Yes, if we make sure membarrier_prepare_task_switch only updates the
> rq->membarrier_state if prev->mm != next->mm, we should be able to avoid
> loading next->mm->membarrier_state when switch_mm() is not invoked.
> 
> I'll prepare RFC patch implementing this approach.

Thinking about this a bit; switching it 'on' still requires some
thinking. Consider register on an already threaded process of which
multiple threads are 'current' on multiple CPUs. In that case none of
the rq bits will be set.

Not even synchronize_rcu() is sufficient to force it either, since we
only update on switch_mm() and nothing guarantees we pass that.

One possible approach would be to IPI broadcast (after setting the
->mm->membarrier_State) and having the IPI update the rq state from
'current->mm'.

But possible I'm just confusing evryone again. I'm not having a good day
today.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ