Message-ID: <1713923102.3325.1567731091553.JavaMail.zimbra@efficios.com>
Date: Thu, 5 Sep 2019 20:51:31 -0400 (EDT)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
paulmck <paulmck@...ux.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Oleg Nesterov <oleg@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
"Russell King, ARM Linux" <linux@...linux.org.uk>,
Chris Metcalf <cmetcalf@...hip.com>,
Chris Lameter <cl@...ux.com>, Kirill Tkhai <tkhai@...dex.ru>,
Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load
----- On Sep 4, 2019, at 2:26 PM, Peter Zijlstra peterz@...radead.org wrote:
> On Wed, Sep 04, 2019 at 01:12:53PM -0400, Mathieu Desnoyers wrote:
>> ----- On Sep 4, 2019, at 12:09 PM, Peter Zijlstra peterz@...radead.org wrote:
>>
>> > On Wed, Sep 04, 2019 at 11:19:00AM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds torvalds@...ux-foundation.org
>> >> wrote:
>> >
>> >> > I wonder if the easiest model might be to just use a percpu variable
>> >> > instead for the membarrier stuff? It's not like it has to be in
>> >> > 'struct task_struct' at all, I think. We only care about the current
>> >> > runqueues, and those are percpu anyway.
>> >>
>> >> One issue here is that membarrier iterates over all runqueues without
>> >> grabbing any runqueue lock. If we copy that state from mm to rq on
>> >> sched switch prepare, we would need to ensure we have the proper
>> >> memory barriers between:
>> >>
>> >> prior user-space memory accesses / setting the runqueue membarrier state
>> >>
>> >> and
>> >>
>> >> setting the runqueue membarrier state / following user-space memory accesses
>> >>
>> >> Copying the membarrier state into the task struct leverages the fact that
>> >> we have documented and guaranteed those barriers around the rq->curr update
>> >> in the scheduler.
>> >
>> > Should be the same as the barriers we already rely on for rq->curr, no?
>> > That is, if we put this before switch_mm() then we have
>> > smp_mb__after_spinlock() and switch_mm() itself.
>>
>> Yes, I think we can piggy-back on the barriers already documented around
>> the rq->curr store.
>>
>> > Also, if we place mm->membarrier_state in the same cacheline as mm->pgd
>> > (which switch_mm() is bound to load) then we should be fine, I think.
>>
>> Yes, if we make sure membarrier_prepare_task_switch only updates the
>> rq->membarrier_state if prev->mm != next->mm, we should be able to avoid
>> loading next->mm->membarrier_state when switch_mm() is not invoked.
>>
>> I'll prepare an RFC patch implementing this approach.
>
> Thinking about this a bit; switching it 'on' still requires some
> thinking. Consider register on an already threaded process of which
> multiple threads are 'current' on multiple CPUs. In that case none of
> the rq bits will be set.
>
> Not even synchronize_rcu() is sufficient to force it either, since we
> only update on switch_mm() and nothing guarantees we pass that.
>
> One possible approach would be to IPI broadcast (after setting the
> ->mm->membarrier_state) and having the IPI update the rq state from
> 'current->mm'.
>
> But possibly I'm just confusing everyone again. I'm not having a good day
> today.
Indeed, switching 'on' requires some care. I implemented the IPI-based
approach as per your suggestion.
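
For readers following the thread, the idea can be modelled in user space
roughly as below. This is only a sketch under stated assumptions, not the
actual patch: the struct layouts, the model_* function names and the
MODEL_STATE_REGISTERED flag are all hypothetical, the "IPI" is a plain
function call made once per CPU, and none of the memory barriers discussed
above are represented. It shows the two places where the per-runqueue copy
would be refreshed: on a context switch that changes mm, and on
registration, where a broadcast covers threads that are already running.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define MODEL_STATE_REGISTERED 0x1   /* hypothetical flag, illustration only */

struct mm   { int membarrier_state; };
struct task { struct mm *mm; };
struct rq   { struct task *curr; int membarrier_state; };

/* One modelled runqueue per CPU; zero-initialised (idle, no state). */
struct rq runqueues[NR_CPUS];

/*
 * Context-switch path: refresh the rq copy only when the mm changes,
 * so next->mm->membarrier_state is not loaded when prev and next
 * share an mm.
 */
void model_switch(int cpu, struct task *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct rq *rq = &runqueues[cpu];
    struct task *prev = rq->curr;

    if (prev == NULL || prev->mm != next->mm)
        rq->membarrier_state = next->mm ? next->mm->membarrier_state : 0;
    rq->curr = next;
}

/*
 * Stand-in for the IPI handler: runs "on" the given CPU and updates
 * the rq copy only if that CPU is currently running a thread of mm.
 */
void model_ipi_sync_rq_state(int cpu, struct mm *mm)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct rq *rq = &runqueues[cpu];

    if (rq->curr != NULL && rq->curr->mm == mm)
        rq->membarrier_state = mm->membarrier_state;
}

/*
 * Registration: set the mm state first, then "broadcast" so that CPUs
 * already running threads of this mm pick it up without having to
 * pass through a context switch.
 */
void model_register(struct mm *mm, int flag)
{
    mm->membarrier_state |= flag;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        model_ipi_sync_rq_state(cpu, mm);
}
```

In this model, registering an mm whose threads are already current on
several CPUs updates those runqueues immediately, while runqueues running
other mms (or idle) are left untouched until a later switch brings a thread
of the registered mm onto them.
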
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com