Message-ID: <1879888051.17397.1506262984228.JavaMail.zimbra@efficios.com>
Date: Sun, 24 Sep 2017 14:23:04 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Andrew Hunter <ahh@...gle.com>,
maged michael <maged.michael@...il.com>,
gromer <gromer@...gle.com>, Avi Kivity <avi@...lladb.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
Dave Watson <davejwatson@...com>,
Alan Stern <stern@...land.harvard.edu>,
Will Deacon <will.deacon@....com>,
Andy Lutomirski <luto@...nel.org>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited
private command
----- On Sep 24, 2017, at 9:30 AM, Boqun Feng boqun.feng@...il.com wrote:
> On Fri, Sep 22, 2017 at 03:10:10PM +0000, Mathieu Desnoyers wrote:
>> ----- On Sep 22, 2017, at 4:59 AM, Boqun Feng boqun.feng@...il.com wrote:
>>
>> > On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
>> > [...]
>> >> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> >> +		struct task_struct *next)
>> >> +{
>> >> +	/*
>> >> +	 * Only need the full barrier when switching between processes.
>> >> +	 */
>> >> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> >> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> >> +			|| prev->mm == next->mm))
>> >
>> > And we also don't need the smp_mb() if !prev->mm, because switching from
>> > kernel to user will have a smp_mb() implied by mmdrop()?
>>
>> Right. And we also don't need it when switching from userspace to kernel
>
> Yep, but this case is covered already, as I think we don't allow kernel
> threads to have TIF_MEMBARRIER_PRIVATE_EXPEDITED set, right?
Good point.
>
>> thread either. Something like this:
>>
>> static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> 		struct task_struct *next)
>> {
>> 	/*
>> 	 * Only need the full barrier when switching between processes.
>> 	 * Barrier when switching from kernel to userspace is not
>> 	 * required here, given that it is implied by mmdrop(). Barrier
>> 	 * when switching from userspace to kernel is not needed after
>> 	 * store to rq->curr.
>> 	 */
>> 	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> 			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> 			|| !prev->mm || !next->mm || prev->mm == next->mm))
>
> , so no need to test next->mm here.
>
Right, it's redundant with respect to testing the thread flag.
>> 		return;
>>
>> 	/*
>> 	 * The membarrier system call requires a full memory barrier
>> 	 * after storing to rq->curr, before going back to user-space.
>> 	 */
>> 	smp_mb();
>> }
>>
>> >
>> >> +		return;
>> >> +
>> >> +	/*
>> >> +	 * The membarrier system call requires a full memory barrier
>> >> +	 * after storing to rq->curr, before going back to user-space.
>> >> +	 */
>> >> +	smp_mb();
>> >> +}
>> >
>> > [...]
>> >
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +	if (!current->mm || !t->mm)
>> >> +		return;
>> >> +	t->mm->membarrier_private_expedited =
>> >> +		current->mm->membarrier_private_expedited;
>> >
>> > Have we already done the copy of ->membarrier_private_expedited in
>> > copy_mm()?
>>
>> copy_mm() is performed without holding current->sighand->siglock, so
>> it appears to be racing with concurrent membarrier register cmd.
>
> Speak of racing, I think we currently have a problem if we do a
> register_private_expedited in one thread and do a
> membarrer_private_expedited in another thread(sharing the same mm), as
> follow:
>
> {t1,t2,t3 sharing the same ->mm}
>
> CPU 0					CPU 1					CPU2
> ====================			====================			====================
> {in thread t1}
> membarrier_register_private_expedited():
>   ...
>   WRITE_ONCE(->mm->membarrier_private_expedited, 1);
>   membarrier_arch_register_private_expedited():
>     ...
>     <haven't set the TIF for t3 yet>
>
> 					{in thread t2}
> 					membarrier_private_expedited():
> 					  READ_ONCE(->mm->membarrier_private_expedited); // == 1
> 					  ...
> 					  for_each_online_cpu()
> 					    ...
> 					    <p is cpu_rq(CPU2)->curr>
> 					    if (p && p->mm == current->mm) // false
> 					  <so no ipi sent to CPU2>
>
> 										{about to switch to t3}
> 										rq->curr = t3;
> 										...
> 										context_switch():
> 										  ...
> 										  finish_task_switch():
> 										    membarrier_sched_in():
> 										      <TIF is not set>
> 										      // no smp_mb() here.
>
> , and we will miss the smp_mb() on CPU2, right? And this could even
> happen if t2 has a membarrier_register_private_expedited() preceding the
> membarrier_private_expedited().
>
> Am I missing something subtle here?
I think the problem sits in this function:

static void membarrier_register_private_expedited(void)
{
	struct task_struct *p = current;

	if (READ_ONCE(p->mm->membarrier_private_expedited))
		return;
	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
	membarrier_arch_register_private_expedited(p);
}
I need to change the order between the WRITE_ONCE() and the call to
membarrier_arch_register_private_expedited(). If I issue the
WRITE_ONCE() after the arch code (which sets the TIF flags), then
concurrent membarrier private expedited commands will simply
return an -EPERM error:

static void membarrier_register_private_expedited(void)
{
	struct task_struct *p = current;

	if (READ_ONCE(p->mm->membarrier_private_expedited))
		return;
	membarrier_arch_register_private_expedited(p);
	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
}
Do you agree that this would fix the race you identified?
Thanks,
Mathieu
>
> Regards,
> Boqun
>
>
>> However, given that it is a single flag updated with WRITE_ONCE()
>> and read with READ_ONCE(), it might be OK to rely on copy_mm there.
>> If userspace runs registration concurrently with fork, they should
>> not expect the child to be specifically registered or unregistered.
>>
>> So yes, I think you are right about removing this copy and relying on
>> copy_mm() instead. I also think we can improve membarrier_arch_fork()
>> on powerpc to test the current thread flag rather than using current->mm.
>>
>> Which leads to those two changes:
>>
>> static inline void membarrier_fork(struct task_struct *t,
>> 		unsigned long clone_flags)
>> {
>> 	/*
>> 	 * Prior copy_mm() copies the membarrier_private_expedited field
>> 	 * from current->mm to t->mm.
>> 	 */
>> 	membarrier_arch_fork(t, clone_flags);
>> }
>>
>> And on PowerPC:
>>
>> static inline void membarrier_arch_fork(struct task_struct *t,
>> 		unsigned long clone_flags)
>> {
>> 	/*
>> 	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> 	 * fork is protected by siglock. membarrier_arch_fork is called
>> 	 * with siglock held.
>> 	 */
>> 	if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
>> 		set_ti_thread_flag(task_thread_info(t),
>> 				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> }
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> >
>> > Regards,
>> > Boqun
>> >
>> >> +	membarrier_arch_fork(t, clone_flags);
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +	t->mm->membarrier_private_expedited = 0;
>> >> +	membarrier_arch_execve(t);
>> >> +}
>> >> +#else
>> >> +static inline void membarrier_sched_in(struct task_struct *prev,
>> >> +		struct task_struct *next)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +}
>> >> +#endif
>> >> +
>> > [...]
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com