Message-ID: <20100120031323.GA15318@Krystal>
Date: Tue, 19 Jan 2010 22:13:23 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...e.hu>,
akpm@...ux-foundation.org, josh@...htriplett.org,
tglx@...utronix.de, Valdis.Kletnieks@...edu, dhowells@...hat.com,
laijs@...fujitsu.com, dipankar@...ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory
barrier (v5)
* Peter Zijlstra (peterz@...radead.org) wrote:
> On Tue, 2010-01-19 at 19:37 +0100, Peter Zijlstra wrote:
> > On Thu, 2010-01-14 at 14:33 -0500, Mathieu Desnoyers wrote:
> > > It's a case where CPU 1 switches from our mm to another mm:
> > >
> > > CPU 0 (membarrier) CPU 1 (another mm -our mm)
> > > <user-space> <user-space>
> > > <buffered access C.S. data>
> > > urcu read unlock()
> > > barrier()
> > > store local gp
> > > <kernel-space>
> >
> > OK, so the question is how we end up here, if its though interrupt
> > preemption I think the interrupt delivery will imply an mb,
>
> I keep thinking that, but I think we actually refuted that in an earlier
> discussion on this patch.

Intel Architecture Software Developer's Manual Vol. 3: System
Programming
7.4 Serializing Instructions

"MOV to control reg, MOV to debug reg, WRMSR, INVD, INVLPG, WBINVD, LGDT,
LLDT, LIDT, LTR, CPUID, IRET, RSM"

So, this list does _not_ include: INT, SYSENTER, SYSEXIT.
Only IRET is included. So I don't think it is safe to assume that x86
executes a serializing instruction when entering/leaving the kernel.

>
> > if its a
> > blocking syscall, the set_task_state() mb [*] should be there.
> >
> > Then we also do:
> >
> > clear_tsk_need_resched()
> >
> > which is an atomic bitop (although does not imply a full barrier
> > per-se).
> >
> > > rq->curr = next (1)
>
> We could possibly look at placing that assignment in context_switch()
> between switch_mm() and switch_to(), which should provide a mb before
> and after I think, Ingo?

That's an interesting idea. It would indeed fix the problem of the
missing barrier before the assignment, but it would still lack the
appropriate barrier after the assignment. If the rq->curr = next;
assignment is made after load_cr3, then we lack a memory barrier between
the assignment and execution of the following user-space code after
returning with SYSEXIT (and we lack the appropriate barrier for other
architectures too).

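To make the required ordering concrete, here is a minimal user-space
sketch using C11 atomics. It is only a model, not scheduler code:
struct task, struct rq, model_smp_mb() and model_switch_curr() are
hypothetical stand-ins, and the seq_cst fence merely plays the role that
smp_mb() would play in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real scheduler types. */
struct task {
	int pid;
};

struct rq {
	_Atomic(struct task *) curr;
};

/* Models a full memory barrier (the role smp_mb() plays in the kernel). */
static inline void model_smp_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Models the rq->curr update with a full barrier on each side:
 *  - the first fence orders the outgoing task's earlier memory accesses
 *    before the curr update;
 *  - the second fence orders the curr update before the incoming task's
 *    later memory accesses.
 */
static void model_switch_curr(struct rq *rq, struct task *next)
{
	assert(next != NULL);

	model_smp_mb();
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
	model_smp_mb();
}
```

Placing the assignment after load_cr3 would give the first fence for
free on x86, but nothing in the return path stands in for the second
one.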
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68