Message-ID: <1263977151.4283.816.camel@laptop>
Date: Wed, 20 Jan 2010 09:45:51 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...e.hu>,
akpm@...ux-foundation.org, josh@...htriplett.org,
tglx@...utronix.de, Valdis.Kletnieks@...edu, dhowells@...hat.com,
laijs@...fujitsu.com, dipankar@...ibm.com,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v5)

On Tue, 2010-01-19 at 22:13 -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@...radead.org) wrote:
> > On Tue, 2010-01-19 at 19:37 +0100, Peter Zijlstra wrote:
> > > On Thu, 2010-01-14 at 14:33 -0500, Mathieu Desnoyers wrote:
> > > > It's a case where CPU 1 switches from our mm to another mm:
> > > >
> > > >   CPU 0 (membarrier)          CPU 1 (another mm -> our mm)
> > > >   <user-space>                <user-space>
> > > >                               <buffered access C.S. data>
> > > >                               urcu read unlock()
> > > >                                 barrier()
> > > >                                 store local gp
> > > >                               <kernel-space>
> > >
> > > OK, so the question is how we end up here, if it's through interrupt
> > > preemption I think the interrupt delivery will imply an mb,
> >
> > I keep thinking that, but I think we actually refuted that in an earlier
> > discussion on this patch.
>
> Intel Architecture Software Developer's Manual Vol. 3: System
> Programming
> 7.4 Serializing Instructions
>
> "MOV to control reg, MOV to debug reg, WRMSR, INVD, INVLPG, WBINVD, LGDT,
> LLDT, LIDT, LTR, CPUID, IRET, RSM"
>
> So, this list does _not_ include: INT, SYSENTER, SYSEXIT.
>
> Only IRET is included. So I don't think it is safe to assume that x86
> has serializing instructions when entering/leaving the kernel.

I got confused by 7.1.2.1, automatic locking on interrupt acknowledge.
But I already retracted that statement.

> >
> > > if it's a
> > > blocking syscall, the set_task_state() mb [*] should be there.
> > >
> > > Then we also do:
> > >
> > > clear_tsk_need_resched()
> > >
> > > which is an atomic bitop (although does not imply a full barrier
> > > per-se).
> > >
> > > > rq->curr = next (1)
> >
> > We could possibly look at placing that assignment in context_switch()
> > between switch_mm() and switch_to(), which should provide a mb before
> > and after I think, Ingo?
>
> That's an interesting idea. It would indeed fix the problem of the
> missing barrier before the assignment, but would lack the appropriate
> barrier after the assignment. If the rq->curr = next; assignment is made
> after load_cr3, then we lack a memory barrier between the assignment and
> execution of following user-space code after returning with SYSEXIT (and
> we lack the appropriate barrier for other architectures too).

Well, 7.1.2.1 says that writing a segment register implies a LOCK, but
on second reading there are a number of qualifiers there, and I'm not
sure we satisfy them.

Peter, does our switch_to() imply an mb?
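
To make the ordering requirement discussed above concrete, here is a
minimal user-space sketch using C11 atomics. It is not kernel code. It
assumes that the membarrier side decides which CPUs to target by reading
each CPU's rq->curr, and that the scheduler side therefore needs a full
barrier both before and after the rq->curr = next store. The names
rq_curr, publish_curr(), cpus_running_mm(), the NR_CPUS value, and the
two structs are hypothetical stand-ins for illustration; the seq_cst
fences stand in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4                 /* hypothetical value, illustration only */

struct mm { int id; };            /* stand-in for struct mm_struct */
struct task { struct mm *mm; };   /* stand-in for struct task_struct */

/* Stand-in for the per-CPU rq->curr pointer. */
static _Atomic(struct task *) rq_curr[NR_CPUS];

/* Scheduler side: publish the next task with a full barrier on each side. */
static void publish_curr(int cpu, struct task *next)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* Order the outgoing task's user-space accesses before the store. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&rq_curr[cpu], next, memory_order_relaxed);
	/* Order the store before the incoming task's user-space accesses. */
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * membarrier side: count the CPUs whose current task uses 'mm'.
 * In a real implementation these are the CPUs that would be sent an IPI;
 * here we only count them.
 */
static int cpus_running_mm(const struct mm *mm)
{
	int n = 0;

	/* Order the caller's prior accesses before reading rq_curr[]. */
	atomic_thread_fence(memory_order_seq_cst);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct task *t = atomic_load_explicit(&rq_curr[cpu],
						      memory_order_relaxed);
		if (t != NULL && t->mm == mm)
			n++;
	}
	/* Order the reads of rq_curr[] before the caller's later accesses. */
	atomic_thread_fence(memory_order_seq_cst);
	return n;
}
```

If either fence in publish_curr() is dropped, the reader in
cpus_running_mm() may skip a CPU whose user-space accesses are not yet
ordered against the rq_curr[] update, which is the window this thread is
trying to close.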