Message-ID: <1263926259.4283.757.camel@laptop>
Date: Tue, 19 Jan 2010 19:37:39 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...e.hu>,
akpm@...ux-foundation.org, josh@...htriplett.org,
tglx@...utronix.de, Valdis.Kletnieks@...edu, dhowells@...hat.com,
laijs@...fujitsu.com, dipankar@...ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory
barrier (v5)
On Thu, 2010-01-14 at 14:33 -0500, Mathieu Desnoyers wrote:
> It's a case where CPU 1 switches from our mm to another mm:
>
> CPU 0 (membarrier)                  CPU 1 (another mm -our mm)
> <user-space>                        <user-space>
>                                     <buffered access C.S. data>
>                                     urcu read unlock()
>                                     barrier()
>                                     store local gp
>                                     <kernel-space>
OK, so the question is how we end up here. If it's through interrupt
preemption, I think the interrupt delivery will imply an mb; if it's a
blocking syscall, the set_task_state() mb [*] should be there.

Then we also do:

  clear_tsk_need_resched()

which is an atomic bitop (although it does not imply a full barrier
per se).
>                                     rq->curr = next (1)
> memory access before membarrier
> <call sys_membarrier()>
> smp_mb()
> mm_cpumask includes CPU 1
> rcu_read_lock()
> if (cpu_curr(1)->mm != our mm)
>   skip CPU 1 -> here, rq->curr new version is already visible
> rcu_read_unlock()
> smp_mb()
> <return to user-space>
> memory access after membarrier
> -> this is where we allow freeing
>    the old structure although the
>    buffered access C.S. data is
>    still in flight.
>                                     User-space access C.S. data (2)
>                                     (buffer flush)
>                                     switch_mm()
>                                     smp_mb()
>                                     clear_mm_cpumask()
>                                     set_mm_cpumask()
>                                     smp_mb() (by load_cr3() on x86)
>                                     switch_to()
>                                     <buffered current = next>
>                                     <switch back to user-space>
>                                     current = next (1) (buffer flush)
>                                     access critical section data (3)
>
> As we can see, the reordering of (1) and (2) is problematic, as it lets
> the check skip over a CPU that have global side-effects not committed to
> memory yet.
Right, this one I get, thanks!
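
For reference, the check CPU 0 performs in the quoted trace amounts to
roughly the loop below. This is a userspace model only: the model_*
names and MODEL_NR_CPUS are made up for illustration and are not taken
from the patch. It shows where the skip decision is made, nothing more.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* arbitrary size for this model */

struct model_mm {
	int id;
};

struct model_cpu {
	/* stands in for cpu_curr(cpu)->mm */
	const struct model_mm *curr_mm;
};

/*
 * Return a bitmask of the CPUs that would be sent a barrier: those set
 * in mm_cpumask whose current task runs on our_mm.  Every other CPU is
 * skipped.
 */
static unsigned int
model_membarrier_targets(const struct model_cpu cpus[MODEL_NR_CPUS],
			 unsigned int mm_cpumask,
			 const struct model_mm *our_mm)
{
	unsigned int targets = 0;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mm_cpumask & (1u << cpu)))
			continue;
		/*
		 * The skip from the trace: it relies on the remote
		 * CPU's curr update not becoming visible before that
		 * CPU's earlier user-space accesses have completed.
		 */
		if (cpus[cpu].curr_mm != our_mm)
			continue;
		targets |= 1u << cpu;
	}
	return targets;
}
```

In the trace, CPU 1 is in mm_cpumask but already shows another mm, so
it is left out of the target set even though its access (2) has not
reached memory yet.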
So about that [*], Oleg, kernel/signal.c:SYSCALL_DEFINE0(pause) does:

SYSCALL_DEFINE0(pause)
{
	current->state = TASK_INTERRUPTIBLE;
	schedule();
	return -ERESTARTNOHAND;
}

Isn't that ->state assignment buggy? If so, there appear to be quite a
few such sites, which worries me.
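
The [*] question hinges on the difference between a bare store to
->state and set_task_state(), which follows the store with a full
barrier. Below is a rough userspace model of that difference, assuming
C11 atomics as stand-ins for the kernel primitives. Every model_* name
is hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

struct model_task {
	atomic_long state;	/* stands in for task_struct::state */
	atomic_bool wake_cond;	/* whatever condition the waker sets */
};

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->state, MODEL_TASK_RUNNING);
	atomic_init(&t->wake_cond, false);
}

/* Like a bare "current->state = s": no ordering against later loads. */
static void model_plain_set_state(struct model_task *t, long s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
}

/* Like set_task_state(): the store, then a full memory barrier. */
static void model_set_task_state(struct model_task *t, long s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Sleeper: publish the state, then test the condition.
 * Returns true if the caller should go on to block.
 */
static bool model_prepare_to_sleep(struct model_task *t)
{
	model_set_task_state(t, MODEL_TASK_INTERRUPTIBLE);
	if (atomic_load_explicit(&t->wake_cond, memory_order_relaxed)) {
		/* Condition already set: undo the state, do not block. */
		model_plain_set_state(t, MODEL_TASK_RUNNING);
		return false;
	}
	return true;
}

/*
 * Waker: set the condition, full barrier, then look at the state.
 * Returns true if it found a sleeper and marked it running.
 */
static bool model_wake(struct model_task *t)
{
	atomic_store_explicit(&t->wake_cond, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&t->state, memory_order_relaxed) ==
	    MODEL_TASK_INTERRUPTIBLE) {
		model_plain_set_state(t, MODEL_TASK_RUNNING);
		return true;
	}
	return false;
}

/* Single-threaded walk through both orders of sleeper and waker. */
static void model_selfcheck(void)
{
	struct model_task t;

	model_task_init(&t);
	assert(model_prepare_to_sleep(&t));	/* nothing pending: block */
	assert(model_wake(&t));			/* waker sees the sleeper */

	model_task_init(&t);
	assert(!model_wake(&t));		/* nobody sleeping yet */
	assert(!model_prepare_to_sleep(&t));	/* sleeper sees condition */
}
```

With the barrier on both sides, at least one of the two always observes
the other's store. If the sleeper used model_plain_set_state() instead,
its condition load could be satisfied before the state store is visible,
and a concurrent waker could miss it.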