Message-ID: <4D0CD8C7.8070604@kernel.org>
Date: Sat, 18 Dec 2010 16:52:39 +0100
From: Tejun Heo <tj@...nel.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC: linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
dipankar@...ibm.com, akpm@...ux-foundation.org,
mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
rostedt@...dmis.org, Valdis.Kletnieks@...edu, dhowells@...hat.com,
eric.dumazet@...il.com, darren@...art.com
Subject: Re: [PATCH RFC tip/core/rcu 11/20] rcu: fix race condition in synchronize_sched_expedited()
Hello,
On 12/17/2010 09:54 PM, Paul E. McKenney wrote:
> The new (early 2010) implementation of synchronize_sched_expedited() uses
> try_stop_cpus() to force a context switch on every CPU. It also permits
> concurrent calls to synchronize_sched_expedited() to share a single call
> to try_stop_cpus() through use of an atomically incremented
> synchronize_sched_expedited_count variable. Unfortunately, this is
> subject to failure as follows:
>
> o Task A invokes synchronize_sched_expedited(), try_stop_cpus()
> succeeds, but Task A is preempted before getting to the atomic
> increment of synchronize_sched_expedited_count.
>
> o Task B also invokes synchronize_sched_expedited(), with exactly
> the same outcome as Task A.
>
> o Task C also invokes synchronize_sched_expedited(), again with
> exactly the same outcome as Tasks A and B.
>
> o Task D also invokes synchronize_sched_expedited(), but only
> gets as far as acquiring the mutex within try_stop_cpus()
> before being preempted, interrupted, or otherwise delayed.
>
> o Task E also invokes synchronize_sched_expedited(), but only
> gets to the snapshotting of synchronize_sched_expedited_count.
>
> o Tasks A, B, and C all increment synchronize_sched_expedited_count.
>
> o Task E fails to get the mutex, so checks the new value
> of synchronize_sched_expedited_count. It finds that the
> value has increased, so (wrongly) assumes that its work
> has been done, returning despite there having been no
> expedited grace period since it began.
>
> The solution is to have the lowest-numbered CPU atomically increment
> the synchronize_sched_expedited_count variable within the
> synchronize_sched_expedited_cpu_stop() function, which is under
> the protection of the mutex acquired by try_stop_cpus(). However, this
> also requires that piggybacking tasks wait for three rather than two
> instances of try_stop_cpus(), because we cannot control the order in
> which the per-CPU callback functions run.
>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Lai Jiangshan <laijs@...fujitsu.com>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@...nel.org>

I suppose this should go to -stable?
--
tejun
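
For readers who want to step through the interleaving described in the quoted commit message, here is a minimal single-threaded userspace sketch. It is not the kernel code: every name in it (struct model, stop_pass_old(), may_piggyback(), and so on) is hypothetical, each stop pass is modeled as one indivisible step, and the thresholds of two and three counter increments are taken from the "three rather than two" wording above. The exact form of the comparison is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model. None of these names exist in the kernel;
 * each stop pass is treated as one indivisible step, which simplifies the
 * per-CPU callbacks that try_stop_cpus() actually runs.
 */
struct model {
	int count;      /* stands in for synchronize_sched_expedited_count */
	int stops_done; /* completed stop passes: ground truth for grace periods */
};

struct snapshot {
	int count;
	int stops_done;
};

static struct snapshot take_snapshot(const struct model *m)
{
	struct snapshot s = { m->count, m->stops_done };
	return s;
}

/* Old scheme: the pass finishes, and the caller bumps the counter later. */
static void stop_pass_old(struct model *m) { m->stops_done++; }
static void late_increment(struct model *m) { m->count++; }

/* Patched scheme: the counter is bumped as part of the pass itself. */
static void stop_pass_new(struct model *m)
{
	m->stops_done++;
	m->count++;
}

/* A piggybacking task returns early once the counter has advanced enough. */
static bool may_piggyback(const struct model *m, struct snapshot s, int needed)
{
	return m->count - s.count >= needed;
}

/* True only if a whole stop pass completed after the snapshot was taken. */
static bool grace_period_elapsed(const struct model *m, struct snapshot s)
{
	return m->stops_done > s.stops_done;
}

/* Replays the quoted interleaving; true means Task E returned too early. */
static bool old_scheme_returns_early(void)
{
	struct model m = { 0, 0 };
	struct snapshot e;
	int i;

	for (i = 0; i < 3; i++)   /* Tasks A, B, C: pass done, then preempted */
		stop_pass_old(&m);
	e = take_snapshot(&m);    /* Task E snapshots; Task D holds the mutex */
	for (i = 0; i < 3; i++)   /* Tasks A, B, C finally increment */
		late_increment(&m);
	return may_piggyback(&m, e, 2) && !grace_period_elapsed(&m, e);
}

/* Same ordering with the increment moved inside the pass. */
static bool new_scheme_returns_early(void)
{
	struct model m = { 0, 0 };
	struct snapshot e;
	int i;

	for (i = 0; i < 3; i++)   /* increments now happen inside each pass */
		stop_pass_new(&m);
	e = take_snapshot(&m);    /* Task E sees the counter already advanced */
	return may_piggyback(&m, e, 3) && !grace_period_elapsed(&m, e);
}
```

Calling old_scheme_returns_early() from a small main() reports the premature return, while new_scheme_returns_early() does not. Because passes are indivisible in this model, it does not show why the patch needs three increments rather than two; that requirement comes from the uncontrolled ordering of the per-CPU callbacks, which the sketch deliberately leaves out.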