linux-kernel - Re: [PATCH RFC tip/core/rcu 11/20] rcu: fix race condition in synchronize_sched_expedited()

Message-ID: <20101218195849.GC2143@linux.vnet.ibm.com>
Date:	Sat, 18 Dec 2010 11:58:49 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	rostedt@...dmis.org, Valdis.Kletnieks@...edu, dhowells@...hat.com,
	eric.dumazet@...il.com, darren@...art.com
Subject: Re: [PATCH RFC tip/core/rcu 11/20] rcu: fix race condition in
 synchronize_sched_expedited()

On Sat, Dec 18, 2010 at 04:52:39PM +0100, Tejun Heo wrote:
> Hello,
> 
> On 12/17/2010 09:54 PM, Paul E. McKenney wrote:
> > The new (early 2010) implementation of synchronize_sched_expedited() uses
> > try_stop_cpus() to force a context switch on every CPU.  It also permits
> > concurrent calls to synchronize_sched_expedited() to share a single call
> > to try_stop_cpus() through use of an atomically incremented
> > synchronize_sched_expedited_count variable.  Unfortunately, this is
> > subject to failure as follows:
> > 
> > o	Task A invokes synchronize_sched_expedited(), try_stop_cpus()
> > 	succeeds, but Task A is preempted before getting to the atomic
> > 	increment of synchronize_sched_expedited_count.
> > 
> > o	Task B also invokes synchronize_sched_expedited(), with exactly
> > 	the same outcome as Task A.
> > 
> > o	Task C also invokes synchronize_sched_expedited(), again with
> > 	exactly the same outcome as Tasks A and B.
> > 
> > o	Task D also invokes synchronize_sched_expedited(), but only
> > 	gets as far as acquiring the mutex within try_stop_cpus()
> > 	before being preempted, interrupted, or otherwise delayed.
> > 
> > o	Task E also invokes synchronize_sched_expedited(), but only
> > 	gets to the snapshotting of synchronize_sched_expedited_count.
> > 
> > o	Tasks A, B, and C all increment synchronize_sched_expedited_count.
> > 
> > o	Task E fails to get the mutex, so checks the new value
> > 	of synchronize_sched_expedited_count.  It finds that the
> > 	value has increased, so (wrongly) assumes that its work
> > 	has been done, returning despite there having been no
> > 	expedited grace period since it began.
> > 
> > The solution is to have the lowest-numbered CPU atomically increment
> > the synchronize_sched_expedited_count variable within the
> > synchronize_sched_expedited_cpu_stop() function, which is under
> > the protection of the mutex acquired by try_stop_cpus().  However, this
> > also requires that piggybacking tasks wait for three rather than two
> > instances of try_stop_cpus(), because we cannot control the order in
> > which the per-CPU callback functions occur.
> > 
> > Cc: Tejun Heo <tj@...nel.org>
> > Cc: Lai Jiangshan <laijs@...fujitsu.com>
> > Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> 
> Acked-by: Tejun Heo <tj@...nel.org>

Thank you!

> I suppose this should go -stable?

Given that it is only a theoretical bug, I am targeting 2.6.38 rather
than 2.6.37.  But yes, looks to me like a -stable candidate.

							Thanx, Paul
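
The interleaving in the quoted commit message can be replayed in a small
single-threaded model. This is only a sketch, not the kernel code: the
struct, the helper names, and the snapshot offsets (count + 1 for the old
scheme, count + 2 for the "wait for three" scheme) are assumptions made
for illustration, and Task D's hold on the mutex is represented only by
Task E falling through to the piggyback test.

```c
#include <stdbool.h>

/* Toy model state; every name in this block is hypothetical. */
struct model {
	int count;         /* stands in for synchronize_sched_expedited_count */
	int stops_started; /* simulated try_stop_cpus() calls begun so far */
	bool inc_in_stop;  /* true: increment inside the stop callback (the fix) */
};

/* One successful stop: every CPU is forced through a context switch. */
static void stop_all_cpus(struct model *m)
{
	m->stops_started++;
	if (m->inc_in_stop)
		m->count++; /* fixed scheme: done while the mutex is held */
}

/* Old scheme only: the increment performed after the stop has returned. */
static void late_increment(struct model *m)
{
	if (!m->inc_in_stop)
		m->count++;
}

/*
 * Replay the scenario with Tasks A through E.  Returns true if Task E
 * piggybacks although no stop began after Task E took its snapshot.
 */
static bool task_e_returns_early(bool inc_in_stop)
{
	struct model m = { 0, 0, inc_in_stop };
	int i, snap, stops_at_entry;
	bool piggyback;

	/* Tasks A, B, C: the stop succeeds, then each task is preempted. */
	for (i = 0; i < 3; i++)
		stop_all_cpus(&m);

	/* Task D holds the mutex, so Task E's own stop attempt will fail. */

	/* Task E snapshots the counter (offsets are assumptions, see above). */
	stops_at_entry = m.stops_started;
	snap = m.count + (inc_in_stop ? 2 : 1);

	/* Tasks A, B, C resume; in the old scheme they increment only now. */
	for (i = 0; i < 3; i++)
		late_increment(&m);

	/* Task E failed to get the mutex, so it applies the piggyback test. */
	piggyback = (m.count - snap) > 0;
	return piggyback && m.stops_started == stops_at_entry;
}
```

In this model, task_e_returns_early(false) reports the premature return
from the scenario, because three delayed increments land after Task E's
snapshot.  task_e_returns_early(true) does not, because the counter only
moves while a stop is actually in progress.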