Message-ID: <1296506396.26581.76.camel@laptop>
Date: Mon, 31 Jan 2011 21:39:56 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Milton Miller <miltonm@....com>
Cc: akpm@...ux-foundation.org, Anton Blanchard <anton@...ba.org>,
xiaoguangrong@...fujitsu.com, mingo@...e.hu, jaxboe@...ionio.com,
npiggin@...il.com, rusty@...tcorp.com.au,
torvalds@...ux-foundation.org, paulmck@...ux.vnet.ibm.com,
benh@...nel.crashing.org, linux-kernel@...r.kernel.org
Subject: Re: call_function_many: fix list delete vs add race

On Mon, 2011-01-31 at 14:26 -0600, Milton Miller wrote:
> On Mon, 31 Jan 2011 about 11:27:45 +0100, Peter Zijlstra wrote:
> > On Fri, 2011-01-28 at 18:20 -0600, Milton Miller wrote:
> > > Peter pointed out there was nothing preventing the list_del_rcu in
> > > smp_call_function_interrupt from running before the list_add_rcu in
> > > smp_call_function_many. Fix this by not setting refs until we have put
> > > the entry on the list. We can use the lock acquire and release instead
> > > of a wmb.
> > >
> > > Reported-by: Peter Zijlstra <peterz@...radead.org>
> > > Signed-off-by: Milton Miller <miltonm@....com>
> > > Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> > > ---
> > >
> > > I tried to force this race with a udelay before the lock & list_add and
> > > by mixing all 64 online cpus with just 3 random cpus in the mask, but
> > > was unsuccessful. Still, it seems to be a valid race, and the fix
> > > is a simple change to the current code.
> >
> > Yes, I think this will fix it, I think simply putting that assignment
> > under the lock is sufficient, because then the list removal will
> > serialize against the list add. But placing it after the list add does
> > also seem sufficient.
> >
> > Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> >
>
> I was worried some architectures would allow a write before the spinlock
> to drop into the spinlock region,
That is indeed allowed to happen.
> in which case the data or function
> pointer could be found stale with the cpu mask bit set.
But that is OK, right? The atomic_read(->refs) test will fail and we'll
continue.
> The unlock
> must flush all prior writes and
and reads
> therefore the new function and data
> will be seen before refs is set.
Which again should be just fine, given the interrupt does:

        if (!cpumask_test_cpu())
                continue

        rmb

        if (!atomic_read())
                continue

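To spell that out a bit (purely a sketch, with made-up names like
cfd_sketch, call_queue_sketch and call_queue_lock_sketch standing in for
the real kernel/smp.c structures), including the removal path I get to
below:

/*
 * Sketch of the receiving side; illustrative only, not the actual
 * generic_smp_call_function_interrupt().
 */
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <asm/atomic.h>		/* atomics on 2011-era trees */

struct cfd_sketch {
	struct list_head	list;		/* entry on the global call queue */
	void			(*func)(void *);	/* function to run */
	void			*data;			/* its argument */
	atomic_t		refs;		/* cpus that still need to run it */
	struct cpumask		cpumask;	/* cpus this request targets */
};

static LIST_HEAD(call_queue_sketch);
static DEFINE_RAW_SPINLOCK(call_queue_lock_sketch);

static void call_function_interrupt_sketch(void)
{
	struct cfd_sketch *data;
	int cpu = smp_processor_id();

	list_for_each_entry_rcu(data, &call_queue_sketch, list) {
		/* an entry that is not yet published may still carry our bit */
		if (!cpumask_test_cpu(cpu, &data->cpumask))
			continue;

		/* order the cpumask load against the ->refs load below */
		smp_rmb();

		/*
		 * The sender now sets ->refs only after the list_add_rcu(),
		 * under the lock; while it is still zero we skip the entry
		 * even if ->func/->data happen to look stale.
		 */
		if (!atomic_read(&data->refs))
			continue;

		data->func(data->data);

		cpumask_clear_cpu(cpu, &data->cpumask);

		if (atomic_dec_return(&data->refs))
			continue;

		/*
		 * Last cpu: taking the same lock the sender held around its
		 * list_add_rcu() means the add has completed before we
		 * delete the entry.
		 */
		raw_spin_lock(&call_queue_lock_sketch);
		list_del_rcu(&data->list);
		raw_spin_unlock(&call_queue_lock_sketch);
	}
}
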
And thus we'll be on our happy merry way. If we do however observe the
new ->refs value, the sending side has already acquired the lock, and
taking that same lock before the list_del_rcu() will serialize against
it, such that the list_add_rcu() will always have finished before the
del is executed.
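IOW, with the fix in place I'm assuming the sending side ends up ordered
roughly like this (same made-up names as the sketch above, not the real
smp_call_function_many()):

static void call_function_many_sketch(struct cfd_sketch *data,
				      const struct cpumask *mask,
				      void (*func)(void *), void *info)
{
	/*
	 * Plain stores; an architecture is allowed to let these sink into
	 * the locked region below, which is why ->refs must not be among
	 * them.
	 */
	data->func = func;
	data->data = info;
	cpumask_copy(&data->cpumask, mask);

	raw_spin_lock(&call_queue_lock_sketch);
	list_add_rcu(&data->list, &call_queue_sketch);
	/*
	 * Publish ->refs only once the entry is on the list; the wmb
	 * implied by list_add_rcu() orders the stores above before this
	 * one, and a receiver that still sees zero simply skips the entry.
	 */
	atomic_set(&data->refs, cpumask_weight(&data->cpumask));
	raw_spin_unlock(&call_queue_lock_sketch);

	/*
	 * A receiver that does see the new ->refs takes the same lock
	 * before its list_del_rcu(), so the add above has completed by the
	 * time the entry can be deleted.
	 */
	arch_send_call_function_ipi_mask(&data->cpumask);
}
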
Or am I not quite understanding things?