Date:	Fri, 5 Mar 2010 21:06:56 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: [PATCH 6/13] bridge: Add core IGMP snooping support

On Sat, Mar 06, 2010 at 09:17:18AM +0800, Herbert Xu wrote:
> On Fri, Mar 05, 2010 at 03:43:27PM -0800, Paul E. McKenney wrote:
> > 
> > Cool!!!  You use a pair of list_head structures, so that a given
> > element can be in both the old and the new hash table simultaneously.
> > Of course, an RCU grace period must elapse between consecutive resizings.
> > Which appears to be addressed.
> 
> Thanks :) I will try to extend this to other existing hash tables
> where the number of updates can be limited like it is here.
> 
> > The teardown needs an rcu_barrier_bh() rather than the current
> > synchronize_rcu_bh(), please see below.
> 
> All the call_rcu_bh's are done under the multicast_lock.  The first
> check performed after taking the multicast_lock is whether we've
> started the tear-down.  So at the point where it currently calls
> synchronize(), it should already be the case that no call_rcu_bh's
> are still in flight.

Agreed, but the callbacks registered by call_rcu_bh() might run at
any time, possibly quite some time after the synchronize_rcu_bh()
completes.  For example, the last call_rcu_bh() might queue its
callback on one CPU while the synchronize_rcu_bh() runs on another
CPU, in which case there is no guarantee that the callback will have
executed by the time synchronize_rcu_bh() returns.

In contrast, rcu_barrier_bh() is guaranteed not to return until all
pending RCU-bh callbacks have executed.
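
To make the distinction concrete, here is roughly the teardown pattern
I have in mind (an illustrative sketch with made-up names, not the
patch's actual code):

	/*
	 * Sketch: tear down a table whose entries are freed via
	 * call_rcu_bh().  All names here are hypothetical.
	 */
	static void example_teardown(struct example_table *tbl)
	{
		spin_lock_bh(&tbl->lock);
		tbl->shutting_down = true; /* no new call_rcu_bh() after this */
		/* ... unlink each entry and pass it to call_rcu_bh() ... */
		spin_unlock_bh(&tbl->lock);

		/*
		 * synchronize_rcu_bh() would only wait for a grace period
		 * to elapse; callbacks already queued by call_rcu_bh()
		 * might still be pending when it returns.  In contrast,
		 * rcu_barrier_bh() waits until every pending RCU-bh
		 * callback has actually been invoked.
		 */
		rcu_barrier_bh();

		kfree(tbl);	/* safe: no callback can run after this */
	}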

> > Also, I don't see how the teardown code is preventing new readers from
> > finding the data structures before they are being passed to call_rcu_bh().
> > You can't safely start the RCU grace period until -after- all new readers
> > have been excluded.  (But I could easily be missing something here.)
> 
> I understand.  However, AFAICS whatever it is that we are destroying
> is unlinked from the reader-visible data structure before the
> call_rcu_bh().  Do you have a particular case in mind where this
> does not hold?

I might simply have missed the operation that removes reader
visibility; looking again...

Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
it impossible for the readers to get to any of the data.  Right?

If so, my confusion, you are right, this one is OK.
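
(For anyone reading along, the pattern in question is roughly the
following.  This is a sketch from memory, not the literal code, and
the callback name is illustrative:)

	struct net_bridge_mdb_htable *mdb;

	spin_lock_bh(&br->multicast_lock);
	mdb = br->mdb;
	br->mdb = NULL;		/* new readers now find nothing at all */
	spin_unlock_bh(&br->multicast_lock);

	/*
	 * Pre-existing RCU-bh readers might still be traversing the
	 * old table, so its memory is handed to call_rcu_bh() rather
	 * than freed immediately.
	 */
	call_rcu_bh(&mdb->rcu, example_mdb_free);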

> > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > scheme correctly.
> 
> Any function that modifies the data structure is done under the
> multicast_lock, including br_multicast_del_pg.

But spin_lock() does not take the place of rcu_read_lock_bh().
And so, in theory, the RCU-bh grace period could complete between
the time that br_multicast_del_pg() does its call_rcu_bh() and the
"*pp = p->next;" at the top of the next loop iteration.  If so,
then br_multicast_free_pg()'s kfree() will possibly have clobbered
"p->next".  Low probability, yes, but a long-running interrupt
could do the trick.

Or is there something I am missing that is preventing an RCU-bh
grace period from completing near the bottom of br_multicast_del_pg()'s
"for" loop?

> > Hmmm...  Where is the read-side code?  Wherever it is, it cannot safely
> > dereference the ->old pointer.
> 
> Right, the old pointer is merely there to limit rehashings to one
> per window.  So it isn't used by the read-side.

Good!

> The read-side is the data path (non-IGMP multicast packets).  The
> sole entry point is br_mdb_get().

Hmmm...  So the caller is responsible for rcu_read_lock_bh()?

Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
in __br_mdb_ip_get(), then?  Or is something else going on here?
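
In other words, I would expect the read side to look something like
the following (sketch only: argument lists, the hash computation, and
the address comparison are abbreviated or guessed, though the
hlist_for_each_entry_rcu() four-argument form is the current one):

	struct net_bridge_mdb_htable *mdb;
	struct net_bridge_mdb_entry *mp;
	struct hlist_node *h;

	rcu_read_lock_bh();
	mdb = br->mdb;
	if (mdb != NULL) {
		hlist_for_each_entry_rcu(mp, h, &mdb->mhash[hash],
					 hlist[mdb->ver]) {
			if (example_addr_equal(&mp->addr, dst)) {
				/* use mp under rcu_read_lock_bh() */
				break;
			}
		}
	}
	rcu_read_unlock_bh();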

							Thanx, Paul

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@...dor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
