netdev - Re: [RFC] netfilter: conntrack race between dump

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1290669214.2798.109.camel@edumazet-laptop>
Date:	Thu, 25 Nov 2010 08:13:34 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	Patrick McHardy <kaber@...sh.net>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	netdev@...r.kernel.org, netfilter-devel@...r.kernel.org
Subject: Re: [RFC] netfilter: conntrack race between dump_table and destroy

Le mercredi 24 novembre 2010 à 23:00 -0800, Stephen Hemminger a écrit :
> On Thu, 25 Nov 2010 07:34:33 +0100
> Eric Dumazet <eric.dumazet@...il.com> wrote:
> 
> > Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit :
> > > A customer reported a crash and the backtrace showed that
> > > ctnetlink_dump_table was running while a conntrack entry was
> > > being destroyed.  It looks like the code for walking the table
> > > with hlist_nulls_for_each_entry_rcu is not correctly handling the
> > > case where it finds a deleted entry.
> > > 
> > > According to RCU documentation, when using hlist_nulls the reader
> > > must handle the case of seeing a deleted entry and not proceed
> > > further down the linked list.  For lookup the correct behavior would
> > > be to restart the scan, but that would generate duplicate entries.
> > > 
> > > This patch is the simplest one of three alternatives:
> > >   1) if dead entry detected, skip the rest of the hash chain (see below)
> > >   2) remember skb location at start of hash chain and rescan that chain
> > >   3) switch to using a full lock when scanning rather than RCU.
> > > It all depends on the amount of effort versus consistency of results.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
> > > 
> > > 
> > > --- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
> > > +++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
> > > @@ -651,8 +651,12 @@ restart:
> > >  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> > >  				continue;
> > >  			ct = nf_ct_tuplehash_to_ctrack(h);
> > > +
> > > +			/* if entry is being deleted then can not proceed
> > > +			 * past this point. */
> > >  			if (!atomic_inc_not_zero(&ct->ct_general.use))
> > > -				continue;
> > > +				break;
> > > +
> > >  			/* Dump entries of a given L3 protocol number.
> > >  			 * If it is not specified, ie. l3proto == 0,
> > >  			 * then dump everything. */
> > > --
> > 
> > Hmm...
> > 
> > How restarting the loop can be a problem ? 
> 
> At this point in the loop, some entries have been placed in the netlink
> dump buffer. Restarting the loop will cause duplicate entries.
> 

Then this is another problem.

We cannot use RCU at all here.

Or we miss valid entries in the chain.

> > There must be a bug somewhere else that your patch try to avoid, not to
> > really fix.
> > 
> > Normally, destroyed ct is removed eventually from the chain, so this
> > lookup should stop.
> 
> Because hlist_nulls it is possible to walk into a dead entry, in that
> case the next pointer is no longer valid.
> 

RCU should be used where needed, in fast path only, to find one entry,
not to find _all_ entries.

For example, we cannot use it for UDP multicast delivery for the same
reasons : If we find a deleted or moved socket, we must restart the loop
and forget all accumulated sockets.

If netlink dumps each entry in the final destination container, then we
cannot restart loop, and cannot use RCU for chain lookup.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html