Message-ID: <1301635840.2881.19.camel@edumazet-laptop>
Date: Fri, 01 Apr 2011 07:30:40 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Changli Gao <xiaosuo@...il.com>
Cc: "\"Oleg A. Arkhangelsky\"" <sysoleg@...dex.ru>,
Patrick McHardy <kaber@...sh.net>,
netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
Paul E McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
On Friday, 01 April 2011 at 10:02 +0800, Changli Gao wrote:
> On Thu, Mar 31, 2011 at 10:47 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> >
> > I wonder if this is not hiding another bug.
> >
> > Adding an RCU grace period might reduce the probability window.
>
> What bug? This one?
>
Yes. I am saying your patch is a brown paper bag: it hides the real bug
rather than fixing it. The real bug is elsewhere, and we must find it,
not just make it happen less frequently.
Maybe it's a missing barrier, and adding a full RCU grace period is a
waste (of CPU cache space, since call_rcu() fills a potentially big list).
> >
> > By the time nf_conntrack_free(ct) is called, no other cpu/thread
> > could/should use ct, or ct->ext ?
>
> nat->ct may still refer to it.
>
conntrack must have a real refcounting problem, then.
Each time an object A holds a reference to object B, we should increase
B's refcount, so B cannot disappear.
nat->ct _cannot_ refer to ct if we are freeing ct. That's quite a simple rule.
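Purely to illustrate that rule (this is not the actual nf_nat code; the
helper names nat_set_ct()/nat_clear_ct() are made up), using the existing
nf_conntrack_get()/nf_conntrack_put() helpers:

        /* Illustrative sketch only: if A (the nat extension) keeps a
         * pointer to B (the conntrack), A must hold a reference on B
         * for as long as the pointer lives.
         */
        static void nat_set_ct(struct nf_conn_nat *nat, struct nf_conn *ct)
        {
                nf_conntrack_get(&ct->ct_general);      /* pin ct */
                nat->ct = ct;
        }

        static void nat_clear_ct(struct nf_conn_nat *nat)
        {
                struct nf_conn *ct = nat->ct;

                nat->ct = NULL;
                nf_conntrack_put(&ct->ct_general);      /* release ct */
        }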
> >
> > Sure, another thread can find/pass_on ct in a lookup but should not use
> > it, since its refcount (ct_general.use) should be 0.
> >
> >
>
> As SLAB_DESTROY_BY_RCU is used, we should validate ct->orig_tuple too.
>
> There is another minor problem.
>
Minor or serious?
> nf_nat_core.c:
>
>         rcu_read_lock();
>         hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
>                 ct = nat->ct;
>                 if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
>                         /* Copy source part from reply tuple. */
>                         nf_ct_invert_tuplepr(result,
>                                              &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>                         result->dst = tuple->dst;
>
>                         if (in_range(result, range)) {
>                                 rcu_read_unlock();
>                                 return 1;
>                         }
>                 }
>         }
>         rcu_read_unlock();
>
> If the ct is reused, the NAT mapping will be wrong. It isn't a serious
> problem, and it can't be fixed simply by checking the reference counter
> before using the ct, because we can't validate it against the original
> tuple.
>
I call this a serious problem. netfilter is not fuzzy logic.
> Maybe we can do it in this way:
>
>         ct = nat->ct;
>         if (!nf_ct_is_dying(ct) &&
>             atomic_inc_not_zero(&ct->ct_general.use)) {
>                 if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat) {
>                         /* ct is valid, do something... */
>                 }
>                 nf_ct_put(ct);
>         }
>
> I think the two additional atomic operations are expensive. It isn't a good idea.
>
It depends. This is still better than taking the conntrack lock.
SLAB_DESTROY_BY_RCU does not allow for fuzzy logic. The rule is that we
_must_ take a reference on the object after finding it, and _recheck_
the validity of the object before using it.
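As a rough, untested sketch only (reusing the names from your excerpt and
your nf_ct_ext_find() recheck), the lookup loop could become:

        rcu_read_lock();
        hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
                ct = nat->ct;
                if (nf_ct_is_dying(ct) ||
                    !atomic_inc_not_zero(&ct->ct_general.use))
                        continue;
                /* The slab object may have been recycled for another
                 * conntrack: recheck everything after taking the ref.
                 */
                if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat &&
                    same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
                        /* Copy source part from reply tuple. */
                        nf_ct_invert_tuplepr(result,
                                             &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
                        result->dst = tuple->dst;
                        if (in_range(result, range)) {
                                nf_ct_put(ct);
                                rcu_read_unlock();
                                return 1;
                        }
                }
                nf_ct_put(ct);
        }
        rcu_read_unlock();

The key point is that everything read before the atomic_inc_not_zero()
must be re-validated after it, because SLAB_DESTROY_BY_RCU may have
recycled the object for a different conntrack in the meantime.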
Another way to avoid the atomic operations in find_appropriate_src() is
to use a seqcount (or seqlock), and change the seqcount every time
something is changed in ct.
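Rough idea, untested, with a hypothetical seqcount field (say ct->nat_seq)
added to struct nf_conn:

        /* writer side, whenever the tuples/zone of ct are changed */
        write_seqcount_begin(&ct->nat_seq);
        /* ... modify ct ... */
        write_seqcount_end(&ct->nat_seq);

        /* reader side in find_appropriate_src(): no atomic ops, just retry */
        unsigned int seq;
        bool match;

        do {
                seq = read_seqcount_begin(&ct->nat_seq);
                match = same_src(ct, tuple) && nf_ct_zone(ct) == zone;
                if (match) {
                        nf_ct_invert_tuplepr(result,
                                             &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
                        result->dst = tuple->dst;
                }
        } while (read_seqcount_retry(&ct->nat_seq, seq));

Writers would need to be serialized (presumably by the existing conntrack
locking), and readers retry instead of taking a reference.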