Message-ID: <1301635840.2881.19.camel@edumazet-laptop>
Date: Fri, 01 Apr 2011 07:30:40 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Changli Gao <xiaosuo@...il.com>
Cc: "\"Oleg A. Arkhangelsky\"" <sysoleg@...dex.ru>,
Patrick McHardy <kaber@...sh.net>,
netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
Paul E McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
On Friday, 01 April 2011 at 10:02 +0800, Changli Gao wrote:
> On Thu, Mar 31, 2011 at 10:47 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> >
> > I wonder if this is not hiding another bug.
> >
> > Adding an RCU grace period might reduce the probability window.
>
> What bug? This one?
>
Yes. I am saying your patch is a brown paper bag: it hides the real bug
rather than fixing it. The real bug is elsewhere, and we must find it,
not just make it happen less frequently.
Maybe it's a missing barrier, and adding a full RCU grace period is a
waste (of CPU cache space, since call_rcu() fills a potentially big list).
> >
> > By the time nf_conntrack_free(ct) is called, no other cpu/thread
> > could/should use ct, or ct->ext ?
>
> nat->ct may still refer to it.
>
conntrack must have a real refcounting problem, then.
Each time an object A holds a reference to object B, we should increase
B's refcount, so B cannot disappear.
nat->ct _cannot_ refer to ct if we are freeing ct. That's quite a simple rule.
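Purely to illustrate that rule (this is not the actual nf_nat code; the
helper names nat_set_ct()/nat_clear_ct() are made up), using the existing
nf_conntrack_get()/nf_conntrack_put() helpers:

        /* Illustrative sketch only: if A (the nat extension) keeps a
         * pointer to B (the conntrack), A must hold a reference on B
         * for as long as the pointer lives.
         */
        static void nat_set_ct(struct nf_conn_nat *nat, struct nf_conn *ct)
        {
                nf_conntrack_get(&ct->ct_general);      /* pin ct */
                nat->ct = ct;
        }

        static void nat_clear_ct(struct nf_conn_nat *nat)
        {
                struct nf_conn *ct = nat->ct;

                nat->ct = NULL;
                nf_conntrack_put(&ct->ct_general);      /* release ct */
        }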
> >
> > Sure, another thread can find/pass_on ct in a lookup but should not use
> > it, since its refcount (ct_general.use) should be 0.
> >
> >
>
> As SLAB_DESTROY_BY_RCU is used, we should validate ct->orig_tuple too.
>
> There is another minor problem.
>
Minor or serious?
> nf_nat_core.c:
>
>         rcu_read_lock();
>         hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
>                 ct = nat->ct;
>                 if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
>                         /* Copy source part from reply tuple. */
>                         nf_ct_invert_tuplepr(result,
>                                              &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>                         result->dst = tuple->dst;
>
>                         if (in_range(result, range)) {
>                                 rcu_read_unlock();
>                                 return 1;
>                         }
>                 }
>         }
>         rcu_read_unlock();
>
> If the ct is reused, the NAT mapping will be wrong. It isn't a serious
> problem, and it can't be fixed simply by checking the reference counter
> before using the ct, because we can't validate it against the original
> tuple.
>
I call this a serious problem. netfilter is not fuzzy logic.
> Maybe we can do it in this way:
>
>         ct = nat->ct;
>         if (!nf_ct_is_dying(ct) &&
>             atomic_inc_not_zero(&ct->ct_general.use)) {
>                 if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat) {
>                         /* ct is valid, do something... */
>                 }
>                 nf_ct_put(ct);
>         }
>
> I think the two additional atomic operations are expensive. It isn't a good idea.
>
It depends. This is still better than taking the conntrack lock.
SLAB_DESTROY_BY_RCU does not allow for fuzzy logic. The rule is that we
_must_ take a reference on the object after finding it, and _recheck_
the validity of the object before using it.
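As a rough, untested sketch only (reusing the names from your excerpt and
your nf_ct_ext_find() recheck), the lookup loop could become:

        rcu_read_lock();
        hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
                ct = nat->ct;
                if (nf_ct_is_dying(ct) ||
                    !atomic_inc_not_zero(&ct->ct_general.use))
                        continue;
                /* The slab object may have been recycled for another
                 * conntrack: recheck everything after taking the ref.
                 */
                if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat &&
                    same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
                        /* Copy source part from reply tuple. */
                        nf_ct_invert_tuplepr(result,
                                             &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
                        result->dst = tuple->dst;
                        if (in_range(result, range)) {
                                nf_ct_put(ct);
                                rcu_read_unlock();
                                return 1;
                        }
                }
                nf_ct_put(ct);
        }
        rcu_read_unlock();

The key point is that everything read before the atomic_inc_not_zero()
must be re-validated after it, because SLAB_DESTROY_BY_RCU may have
recycled the object for a different conntrack in the meantime.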
Another way to avoid the atomic operations in find_appropriate_src() is
to use a seqcount (or seqlock), and change the seqcount every time
something is changed in ct.
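Rough idea, untested, with a hypothetical seqcount field (say ct->nat_seq)
added to struct nf_conn:

        /* writer side, whenever the tuples/zone of ct are changed */
        write_seqcount_begin(&ct->nat_seq);
        /* ... modify ct ... */
        write_seqcount_end(&ct->nat_seq);

        /* reader side in find_appropriate_src(): no atomic ops, just retry */
        unsigned int seq;
        bool match;

        do {
                seq = read_seqcount_begin(&ct->nat_seq);
                match = same_src(ct, tuple) && nf_ct_zone(ct) == zone;
                if (match) {
                        nf_ct_invert_tuplepr(result,
                                             &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
                        result->dst = tuple->dst;
                }
        } while (read_seqcount_retry(&ct->nat_seq, seq));

Writers would need to be serialized (presumably by the existing conntrack
locking), and readers retry instead of taking a reference.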