lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1389188841.26646.87.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 08 Jan 2014 05:47:21 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Andrey Vagin <avagin@...nvz.org>
Cc:	netfilter-devel@...r.kernel.org, netfilter@...r.kernel.org,
	coreteam@...filter.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, vvs@...nvz.org,
	Florian Westphal <fw@...len.de>,
	Pablo Neira Ayuso <pablo@...filter.org>,
	Patrick McHardy <kaber@...sh.net>,
	Jozsef Kadlecsik <kadlec@...ckhole.kfki.hu>,
	"David S. Miller" <davem@...emloft.net>,
	Cyrill Gorcunov <gorcunov@...nvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in
 nf_conntrack_find_get (v2)

On Wed, 2014-01-08 at 17:17 +0400, Andrey Vagin wrote:
> Lets look at destroy_conntrack:
> 
> hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> ...
> nf_conntrack_free(ct)
> 	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> 
> net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> 
> The hash is protected by rcu, so readers look up conntracks without
> locks.
> A conntrack is removed from the hash, but in this moment a few readers
> still can use the conntrack. Then this conntrack is released and another
> thread creates conntrack with the same address and the equal tuple.
> After this a reader starts to validate the conntrack:
> * It's not dying, because a new conntrack was created
> * nf_ct_tuple_equal() returns true.
> 
> But this conntrack is not initialized yet, so it can not be used by two
> threads concurrently. In this case BUG_ON may be triggered from
> nf_nat_setup_info().
> 
> Florian Westphal suggested to check the confirm bit too. I think it's
> right.
> 
> task 1			task 2			task 3
> 			nf_conntrack_find_get
> 			 ____nf_conntrack_find
> destroy_conntrack
>  hlist_nulls_del_rcu
>  nf_conntrack_free
>  kmem_cache_free
> 						__nf_conntrack_alloc
> 						 kmem_cache_alloc
> 						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
> 			 if (nf_ct_is_dying(ct))
> 			 if (!nf_ct_tuple_equal()
> 
> I'm not sure, that I have ever seen this race condition in a real life.
> Currently we are investigating a bug, which is reproduced on a few node.
> In our case one conntrack is initialized from a few tasks concurrently,
> we don't have any other explanation for this.
> 
> <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
> ...
> <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
> ...
> <4>[46267.085549] Call Trace:
> <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
> <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
> <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
> <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
> <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
> <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
> <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
> <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
> <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
> <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
> <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
> <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
> <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
> <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
> <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
> <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
> <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
> <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
> <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
> <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
> <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
> <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
> <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
> <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
> <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
> <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
> 
> v2: move nf_ct_is_confirmed into the unlikely() annotation
> 
> Cc: Florian Westphal <fw@...len.de>
> Cc: Pablo Neira Ayuso <pablo@...filter.org>
> Cc: Patrick McHardy <kaber@...sh.net>
> Cc: Jozsef Kadlecsik <kadlec@...ckhole.kfki.hu>
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: Cyrill Gorcunov <gorcunov@...nvz.org>
> Signed-off-by: Andrey Vagin <avagin@...nvz.org>
> ---
>  net/netfilter/nf_conntrack_core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 43549eb..403f634 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -387,8 +387,12 @@ begin:
>  			     !atomic_inc_not_zero(&ct->ct_general.use)))
>  			h = NULL;
>  		else {
> +			/* A conntrack can be recreated with the equal tuple,
> +			 * so we need to check that the conntrack is initialized
> +			 */
>  			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
> -				     nf_ct_zone(ct) != zone)) {
> +				     nf_ct_zone(ct) != zone ||
> +				     !nf_ct_is_confirmed(ct))) {
>  				nf_ct_put(ct);
>  				goto begin;
>  			}


I am still not convinced of this being the right fix.


The key we test after taking the refcount should be the same key that we
test before taking the refcount, otherwise we might add a loop here.

Your patch did not change ____nf_conntrack_find(), so I find it
confusing.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ