lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 20 May 2015 15:01:18 +0800
From:	Ying Xue <ying.xue@...driver.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	<netdev@...r.kernel.org>, <davem@...emloft.net>,
	<alexei@...estorage.com>, <joern@...estorage.com>, <ja@....bg>
Subject: Re: [PATCH v2] net: fix a double free issue for neighbour entry

On 05/20/2015 01:27 PM, Eric Dumazet wrote:
> Sorry, this atomic_read() makes no sense to me.
> 
> When rcu is used, this is perfectly fine to use an object which refcount
> is 0. If you believe the opposite, then point me to relevant
> Documentation/RCU/ file.
> 

However, the reality for us is that rcu_read_lock() can guarantee that a neigh
object is not freed within the area covered by rcu read lock, but it cannot
prevent neigh_destroy() from being executed in another context at the same time.

> refcount should only protect the object itself, not the use of it by a
> rcu reader. This has nothing to do with rcu but standard refcounting of
> objects.
> 
> The only forbidden thing would be to try to take a reference on it with
> atomic_inc() instead of the more careful atomic_inc_not_zero(), if the
> caller needs to exit the rcu_read_lock() section safely (as explained in
> Documentation/RCU/rcuref.txt around line 52)
> 
> So far, dst_neigh_output() will not try to take a refcnt.
> 

Please let us take a look at the following log generated by the Joern Engel
written debug patch:

[ 4974.816012]  [<ffffffff8163baae>] dump_stack+0x19/0x1b
[ 4974.816017]  [<ffffffff8103f141>] warn_slowpath_common+0x61/0x80
[ 4974.816019]  [<ffffffff8103f21a>] warn_slowpath_null+0x1a/0x20
[ 4974.816021]  [<ffffffff8163de95>] neigh_hold.part.26+0x1e/0x27
[ 4974.816027]  [<ffffffff8155f08c>] neigh_add_timer+0x3c/0x60
[ 4974.816029]  [<ffffffff8155f1ab>] __neigh_event_send+0xfb/0x220
[ 4974.816031]  [<ffffffff8155f40b>] neigh_resolve_output+0x13b/0x220
[ 4974.816038]  [<ffffffff8158c950>] ip_finish_output+0x1b0/0x3b0
[ 4974.816040]  [<ffffffff8158ddd8>] ip_output+0x58/0x90
[ 4974.816042]  [<ffffffff8158d5a5>] ip_local_out+0x25/0x30
[ 4974.816045]  [<ffffffff8158d8f7>] ip_queue_xmit+0x137/0x380
[ 4974.816051]  [<ffffffff815a47a5>] tcp_transmit_skb+0x445/0x8a0
[ 4974.816054]  [<ffffffff815a4d40>] tcp_write_xmit+0x140/0xb00
[ 4974.816058]  [<ffffffff815a59ae>] __tcp_push_pending_frames+0x2e/0xc0
[ 4974.816062]  [<ffffffff81597198>] tcp_sendmsg+0x118/0xd90
[ 4974.816070]  [<ffffffff81278b55>] ? debug_object_deactivate+0x115/0x170
[ 4974.816076]  [<ffffffff815bf434>] inet_sendmsg+0x64/0xb0
[ 4974.816080]  [<ffffffff8153da56>] sock_sendmsg+0x76/0x90
[ 4974.816086]  [<ffffffff81046e89>] ? local_bh_enable_ip+0x89/0xf0

Above stack trace log tells us the following call chain happens:
ip_finish_output()
  ->ip_finish_output2()
    ->dst_neigh_output()
      ->neigh_resolve_output()
        ->__neigh_event_send()
          ->neigh_add_timer()
            ->neigh_hold()

Moreover, the below debug code is added in Joern Engel written debug patch :
+static inline void neigh_hold(struct neighbour *neigh)
+{
+	WARN_ON_ONCE(atomic_inc_return(&neigh->refcnt) == 1);
+}

So, we can identify above stack trace was printed by WARN_ON_ONCE(), which also
proves that dst_neigh_output() not only tries to take a refcnt of a neigh, but
also it the refcnt of neigh entry that dst_neigh_output() will be touched was
already decremented to zero.

Please consider the below scenario which is very similar to above case met by
Joern Engel:

Time 1:
CPU 0:
neigh_release()
  ->neigh_destroy() //it's called as neigh's refcnt is 0 now
    ->kfree_rcu(neigh, rcu);//freeing neigh object is delayed after a RCU
grace period

Time 2:
CPU 1:
ip_finish_output2()
  rcu_read_lock_bh()
  dst_neigh_output()
   __neigh_event_send()
    neigh_add_timer()
      neigh_hold() //refcnt of neigh is increased from 0 to 1;
     rcu_read_unlock_bh()

Time 3:
CPU 0:
On RCU_SOFTIRQ context:
rcu_process_callbacks()
  neigh object is really freed!

Time 4:
CPU 0:
Release the neigh whose refcnt was increased from 0 to zero.
neigh_release() is called by someone. As the neigh is already freed, panic
happens!!!

This is why I initially added the condition statement in ip_finish_output2() to
check whether the refcnt of neigh returned by __ipv4_neigh_lookup_noref() is
zero or not with atomic_read(). But as you pointed out, atomic_read() was not
safe for us, but we cannot use other atomic routines with implicit memory
barriers as they all increase or decease refcnt while checking refcnt is 0,
meanwhile, we cannot use explicit memory barrier as it seems too expensive for
the hot path. So I try to add another condition to prevent neigh_add_timer()
called by __neigh_event_send() from being take a refcnt which is already zero.

> 
> By the time you check atomic_read() the result is meaningless if another
> cpu decrements the refcount to 0.
> 

If you think checking if refcnt is zero is meaningless in ip_finish_output2(),
why does __ipv4_neigh_lookup() use atomic_inc_not_zero() instead of directly
using neigh_hold() in __ipv4_neigh_lookup()?

Regards,
Ying

> So what is the point, trying to reduce a race window but not properly
> remove it ?
> 
> I repeat again : using atomic_read() in a rcu lookup is absolutely
> useless and is a clear sign you missed a fundamental issue. 
> 
> So now, we are back to the patch I initially sent, but you did not
> address Julian Anastasov feedback.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ