Date:	Wed, 20 May 2015 22:19:49 +0300 (EEST)
From:	Julian Anastasov <ja@....bg>
To:	Ying Xue <ying.xue@...driver.com>
cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
	davem@...emloft.net, alexei@...estorage.com, joern@...estorage.com
Subject: Re: [PATCH v2] net: fix a double free issue for neighbour entry


	Hello,

On Wed, 20 May 2015, Ying Xue wrote:

> Yes, this way is absolutely safe for us! But, for example, if we check refcnt==0
> under write_lock protection, the check comes a bit late. Instead, if the check
> is moved earlier, to ip_finish_output2(), we can find out _early_ that we cannot
> use the neigh entry looked up by __ipv4_neigh_lookup_noref(). Of course, doing
> the check with atomic_read() is _really_ unsafe, but once the value returned by
> atomic_read() tells us the neigh refcnt is zero, the result is absolutely true. This is

	The problem is that this atomic_read is just extra
code that helps only in rare cases. Modern CPUs have
128-byte cache lines, and 'dead', 'refcnt' and 'next'
can sit in the same cache line. If you take into account
the write-back policy used by CPU caches and the cache
locking effects, then in most cases when such a reader
finds the entry in the list, refcnt will be > 0 and dead
will be 0. The reason: the write-back policy makes the
whole cache line visible to other CPUs at once. Without
explicit barriers, other CPUs see the data only after a
delay, and the order of stores within the same cache line
is not preserved for them. As a result, a pure reader can
not see 0 in refcnt, even if more atomic_read calls are
added at this place; atomic_read gives only hints to the
compiler.
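
	To make this concrete, here is a minimal user-space sketch
(purely hypothetical names, with C11 atomics standing in for the
kernel's atomic_t) of why such a read is only a hint, not a
reliable liveness test:

/* Hypothetical user-space sketch; C11 atomics stand in for the kernel's
 * atomic_t and all names are illustrative, not the kernel's. */
#include <stdatomic.h>
#include <stdbool.h>

struct entry {
	atomic_int refcnt;
	int dead;
	struct entry *next;
};

/* Lockless reader peeking at refcnt.  The load itself is atomic, but it
 * is only a snapshot: nothing orders it against the remover's stores to
 * 'dead' and 'next', which live in the same cache line and tend to become
 * visible together.  So refcnt > 0 here does not prove the entry is
 * alive, and refcnt == 0 will rarely, if ever, be observed this way. */
static bool entry_looks_alive(struct entry *e)
{
	return atomic_load_explicit(&e->refcnt, memory_order_relaxed) > 0;
}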

	So we must ensure that once a writer decides to
remove an entry, other writers can no longer change it.
More observations about the writers:

- neigh_lookup is initially a reader, but it can increase
refcnt at any time, even while another CPU is under
write_lock. This makes all refcnt checks in
neigh_forced_gc, neigh_flush_dev and neigh_periodic_work
racy, because neigh_cleanup_and_release is called after
write_unlock. If a writer decides to remove the entry,
other write operations may still try to modify it,
because they rely on their own reference. What can happen
is that refcnt is 1 at the moment the removal is decided,
but refcnt can then be increased by concurrent writers
such as neigh_lookup callers, as sketched below.
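
	Roughly, the interleaving looks like this; the sketch is again
a self-contained user-space stand-in (lookup_and_hold, gc_try_remove
and table_lock are illustrative names, not the kernel functions):

/* Hypothetical sketch of the race between the RCU-style lookup path and
 * the GC path; C11 atomics and a pthread mutex stand in for atomic_t and
 * the table's rwlock, and all names are illustrative. */
#include <stdatomic.h>
#include <pthread.h>

struct entry {
	atomic_int refcnt;
	int dead;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mimics neigh_lookup: takes a reference without table_lock, so it can
 * run at any point, including between the GC's refcnt test and the
 * moment the GC drops the lock. */
static void lookup_and_hold(struct entry *e)
{
	atomic_fetch_add(&e->refcnt, 1);
}

/* Mimics neigh_forced_gc/neigh_flush_dev: the refcnt == 1 test is done
 * under the lock, but the lock does not serialize lookup_and_hold, so by
 * the time 'dead' is set a lookup caller may already hold its own
 * reference to the now-dead entry and go on to modify it. */
static int gc_try_remove(struct entry *e)
{
	int removed = 0;

	pthread_mutex_lock(&table_lock);
	if (atomic_load(&e->refcnt) == 1) {
		e->dead = 1;
		removed = 1;
	}
	pthread_mutex_unlock(&table_lock);
	return removed;
}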

	As a result, neigh_lookup followed by neigh_update
may end up working with a dead=1 entry. Maybe the fix is
to add a check for the dead flag in neigh_update, after
both write_lock_bh calls, to avoid modifications to a
removed entry. Again, this check happens late:
__neigh_lookup calls neigh_lookup (which does not notice
the dead flag early), so we skip the neigh_create call,
and then neigh_update will fail on dead==1.
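
	A rough sketch of that idea, once more with hypothetical
user-space stand-ins (entry_update is an illustrative name, and the
per-entry mutex plays the role of neigh->lock taken by write_lock_bh);
it only illustrates the proposed check, it is not an actual patch:

/* Hypothetical sketch of a dead-flag check in the update path: once the
 * per-entry lock is held, a dead entry is refused instead of modified. */
#include <pthread.h>
#include <errno.h>

struct entry {
	pthread_mutex_t lock;	/* stands in for neigh->lock */
	int dead;		/* set by the remover under the lock */
	int state;
};

static int entry_update(struct entry *e, int new_state)
{
	int err = 0;

	pthread_mutex_lock(&e->lock);
	if (e->dead) {
		/* Unlinked while the caller held only its own reference:
		 * do not resurrect it, let the caller create a new entry. */
		err = -EINVAL;
		goto out;
	}
	e->state = new_state;
out:
	pthread_mutex_unlock(&e->lock);
	return err;
}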

- neigh_timer_handler looks safe; maybe it does not need
a check for the dead flag. If neigh_flush_dev calls
neigh_del_timer and the del_timer inside neigh_del_timer
fails with 0 (because another CPU is stuck at the
write_lock in neigh_timer_handler), the
'if (!(state & NUD_IN_TIMER))' check in
neigh_timer_handler should still catch it, because
neigh_flush_dev changes nud_state when its
'if (atomic_read(&n->refcnt) != 1)' check succeeds.
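
	The pattern being relied on, sketched with the same kind of
hypothetical stand-ins (STATE_IN_TIMER plays the role of the
NUD_IN_TIMER mask, timer_cb the role of the timer handler):

/* Hypothetical sketch: even when del_timer() loses the race and the
 * callback still runs, the callback re-reads the state under the
 * per-entry lock and does nothing, because the remover cleared the
 * "in timer" state before the callback could take the lock. */
#include <pthread.h>

#define STATE_IN_TIMER	0x01	/* stand-in for the NUD_IN_TIMER mask */

struct entry {
	pthread_mutex_t lock;	/* stands in for neigh->lock */
	unsigned int state;
};

static void timer_cb(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	if (e->state & STATE_IN_TIMER) {
		/* normal timer work on a live entry would go here */
	}
	/* otherwise the entry is being removed: bail out quietly */
	pthread_mutex_unlock(&e->lock);
}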

- __neigh_event_send: needs a check for the dead flag, as
already discussed.

Regards

--
Julian Anastasov <ja@....bg>
