[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1448481002.24696.30.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Wed, 25 Nov 2015 11:50:02 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Rainer Weikusat <rweikusat@...ileactivedefense.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Benjamin LaHaise <bcrl@...ck.org>,
"David S. Miller" <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Al Viro <viro@...iv.linux.org.uk>,
David Howells <dhowells@...hat.com>,
Ying Xue <ying.xue@...driver.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller <syzkaller@...glegroups.com>,
Kostya Serebryany <kcc@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Sasha Levin <sasha.levin@...cle.com>
Subject: Re: use-after-free in sock_wake_async
On Wed, 2015-11-25 at 19:38 +0000, Rainer Weikusat wrote:
> Eric Dumazet <eric.dumazet@...il.com> writes:
> > On Wed, 2015-11-25 at 18:24 +0000, Rainer Weikusat wrote:
> >> Eric Dumazet <eric.dumazet@...il.com> writes:
> >> > On Wed, 2015-11-25 at 17:30 +0000, Rainer Weikusat wrote:
> >> >
> >> >> In case this is wrong, it obviously implies that sk_sleep(sk) must not
> >> >> be used anywhere as it accesses the same struck sock, hence, when that
> >> >> can "suddenly" disappear despite locks are used in the way indicated
> >> >> above, there is now safe way to invoke that, either, as it just does a
> >> >> rcu_dereference_raw based on the assumption that the caller knows that
> >> >> the i-node (and the corresponding wait queue) still exist.
> >> >>
> >> >
> >> > Oh well.
> >> >
> >> > sk_sleep() is not used if the return is NULL
>
> [...]
>
> >> finish_wait(sk_sleep(sk), &wait);
> >> unix_state_unlock(sk);
> >> return timeo;
> >> }
> >>
> >> Neither prepare_to_wait nor finish_wait check if the pointer is
> >> null.
>
> [...]
>
> > You are looking at the wrong side.
> >
> > Of course, the thread 'owning' a socket has a reference on it, so it
> > knows sk->sk_socket and sk->sk_ww is not NULL.
>
> I could continue argueing about this but IMHO, it just leads away from
> the actual issue. Taking a couple of steps back, therefore,
>
> ------------
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 4e95bdf..5c87ea6 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1754,8 +1754,8 @@ restart_locked:
> skb_queue_tail(&other->sk_receive_queue, skb);
> if (max_level > unix_sk(other)->recursion_level)
> unix_sk(other)->recursion_level = max_level;
> - unix_state_unlock(other);
> other->sk_data_ready(other);
> + unix_state_unlock(other);
> sock_put(other);
> scm_destroy(&scm);
> return len;
> @@ -1860,8 +1860,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
> skb_queue_tail(&other->sk_receive_queue, skb);
> if (max_level > unix_sk(other)->recursion_level)
> unix_sk(other)->recursion_level = max_level;
> - unix_state_unlock(other);
> other->sk_data_ready(other);
> + unix_state_unlock(other);
> sent += size;
> }
>
> -------------
>
> I'm convinced this will work for the given problem (I don't claim that
> it's technically superior to the larger change in any aspect except that
> it's simpler and localized) because
>
> 1) The use-after-free occurred while accessing the struck sock allocated
> together with the socket inode.
>
> 2) This happened because a close was racing with a write.
>
> 3) The socket inode, struct sock and struct socket_wq are freed by
> sock_destroy_inode.
>
> 4) sock_destroy_inode can only be called as consequence of the iput in
> sock_release.
>
> 5) sock_release invokes the per-protocol/ family release function before
> doing the iput.
>
> 6) unix_sock_release has to acquire the unix_state_lock on the socket
> referred to as other in the code above before it can do anything, in
> particular, before it calls sock_orphan which resets the struct sock and wq
> pointers and also sets the SOCK_DEAD flag.
>
> 7) the unix_stream_sendmsg code acquires the unix_state_lock on other
> and then checks the SOCK_DEAD flag. The code above only runs if it
> was not set, hence, the iput in sock_release can't have happened yet
> because a concurrent unix_sock_release must still be blocked on the
> unix_state_lock of other.
>
> If there's an error in this reasoning, I'd very much like to know where
> and what it is, not the least because the unix_dgram_peer_wake_relay
> function I wrote also relies on it being correct (wrt to using the
> result of sk_sleep outside of rcu_read_lock/ rcu_read_unlock).
Yeah, we can either continue to patch particular buggy code, with hard
to review and debug stuff (this particular issue is not new)
OR, we add core infrastructure and we have less headaches, because it
works for all sockets, including af_unix.
My choice is pretty clear, having to maintain this code at Google, I can
tell you that solid core networking stack is much much better by all
means.
This af_unix code needs serious rewrite, we have absolute nightmares
with it (like the bugs added in recent sendpage() support).
I believe anything we can do to avoid having to make sure the points you
listed are going to be kept at next patch or code refactoring is a big
win.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists