netdev - Re: use-after-free in sock_wake

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87y4dl3m5c.fsf@doppelsaurus.mobileactivedefense.com>
Date:	Wed, 25 Nov 2015 19:38:07 +0000
From:	Rainer Weikusat <rweikusat@...ileactivedefense.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Rainer Weikusat <rweikusat@...ileactivedefense.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Benjamin LaHaise <bcrl@...ck.org>,
	"David S. Miller" <davem@...emloft.net>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	David Howells <dhowells@...hat.com>,
	Ying Xue <ying.xue@...driver.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	syzkaller <syzkaller@...glegroups.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: use-after-free in sock_wake_async

Eric Dumazet <eric.dumazet@...il.com> writes:
> On Wed, 2015-11-25 at 18:24 +0000, Rainer Weikusat wrote:
>> Eric Dumazet <eric.dumazet@...il.com> writes:
>> > On Wed, 2015-11-25 at 17:30 +0000, Rainer Weikusat wrote:
>> >
>> >> In case this is wrong, it obviously implies that sk_sleep(sk) must not
>> >> be used anywhere as it accesses the same struck sock, hence, when that
>> >> can "suddenly" disappear despite locks are used in the way indicated
>> >> above, there is now safe way to invoke that, either, as it just does a
>> >> rcu_dereference_raw based on the assumption that the caller knows that
>> >> the i-node (and the corresponding wait queue) still exist.
>> >> 
>> >
>> > Oh well.
>> >
>> > sk_sleep() is not used if the return is NULL

[...]

>> 	finish_wait(sk_sleep(sk), &wait);
>> 	unix_state_unlock(sk);
>> 	return timeo;
>> }
>> 
>> Neither prepare_to_wait nor finish_wait check if the pointer is
>> null.

[...]

> You are looking at the wrong side.
>
> Of course, the thread 'owning' a socket has a reference on it, so it
> knows sk->sk_socket and sk->sk_ww is not NULL.

I could continue argueing about this but IMHO, it just leads away from
the actual issue. Taking a couple of steps back, therefore,

------------
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4e95bdf..5c87ea6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1754,8 +1754,8 @@ restart_locked:
        skb_queue_tail(&other->sk_receive_queue, skb);
        if (max_level > unix_sk(other)->recursion_level)
                unix_sk(other)->recursion_level = max_level;
-       unix_state_unlock(other);
        other->sk_data_ready(other);
+       unix_state_unlock(other);
        sock_put(other);
        scm_destroy(&scm);
        return len;
@@ -1860,8 +1860,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
                skb_queue_tail(&other->sk_receive_queue, skb);
                if (max_level > unix_sk(other)->recursion_level)
                        unix_sk(other)->recursion_level = max_level;
-               unix_state_unlock(other);
                other->sk_data_ready(other);
+               unix_state_unlock(other);
                sent += size;
        }
 
-------------

I'm convinced this will work for the given problem (I don't claim that
it's technically superior to the larger change in any aspect except that
it's simpler and localized) because

1) The use-after-free occurred while accessing the struck sock allocated
   together with the socket inode.

2) This happened because a close was racing with a write.   

3) The socket inode, struct sock and struct socket_wq are freed by
   sock_destroy_inode.

4) sock_destroy_inode can only be called as consequence of the iput in
   sock_release.

5) sock_release invokes the per-protocol/ family release function before
   doing the iput.

6) unix_sock_release has to acquire the unix_state_lock on the socket
   referred to as other in the code above before it can do anything, in
   particular, before it calls sock_orphan which resets the struct sock and wq
   pointers and also sets the SOCK_DEAD flag.

7) the unix_stream_sendmsg code acquires the unix_state_lock on other
   and then checks the SOCK_DEAD flag. The code above only runs if it
   was not set, hence, the iput in sock_release can't have happened yet
   because a concurrent unix_sock_release must still be blocked on the
   unix_state_lock of other.

If there's an error in this reasoning, I'd very much like to know where
and what it is, not the least because the unix_dgram_peer_wake_relay
function I wrote also relies on it being correct (wrt to using the
result of sk_sleep outside of rcu_read_lock/ rcu_read_unlock).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html