[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211208051633.49122-1-kuniyu@amazon.co.jp>
Date:   Wed, 8 Dec 2021 14:16:33 +0900
From:   Kuniyuki Iwashima <kuniyu@...zon.co.jp>
To:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>
CC:     Benjamin Herrenschmidt <benh@...zon.com>,
        Kuniyuki Iwashima <kuniyu@...zon.co.jp>,
        Kuniyuki Iwashima <kuni1840@...il.com>,
        <netdev@...r.kernel.org>
Subject: [PATCH net] tcp: Remove sock_owned_by_user() test in tcp_child_process().
While creating a child socket, before v2.3.41, we used to call
bh_lock_sock() later than now; it was called just before
tcp_rcv_state_process().  The full socket was put into an accept queue
and exposed to other CPUs before bh_lock_sock() so that process context
might have acquired the lock by then.  Thus, we had to check if any
process context was accessing the socket before tcp_rcv_state_process().
We can see this code in tcp_v4_do_rcv() of v2.3.14. [0]
	if (sk->state == TCP_LISTEN) {
		struct sock *nsk;
		nsk = tcp_v4_hnd_req(sk, skb);
		...
		if (nsk != sk) {
			bh_lock_sock(nsk);
			if (nsk->lock.users != 0) {
				...
				sk_add_backlog(nsk, skb);
				bh_unlock_sock(nsk);
				return 0;
			}
			...
		}
	}
	if (tcp_rcv_state_process(sk, skb, skb->h.th, skb->len))
		goto reset;
However, in 2.3.15, this lock.users test was replaced with BUG_TRAP() by
mistake. [1]
		if (nsk != sk) {
			...
			BUG_TRAP(nsk->lock.users == 0);
			...
			ret = tcp_rcv_state_process(nsk, skb, skb->h.th, skb->len);
			...
			bh_unlock_sock(nsk);
			...
			return 0;
		}
Fortunately, the test was back in 2.3.41. [2]  Then, related code was
packed into tcp_child_process() with comments, which remains until now.
What is interesting in v2.3.41 is that the bh_lock_sock() was moved to
tcp_create_openreq_child() and placed just after sock_lock_init().
Thus, the lock is never acquired until tcp_rcv_state_process() by process
contexts.  The bh_lock_sock() is now in sk_clone_lock() and the rule does
not change.
As of now, alas, it is not possible to reach the commented path by the
change.  Let's remove the remnant of the old days.
[0]: https://cdn.kernel.org/pub/linux/kernel/v2.3/linux-2.3.14.tar.gz
[1]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.15.gz
[2]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.41.gz
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.co.jp>
---
 net/ipv4/tcp_minisocks.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 7c2d3ac2363a..b4a1f8728093 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -833,18 +833,12 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 	sk_mark_napi_id_set(child, skb);
 
 	tcp_segs_in(tcp_sk(child), skb);
-	if (!sock_owned_by_user(child)) {
-		ret = tcp_rcv_state_process(child, skb);
-		/* Wakeup parent, send SIGIO */
-		if (state == TCP_SYN_RECV && child->sk_state != state)
-			parent->sk_data_ready(parent);
-	} else {
-		/* Alas, it is possible again, because we do lookup
-		 * in main socket hash table and lock on listening
-		 * socket does not protect us more.
-		 */
-		__sk_add_backlog(child, skb);
-	}
+
+	ret = tcp_rcv_state_process(child, skb);
+
+	/* Wakeup parent, send SIGIO */
+	if (state == TCP_SYN_RECV && child->sk_state != state)
+		parent->sk_data_ready(parent);
 
 	bh_unlock_sock(child);
 	sock_put(child);
-- 
2.30.2
Powered by blists - more mailing lists
 
