Date:   Mon, 28 Jun 2021 23:20:29 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     netdev@...r.kernel.org
Cc:     bpf@...r.kernel.org, Cong Wang <cong.wang@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Lorenz Bauer <lmb@...udflare.com>,
        Jakub Sitnicki <jakub@...udflare.com>
Subject: [Patch bpf] skmsg: check sk_rcvbuf limit before queuing to ingress_skb

From: Cong Wang <cong.wang@...edance.com>

Jiang observed frequent OOMs when testing our AF_UNIX/UDP
proxy. This is because we do not actually limit the socket
memory before queueing an skb to ingress_skb. We charge the
skb memory later, when handling the psock backlog, but that
is not limited either.
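
For reference, receive memory is normally charged to a socket
via skb_set_owner_r(), roughly as in the simplified sketch below
(not the exact upstream code; the helper name is made up for
illustration). The ingress_skb path performs no comparable
accounting before queueing:

/* Simplified sketch of skb receive-memory charging
 * (cf. skb_set_owner_r()): the skb's truesize is accounted
 * against sk_rmem_alloc and uncharged by the destructor when
 * the skb is freed.
 */
static void sketch_charge_rmem(struct sock *sk, struct sk_buff *skb)
{
	skb->sk = sk;
	skb->destructor = sock_rfree;	/* uncharges on kfree_skb() */
	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
}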

This patch adds a check against sk->sk_rcvbuf right before
queuing to ingress_skb and drops packets if this limit is
exceeded. This is very similar to the UDP receive path.
Ideally we should set the skb owner before this check too,
but it is hard to make TCP happy about sk_forward_alloc.
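
For comparison, a simplified sketch of the kind of check the UDP
receive path does before enqueueing (cf.
__udp_enqueue_schedule_skb(); the helper name here is made up for
illustration):

/* Drop early when the receive buffer budget is already exhausted,
 * the same way the UDP receive path rejects skbs before enqueueing.
 */
static int sketch_enqueue_within_rcvbuf(struct sock *sk, struct sk_buff *skb)
{
	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
		return -ENOMEM;	/* caller drops the skb */
	skb_queue_tail(&sk->sk_receive_queue, skb);
	return 0;
}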

Reported-by: Jiang Wang <jiang.wang@...edance.com>
Cc: Daniel Borkmann <daniel@...earbox.net>
Cc: John Fastabend <john.fastabend@...il.com>
Cc: Lorenz Bauer <lmb@...udflare.com>
Cc: Jakub Sitnicki <jakub@...udflare.com>
Signed-off-by: Cong Wang <cong.wang@...edance.com>
---
 net/core/skmsg.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 9b6160a191f8..83b581d8023d 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 		return -EIO;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
-	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
+	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
+	    atomic_read(&sk_other->sk_rmem_alloc) > sk_other->sk_rcvbuf) {
 		spin_unlock_bh(&psock_other->ingress_lock);
 		skb_bpf_redirect_clear(skb);
 		sock_drop(from->sk, skb);
@@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 		}
 		if (err < 0) {
 			spin_lock_bh(&psock->ingress_lock);
-			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
+			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
+			    atomic_read(&sk_other->sk_rmem_alloc) <= sk_other->sk_rcvbuf) {
 				skb_queue_tail(&psock->ingress_skb, skb);
 				schedule_work(&psock->work);
 				err = 0;
-- 
2.27.0
