netdev - [bpf PATCH 4/5] bpf, sockmap: Handle memory acct if skb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <160477793403.608263.17626285322866020367.stgit@john-XPS-13-9370>
Date:   Sat, 07 Nov 2020 11:38:54 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     ast@...nel.org, daniel@...earbox.net, jakub@...udflare.com
Cc:     john.fastabend@...il.com, bpf@...r.kernel.org,
        netdev@...r.kernel.org
Subject: [bpf PATCH 4/5] bpf,
 sockmap: Handle memory acct if skb_verdict prog redirects to self

If the skb_verdict_prog redirects an skb knowingly to itself, fix your
BPF program this is not optimal and an abuse of the API please use
SK_PASS. That said there may be cases, such as socket load balancing,
where picking the socket is hashed based or otherwise picks the same
socket it was received on in some rare cases. If this happens we don't
want to confuse userspace giving them an EAGAIN error if we can avoid
it.

To avoid double accounting in these cases. At the moment even if the
skb has already been charged against the sockets rcvbuf and forward
alloc we check it again and do set_owner_r() causing it to be orphaned
and recharged. For one this is useless work, but more importantly we
can have a case where the skb could be put on the ingress queue, but
because we are under memory pressure we return EAGAIN. The trouble
here is the skb has already been accounted for so any rcvbuf checks
include the memory associated with the packet already. This rolls
up and can result in unecessary EAGAIN errors in userspace read()
calls.

Fix by doing an unlikely check and skipping checks if skb->sk == sk.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@...il.com>
---
 net/core/skmsg.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 580252e532da..59c36a672256 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -404,11 +404,13 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
 {
 	struct sk_msg *msg;

-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
-		return NULL;
+	if (likely(skb->sk != sk)) {
+		if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+			return NULL;

-	if (!sk_rmem_schedule(sk, skb, skb->len))
-		return NULL;
+		if (!sk_rmem_schedule(sk, skb, skb->len))
+			return NULL;
+	}

 	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
 	if (unlikely(!msg))
@@ -455,9 +457,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 	 * the BPF program was run initiating the redirect to the socket
 	 * we will eventually receive this data on. The data will be released
 	 * from skb_consume found in __tcp_bpf_recvmsg() after its been copied
-	 * into user buffers.
+	 * into user buffers. If we are receiving on the same sock skb->sk is
+	 * already assigned, skip memory accounting and owner transition seeing
+	 * it already set correctly.
 	 */
-	skb_set_owner_r(skb, sk);
+	if (likely(skb->sk != sk))
+		skb_set_owner_r(skb, sk);
 	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
 }