lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20250604010524.6091-32-sashal@kernel.org>
Date: Tue,  3 Jun 2025 21:05:23 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
	stable@...r.kernel.org
Cc: Jiayuan Chen <jiayuan.chen@...ux.dev>,
	Alexei Starovoitov <ast@...nel.org>,
	Sasha Levin <sashal@...nel.org>,
	john.fastabend@...il.com,
	jakub@...udflare.com,
	netdev@...r.kernel.org,
	bpf@...r.kernel.org
Subject: [PATCH AUTOSEL 5.15 32/33] bpf, sockmap: Fix data lost during EAGAIN retries

From: Jiayuan Chen <jiayuan.chen@...ux.dev>

[ Upstream commit 7683167196bd727ad5f3c3fc6a9ca70f54520a81 ]

We call skb_bpf_redirect_clear() to clean _sk_redir before handling skb in
backlog, but when sk_psock_handle_skb() return EAGAIN due to sk_rcvbuf
limit, the redirect info in _sk_redir is not recovered.

Fix skb redir loss during EAGAIN retries by restoring _sk_redir
information using skb_bpf_set_redir().

Before this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1343.172 MB/s, BPF Speed 1343.238 MB/s, Rcv Speed   65.271 MB/s
Send Speed 1352.022 MB/s, BPF Speed 1352.088 MB/s, Rcv Speed   0 MB/s
Send Speed 1354.105 MB/s, BPF Speed 1354.105 MB/s, Rcv Speed   0 MB/s
Send Speed 1355.018 MB/s, BPF Speed 1354.887 MB/s, Rcv Speed   0 MB/s
'''
Due to the high send rate, the RX processing path may frequently hit the
sk_rcvbuf limit. Once triggered, incorrect _sk_redir will cause the flow
to mistakenly enter the "!ingress" path, leading to send failures.
(The Rcv speed depends on tcp_rmem).

After this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1347.236 MB/s, BPF Speed 1347.367 MB/s, Rcv Speed   65.402 MB/s
Send Speed 1353.320 MB/s, BPF Speed 1353.320 MB/s, Rcv Speed   65.536 MB/s
Send Speed 1353.186 MB/s, BPF Speed 1353.121 MB/s, Rcv Speed   65.536 MB/s
'''

Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@...nel.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---

**YES** This commit should be backported to stable kernel trees. ##
Extensive Analysis ### **Critical Bug Fix Nature** This is a **data loss
bug** in the BPF sockmap subsystem that causes silent packet drops under
load conditions. The commit adds a single line that restores essential
redirect information during EAGAIN retry scenarios. ### **Code Change
Analysis** The fix adds exactly one line at `net/core/skmsg.c:694`: ```c
skb_bpf_set_redir(skb, psock->sk, ingress); ``` This line restores
redirect information that was previously cleared by
`skb_bpf_redirect_clear(skb)` at line 684. When `sk_psock_handle_skb()`
returns `-EAGAIN` due to memory pressure or socket buffer limits, the
work is rescheduled, but without this fix, the redirect information
(`skb->_sk_redir`) would be lost. ### **Impact Without the Fix** 1.
**Silent Data Loss**: When EAGAIN occurs, subsequent retry attempts find
no redirect target (`skb_bpf_redirect_fetch()` returns NULL), causing
packets to be dropped in `sk_psock_skb_redirect()` at lines 935-938. 2.
**BPF Policy Bypass**: Packets that should be redirected according to
BPF program logic get dropped instead, effectively bypassing network
policies. 3. **Production Impact**: The commit message shows concrete
performance degradation - receive speeds dropping from 65+ MB/s to 0
MB/s under high load conditions. ### **Backport Suitability Factors**
**✅ Minimal and Surgical:** - Single line addition - No API changes or
structural modifications - Uses existing stable function
`skb_bpf_set_redir()` **✅ Self-Contained:** - Fix only affects the
specific error path (`ret == -EAGAIN`) - No dependencies on other
concurrent changes - Uses well-established APIs present across kernel
versions **✅ Clear Bug Fix Semantics:** - Restores state that was
previously cleared - Follows the established pattern: clear → try →
restore on failure - The comment explicitly states "Restore redir info
we cleared before" **✅ Critical Subsystem:** - Affects BPF sockmap, a
core networking infrastructure component - Used by service meshes,
container networking, and load balancers - Failure causes silent data
loss that's difficult to debug ### **Comparison with Similar Commits**
Looking at the provided historical examples: - Similar to commit #2 and
#4 (both marked YES) which also fix sockmap data handling issues -
Unlike commit #1, #3, and #5 (marked NO) which involved more complex
architectural changes - This fix addresses a fundamental correctness
issue rather than optimizations ### **Risk Assessment** **Low Risk:** -
The fix is in an error recovery path, so it only executes when problems
already exist - Restoring redirect information cannot make the situation
worse - The function `skb_bpf_set_redir()` is a simple state restoration
operation ### **Stable Tree Criteria Compliance** 1. **Important
bugfix**: ✅ Fixes silent data loss 2. **Minimal risk**: ✅ Single line,
error path only 3. **No new features**: ✅ Pure bug fix 4. **Confined
scope**: ✅ Limited to sockmap redirect handling 5. **User-visible
impact**: ✅ Prevents packet loss under load This commit perfectly fits
the stable tree criteria for important, low-risk bug fixes that address
user-visible problems in critical subsystems.

 net/core/skmsg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index a5947aa559837..3ae1704a8a7c1 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -663,7 +663,8 @@ static void sk_psock_backlog(struct work_struct *work)
 			if (ret <= 0) {
 				if (ret == -EAGAIN) {
 					sk_psock_skb_state(psock, state, len, off);
-
+					/* Restore redir info we cleared before */
+					skb_bpf_set_redir(skb, psock->sk, ingress);
 					/* Delay slightly to prioritize any
 					 * other work that might be here.
 					 */
-- 
2.39.5


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ