Message-ID: <7d12a5fb-f923-4176-901a-8dc967eda52e@kernel.org>
Date: Wed, 5 Nov 2025 15:40:03 +0100
From: Matthieu Baerts <matttbe@...nel.org>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>, mptcp@...ts.linux.dev
Cc: stable@...r.kernel.org, Jakub Sitnicki <jakub@...udflare.com>,
 Mat Martineau <martineau@...nel.org>, Geliang Tang <geliang@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
 <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, Shuah Khan <shuah@...nel.org>,
 Florian Westphal <fw@...len.de>, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, bpf@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with
 BPF

Hi Jiayuan,

On 05/11/2025 12:36, Jiayuan Chen wrote:

If you need to send a v5, please remove the 'net,' prefix from the
title. It might also be good to mention 'sockmap', e.g.

  mptcp: fix proto fallback detection with sockmap

> The sockmap feature allows userspace, through the bpf syscall or
> through a bpf sockops program, to replace the sk_prot of sockets during
> protocol stack processing with sockmap's custom read/write interfaces.
> '''
> tcp_rcv_state_process()
>   syn_recv_sock()/subflow_syn_recv_sock()
>     tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
>       bpf_skops_established       <== sockops
>         bpf_sock_map_update(sk)   <== call bpf helper
>           tcp_bpf_update_proto()  <== update sk_prot
> '''
> 
> When the server has MPTCP enabled but the client sends a TCP SYN
> without MPTCP, subflow_syn_recv_sock() performs a fallback on the
> subflow, replacing the subflow sk's sk_prot with the native sk_prot.
> '''
> subflow_syn_recv_sock()
>   subflow_ulp_fallback()
>     subflow_drop_ctx()
>       mptcp_subflow_ops_undo_override()
> '''
> 
> Then, this subflow can be normally used by sockmap, which replaces the
> native sk_prot with sockmap's custom sk_prot. The issue occurs when the
> user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
> Here, it uses sk->sk_prot to compare with the native sk_prot, but this
> is incorrect when sockmap is used, as we may incorrectly set
> sk->sk_socket->ops.
> 
> This fix uses the more generic sk_family for the comparison instead.
> 
> Additionally, this also prevents a WARNING from occurring:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
> mptcp_stream_accept+0x34c/0x380
> Modules linked in:
> RIP: 0010:mptcp_stream_accept+0x34c/0x380
> RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  do_accept+0xeb/0x190
>  ? __x64_sys_pselect6+0x61/0x80
>  ? _raw_spin_unlock+0x12/0x30
>  ? alloc_fd+0x11e/0x190
>  __sys_accept4+0x8c/0x100
>  __x64_sys_accept+0x1f/0x30
>  x64_sys_call+0x202f/0x20f0
>  do_syscall_64+0x72/0x9a0
>  ? switch_fpu_return+0x60/0xf0
>  ? irqentry_exit_to_user_mode+0xdb/0x1e0
>  ? irqentry_exit+0x3f/0x50
>  ? clear_bhb_loop+0x50/0xa0
>  ? clear_bhb_loop+0x50/0xa0
>  ? clear_bhb_loop+0x50/0xa0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  </TASK>
> ---[ end trace 0000000000000000 ]---
> 
> result from ./scripts/decode_stacktrace.sh:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
> (net-next/net/mptcp/protocol.c:4005)
> Modules linked in:
> ...
> 
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept (net-next/net/socket.c:1989)
> __sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
> __x64_sys_accept (net-next/net/socket.c:2067)
> x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
> do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
> net-next/arch/x86/entry/syscall_64.c:94)
> entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f87ac92b83d
> 
> ---[ end trace 0000000000000000 ]---

If you need to send a v5, please remove the non-decoded stack trace:
only the decoded one is interesting. You can also remove the 'net-next/'
prefix from the paths, keeping only 'net/mptcp/protocol.c:4005', for
example.

> 
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Cc: <stable@...r.kernel.org>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
> Reviewed-by: Jakub Sitnicki <jakub@...udflare.com>
> ---
>  net/mptcp/protocol.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4cd5df01446e..b5e5e130b158 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>  
>  static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>  {
> +	unsigned short family = READ_ONCE(sk->sk_family);
> +
>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> -	if (sk->sk_prot == &tcpv6_prot)
> +	if (family == AF_INET6)
>  		return &inet6_stream_ops;
>  #endif
> -	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> +	WARN_ON_ONCE(family != AF_INET);

I wonder if it would be interesting to return NULL if the family is not
AF_INET{,6}. But I guess this would cause a crash soon after, no?

If so, it is probably better to keep returning &inet_stream_ops here.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@...nel.org>

>  	return &inet_stream_ops;
>  }
>  

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

