Message-ID: <3eb50302-d90c-4477-b296-f5f29a7d1eca@linux.dev>
Date: Thu, 22 May 2025 12:25:10 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org, Michal Luczaj <mhal@...x.co>,
John Fastabend <john.fastabend@...il.com>,
Jakub Sitnicki <jakub@...udflare.com>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
Thadeu Lima de Souza Cascardo <cascardo@...lia.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v6] bpf, sockmap: avoid using sk_socket after
free when sending
On 5/16/25 7:17 AM, Jiayuan Chen wrote:
> sk->sk_socket is neither locked nor referenced in the backlog thread, so
> the call to skb_send_sock() races with the release of sk_socket. All
> socket types (tcp/udp/unix/vsock) are affected.
>
> Race conditions:
> '''
> CPU0                               CPU1
>
> backlog::skb_send_sock
>   sendmsg_unlocked
>     sock_sendmsg
>       sock_sendmsg_nosec
>                                    close(fd):
>                                      ...
>                                      ops->release() -> sock_map_close()
>                                      sk_socket->ops = NULL
>                                      free(socket)
>       sock->ops->sendmsg
>             ^
>             panic here
> '''
>
> The ref of psock becomes 0 after sock_map_close() is executed.
> '''
> void sock_map_close()
> {
>     ...
>     if (likely(psock)) {
>         ...
>         // !! here we remove psock and the ref of psock becomes 0
>         sock_map_remove_links(sk, psock)
>         psock = sk_psock_get(sk);
>         if (unlikely(!psock))
>             goto no_psock; <=== Control jumps here via goto
>         ...
>         cancel_delayed_work_sync(&psock->work); <=== not executed
>         sk_psock_put(sk, psock);
>         ...
> }
> '''
>
> Based on the fact that we already wait for the workqueue to finish in
> sock_map_close() if psock is held, we simply increase the psock
> reference count to avoid race conditions.
>
> With this patch, if the backlog thread is running, sock_map_close() will
> wait for the backlog thread to complete and cancel all pending work.
>
> If no backlog thread is running, any pending work that has not started
> yet will bail out when sk_psock_get() fails, because the psock reference
> count has been zeroed, and sk_psock_drop() will cancel all jobs via
> cancel_delayed_work_sync().
>
> In summary, we require synchronization to coordinate the backlog thread
> and close() thread.
>
> The panic I caught:
> '''
> Workqueue: events sk_psock_backlog
> RIP: 0010:sock_sendmsg+0x21d/0x440
> RAX: 0000000000000000 RBX: ffffc9000521fad8 RCX: 0000000000000001
> ...
> Call Trace:
> <TASK>
> ? die_addr+0x40/0xa0
> ? exc_general_protection+0x14c/0x230
> ? asm_exc_general_protection+0x26/0x30
> ? sock_sendmsg+0x21d/0x440
> ? sock_sendmsg+0x3e0/0x440
> ? __pfx_sock_sendmsg+0x10/0x10
> __skb_send_sock+0x543/0xb70
> sk_psock_backlog+0x247/0xb80
> ...
> '''
>
> Reported-by: Michal Luczaj <mhal@...x.co>
> Fixes: 4b4647add7d3 ("sock_map: avoid race between sock_map_close and sk_psock_put")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
>
> ---
> V5 -> V6: Use correct "Fixes" tag.
> V4 -> V5:
> This patch is extracted from my previous v4 patchset that contained
> multiple fixes, and it remains unchanged. Since this fix is relatively
> simple and easy to review, we want to separate it from other fixes to
> avoid any potential interference.
> ---
> net/core/skmsg.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 276934673066..34c51eb1a14f 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -656,6 +656,13 @@ static void sk_psock_backlog(struct work_struct *work)
>  	bool ingress;
>  	int ret;
>
> +	/* Increment the psock refcnt to synchronize with close(fd) path in
> +	 * sock_map_close(), ensuring we wait for backlog thread completion
> +	 * before sk_socket freed. If refcnt increment fails, it indicates
> +	 * sock_map_close() completed with sk_socket potentially already freed.
> +	 */
> +	if (!sk_psock_get(psock->sk))

This seems to be the first caller that passes "psock->sk" to "sk_psock_get()".
I may be missing some sock_map details here. Since this races with
sock_map_close(), which should also do a sock_put(sk) [?], could you explain
what makes it safe to access psock->sk here?

> +		return;
>  	mutex_lock(&psock->work_mutex);
>  	while ((skb = skb_peek(&psock->ingress_skb))) {
>  		len = skb->len;
> @@ -708,6 +715,7 @@ static void sk_psock_backlog(struct work_struct *work)
>  	}
>  end:
>  	mutex_unlock(&psock->work_mutex);
> +	sk_psock_put(psock->sk, psock);
>  }
>
>  struct sk_psock *sk_psock_init(struct sock *sk, int node)
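
As a rough userspace illustration of the "take a reference unless the count
is already zero" check that the patch relies on, the sketch below models it
with C11 atomics. It is not the kernel implementation: struct obj,
obj_get_unless_zero(), obj_put() and worker_run() are hypothetical names, and
it deliberately ignores the lifetime of the object itself, which is exactly
the psock->sk question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted object such as a psock. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the count is still non-zero.
 * Returns false once the last reference has been dropped.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Model of a worker that must not touch the object once teardown has
 * dropped the last reference (assumption: the memory of 'o' is still valid).
 */
static bool worker_run(struct obj *o)
{
	if (!obj_get_unless_zero(o))
		return false;	/* teardown already finished: bail out */
	/* ... work that relies on the object staying alive ... */
	obj_put(o);
	return true;
}
```

With a count of 1, worker_run() takes and drops its own reference and returns
true. After the final obj_put(), the count is 0 and worker_run() returns false
without doing any work.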