Message-ID: <3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co>
Date: Fri, 30 Jan 2026 12:00:09 +0100
From: Michal Luczaj <mhal@...x.co>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: John Fastabend <john.fastabend@...il.com>,
Jakub Sitnicki <jakub@...udflare.com>, Kuniyuki Iwashima
<kuniyu@...gle.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto
update
On 1/29/26 20:41, Martin KaFai Lau wrote:
> On 1/29/26 8:47 AM, Michal Luczaj wrote:
>> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
>> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
>> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>>
>> T0 bpf                          T1 connect
>> ------                          ----------
>>
>>                                 WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
>> sock_map_sk_state_allowed(sk)
>> ...
>> sk_pair = unix_peer(sk)
>> sock_hold(sk_pair)
>>                                 sock_hold(newsk)
>>                                 smp_mb__after_atomic()
>>                                 unix_peer(sk) = newsk
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000080
>> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
>> Call Trace:
>> sock_map_link+0x564/0x8b0
>> sock_map_update_common+0x6e/0x340
>> sock_map_update_elem_sys+0x17d/0x240
>> __sys_bpf+0x26db/0x3250
>> __x64_sys_bpf+0x21/0x30
>> do_syscall_64+0x6b/0x3a0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Follow-up to discussion at
>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
>
> It is a long thread to dig. Please summarize the discussion in the
> commit message.

OK, here we go:

The root cause of the null-ptr-deref is that unix_stream_connect() sets
sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
sock_map_sk_state_allowed() believe that the socket is properly set up,
which would include having a defined peer.

In other words, there's a window in which you can call
unix_stream_bpf_update_proto() on a socket that still has
unix_peer(sk) == NULL.
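
To make the window concrete, here is a minimal userspace sketch that replays
the interleaving in a single thread. It is not kernel code: struct model_sock
and every model_* function are made-up stand-ins for illustration, and the
error returned for the failed state check is part of the model only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct sock or unix_peer(). */
enum { MODEL_CLOSE = 0, MODEL_ESTABLISHED = 1 };

struct model_sock {
	int state;                /* models sk->sk_state */
	struct model_sock *peer;  /* models unix_peer(sk) */
	int refcnt;               /* models the reference taken by sock_hold() */
};

/* First half of connect: the state is published before the peer. */
static void model_connect_set_state(struct model_sock *sk)
{
	sk->state = MODEL_ESTABLISHED;
}

/* Second half of connect: take a reference, then assign the peer. */
static void model_connect_set_peer(struct model_sock *sk,
				   struct model_sock *newsk)
{
	newsk->refcnt++;
	sk->peer = newsk;
}

/* Models the sockmap update, including a "!sk_pair" check. */
static int model_update_proto(struct model_sock *sk,
			      struct model_sock **sk_pair)
{
	if (sk->state != MODEL_ESTABLISHED)
		return -EOPNOTSUPP;  /* models the state check failing */

	*sk_pair = sk->peer;
	if (!*sk_pair)
		return -EINVAL;      /* without this, NULL gets dereferenced */

	(*sk_pair)->refcnt++;
	return 0;
}

/* Replays the T0/T1 interleaving: the update runs between the two halves. */
static int model_replay_race(void)
{
	struct model_sock sk = { 0 }, newsk = { 0 };
	struct model_sock *pair = NULL;
	int ret;

	model_connect_set_state(&sk);          /* T1 */
	ret = model_update_proto(&sk, &pair);  /* T0, inside the window */
	model_connect_set_peer(&sk, &newsk);   /* T1 */

	assert(sk.peer == &newsk);
	return ret;
}
```

In this model, model_replay_race() returns -EINVAL: the update sees the
established state but no peer yet.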

My initial idea was to simply move the peer assignment _before_ the sk_state
update, but the maintainer wasn't interested in changing the
unix_stream_connect() hot path. He suggested taking care of it in the
sockmap code.

My understanding is that users are not supposed to put a socket in a sockmap
while that socket is only halfway through a connect() call. Hence `return
-EINVAL` on a missing peer. Now, if users should be allowed to legally race
connect() vs. sockmap update, then I guess we can wait for connect() to
"finalize", e.g. by taking the unix_state_lock(), as discussed below.

> From looking at this commit message, if the existing lock_sock held by
> update_elem is not useful for af_unix,
Right, the existing lock_sock is not useful. update's lock_sock holds
sock::sk_lock, while unix_state_lock() holds unix_sock::lock.
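
A small userspace sketch of why holding one lock says nothing about the
other. The two pthread mutexes and the model_* names are assumptions made for
illustration; the kernel locks are of different types and are not modeled
faithfully here.

```c
#include <pthread.h>

/* Toy stand-ins for the two unrelated locks. */
struct model_locks {
	pthread_mutex_t sk_lock;     /* models sock::sk_lock (lock_sock()) */
	pthread_mutex_t state_lock;  /* models unix_sock::lock (unix_state_lock()) */
};

/*
 * Returns 1 if state_lock can be taken while sk_lock is held, i.e. the
 * update side holding sk_lock does not exclude the connect side.
 */
static int model_locks_are_independent(void)
{
	struct model_locks l;
	int got;

	pthread_mutex_init(&l.sk_lock, NULL);
	pthread_mutex_init(&l.state_lock, NULL);

	pthread_mutex_lock(&l.sk_lock);                     /* update side */
	got = (pthread_mutex_trylock(&l.state_lock) == 0);  /* connect side */
	if (got)
		pthread_mutex_unlock(&l.state_lock);
	pthread_mutex_unlock(&l.sk_lock);

	pthread_mutex_destroy(&l.state_lock);
	pthread_mutex_destroy(&l.sk_lock);
	return got;
}
```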
> it is not clear why a new test
> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix.
"On top"? Just to make sure we're looking at the same thing: above I was
trying to show two parallel flows with unix_peer() fetch in thread-0 and
WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1.
It fixes the problem because now update_proto won't call sock_hold(NULL).
> A minor thing is sock_map_sk_state_allowed doesn't have
> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.
Ok, I'll add this as a separate patch in v2. Along with the !tcp case of
sock_map_redirect_allowed()?
> If unix_stream_connect does not hold lock_sock, can unix_state_lock be
> used here? lock_sock has already been taken, update_elem should not be
> the hot path.
Yes, it can be used; it was proposed in the old thread. In fact, the critical
section can be empty: it is only used to wait for unix_stream_connect() to
release the lock, which would guarantee unix_peer(sk) != NULL by then.

 	if (!psock->sk_pair) {
+		unix_state_lock(sk);
+		unix_state_unlock(sk);
 		sk_pair = unix_peer(sk);
 		sock_hold(sk_pair);

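
The empty lock/unlock pair can be tried out in userspace. Below is a rough
sketch with pthreads, under the assumption that connect publishes the state
and assigns the peer inside one critical section. The struct, the mutex
standing in for unix_state_lock(), and all model_* names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins; the mutex models unix_state_lock(), not the real lock. */
struct model_usock {
	pthread_mutex_t state_lock;
	atomic_int established;    /* models sk_state == TCP_ESTABLISHED */
	struct model_usock *peer;  /* models unix_peer(sk) */
};

struct model_connect_args {
	struct model_usock *sk;
	struct model_usock *newsk;
};

/* Models the tail of connect(): state first, then peer, under the lock. */
static void *model_connect(void *arg)
{
	struct model_connect_args *a = arg;

	pthread_mutex_lock(&a->sk->state_lock);
	atomic_store(&a->sk->established, 1);
	a->sk->peer = a->newsk;
	pthread_mutex_unlock(&a->sk->state_lock);
	return NULL;
}

/* Models the update side: state check, empty critical section, peer fetch. */
static struct model_usock *model_update_wait_for_peer(struct model_usock *sk)
{
	while (!atomic_load(&sk->established))
		sched_yield();  /* stands in for the state check passing */

	/* Returns only after connect has dropped the lock. */
	pthread_mutex_lock(&sk->state_lock);
	pthread_mutex_unlock(&sk->state_lock);

	return sk->peer;
}

/* Returns 1 if the update side saw the peer, 0 if not, -1 on setup error. */
static int model_run_once(void)
{
	struct model_usock sk = { .peer = NULL }, newsk = { .peer = NULL };
	struct model_connect_args args = { &sk, &newsk };
	struct model_usock *pair;
	pthread_t t;

	pthread_mutex_init(&sk.state_lock, NULL);
	atomic_init(&sk.established, 0);

	if (pthread_create(&t, NULL, model_connect, &args) != 0)
		return -1;
	pair = model_update_wait_for_peer(&sk);
	pthread_join(t, NULL);

	pthread_mutex_destroy(&sk.state_lock);
	return pair == &newsk;
}
```

Since the state only becomes visible while the connect side holds the lock,
the lock/unlock on the update side cannot complete before the peer has been
assigned.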
>> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
>> Suggested-by: Kuniyuki Iwashima <kuniyu@...gle.com>
>> Signed-off-by: Michal Luczaj <mhal@...x.co>
>> ---
>> Re-triggered while working on an unrelated selftest:
>> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
>> ---
>> net/unix/unix_bpf.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
>> index e0d30d6d22ac..57f3124c9d8d 100644
>> --- a/net/unix/unix_bpf.c
>> +++ b/net/unix/unix_bpf.c
>> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
>>  	 */
>>  	if (!psock->sk_pair) {
>>  		sk_pair = unix_peer(sk);
>> +		if (unlikely(!sk_pair))
>> +			return -EINVAL;
>> +
>>  		sock_hold(sk_pair);
>>  		psock->sk_pair = sk_pair;
>>  	}
>>
>> ---
>> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8
>>
>> Best regards,
>