Message-ID: <3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co>
Date: Fri, 30 Jan 2026 12:00:09 +0100
From: Michal Luczaj <mhal@...x.co>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: John Fastabend <john.fastabend@...il.com>,
 Jakub Sitnicki <jakub@...udflare.com>, Kuniyuki Iwashima
 <kuniyu@...gle.com>, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
 bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto
 update

On 1/29/26 20:41, Martin KaFai Lau wrote:
> On 1/29/26 8:47 AM, Michal Luczaj wrote:
>> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
>> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
>> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>>
>> 	T0 bpf				T1 connect
>> 	------				----------
>>
>> 				WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
>> sock_map_sk_state_allowed(sk)
>> ...
>> sk_pair = unix_peer(sk)
>> sock_hold(sk_pair)
>> 				sock_hold(newsk)
>> 				smp_mb__after_atomic()
>> 				unix_peer(sk) = newsk
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000080
>> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
>> Call Trace:
>>   sock_map_link+0x564/0x8b0
>>   sock_map_update_common+0x6e/0x340
>>   sock_map_update_elem_sys+0x17d/0x240
>>   __sys_bpf+0x26db/0x3250
>>   __x64_sys_bpf+0x21/0x30
>>   do_syscall_64+0x6b/0x3a0
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Follow-up to discussion at
>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
> 
> It is a long thread to dig. Please summarize the discussion in the 
> commit message.

OK, there we go:

The root cause of the null-ptr-deref is that unix_stream_connect() sets
sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
sock_map_sk_state_allowed() believe that the socket is properly set up, which
would include having a defined peer.

In other words, there's a window in which you can call
unix_stream_bpf_update_proto() on a socket that still has unix_peer(sk) == NULL.
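The window can be replayed deterministically in userspace. The sketch below is a hypothetical model, not kernel code: `struct model_sock`, `connect_set_state()`, `connect_set_peer()` and `state_allowed()` stand in for sk_state/unix_peer(sk), the two steps of unix_stream_connect() and sock_map_sk_state_allowed(). No real threads are used; the T0/T1 steps are simply called in the order from the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_state values. */
enum model_state { MODEL_CLOSE, MODEL_ESTABLISHED };

struct model_sock {
	enum model_state state;   /* models sk->sk_state */
	struct model_sock *peer;  /* models unix_peer(sk) */
};

/* T1, first step: WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED). */
static void connect_set_state(struct model_sock *sk)
{
	sk->state = MODEL_ESTABLISHED;
}

/* T1, second step: unix_peer(sk) = newsk. */
static void connect_set_peer(struct model_sock *sk, struct model_sock *newsk)
{
	sk->peer = newsk;
}

/* T0: like sock_map_sk_state_allowed(), only the state is checked. */
static bool state_allowed(const struct model_sock *sk)
{
	return sk->state == MODEL_ESTABLISHED;
}

/* Replays the diagram's interleaving. Returns true if T0's check passes
 * while the peer is still NULL, i.e. the window is hit. */
static bool window_observed(void)
{
	struct model_sock sk = { MODEL_CLOSE, NULL };
	struct model_sock newsk = { MODEL_CLOSE, NULL };
	bool hit;

	connect_set_state(&sk);                      /* T1 */
	hit = state_allowed(&sk) && sk.peer == NULL; /* T0 */
	connect_set_peer(&sk, &newsk);               /* T1, too late for T0 */
	return hit;
}
```

In the real race, T0 would go on to pass that NULL peer to sock_hold(), which is the crash in the report.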

My initial idea was to simply move the peer assignment _before_ the sk_state
update, but the maintainer wasn't interested in changing the
unix_stream_connect() hot path. He suggested taking care of it in the
sockmap code.
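For comparison, the initial idea (assign the peer first, then publish the state) can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not a proposed kernel patch: a release store of the state pairs with an acquire load on the checking side, so a reader that sees ESTABLISHED is guaranteed to also see the peer. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { MODEL_CLOSE, MODEL_ESTABLISHED };

struct model_sock {
	atomic_int state;          /* models sk->sk_state */
	struct model_sock *peer;   /* models unix_peer(sk) */
};

static struct model_sock sk, newsk;

/* Connect side with the order swapped: peer first, then publish state. */
static void *connector(void *arg)
{
	(void)arg;
	sk.peer = &newsk;
	atomic_store_explicit(&sk.state, MODEL_ESTABLISHED,
			      memory_order_release);
	return NULL;
}

/* Checker that trusts only the state, like sock_map_sk_state_allowed().
 * The acquire load makes the earlier peer store visible. */
static struct model_sock *wait_and_read_peer(void)
{
	while (atomic_load_explicit(&sk.state, memory_order_acquire) !=
	       MODEL_ESTABLISHED)
		; /* spin until the state check would pass */
	return sk.peer;
}

static struct model_sock *run_ordered_connect(void)
{
	pthread_t t;
	struct model_sock *seen;

	atomic_store_explicit(&sk.state, MODEL_CLOSE, memory_order_relaxed);
	sk.peer = NULL;
	if (pthread_create(&t, NULL, connector, NULL) != 0)
		return NULL;
	seen = wait_and_read_peer();
	pthread_join(t, NULL);
	return seen;
}
```

The kernel has its own primitives for this kind of pairing (e.g. smp_store_release()/smp_load_acquire()); which ones such a change would actually need is not something this model settles.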

My understanding is that users are not supposed to put a socket in a sockmap
while it is only halfway through a connect() call. Hence `return -EINVAL`
on a missing peer. Now, if users should be allowed to legally race
connect() vs. sockmap update, then I guess we can wait for connect() to
"finalize", e.g. by taking the unix_state_lock(), as discussed below.

>  From looking at this commit message, if the existing lock_sock held by 
> update_elem is not useful for af_unix,

Right, the existing lock_sock is not useful. update's lock_sock takes
sock::sk_lock, while unix_state_lock() takes unix_sock::lock, so the two
do not exclude each other.

> it is not clear why a new test 
> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. 

"On top"? Just to make sure we're looking at the same thing: above I was
trying to show two parallel flows, with the unix_peer() fetch in T0 and
WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in T1.

It fixes the problem because update_proto now returns -EINVAL instead of
calling sock_hold(NULL).
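The effect of the proposed hunk can be checked with a small userspace model. `struct model_sock`, `struct model_psock`, `model_sock_hold()` and `model_update_proto()` are hypothetical stand-ins for struct sock, struct sk_psock, sock_hold() and unix_stream_bpf_update_proto(); the `unlikely()` annotation is dropped since it only affects branch layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_sock {
	int refcnt;                  /* models the socket refcount */
	struct model_sock *peer;     /* models unix_peer(sk) */
};

struct model_psock {
	struct model_sock *sk_pair;  /* models psock->sk_pair */
};

/* Dereferences its argument, so a NULL peer would crash here,
 * as sock_hold() does in the report. */
static void model_sock_hold(struct model_sock *sk)
{
	sk->refcnt++;
}

/* Mirrors the patched hunk: bail out before taking a ref on NULL. */
static int model_update_proto(struct model_sock *sk, struct model_psock *psock)
{
	struct model_sock *sk_pair;

	if (!psock->sk_pair) {
		sk_pair = sk->peer;
		if (!sk_pair)
			return -EINVAL;
		model_sock_hold(sk_pair);
		psock->sk_pair = sk_pair;
	}
	return 0;
}
```

A socket caught mid-connect (no peer yet) is rejected and the psock is left untouched; a fully connected one gets its peer pinned as before.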

> A minor thing is sock_map_sk_state_allowed doesn't have 
> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.

OK, I'll add this as a separate patch in v2. Should it also cover the !tcp
case of sock_map_redirect_allowed()?
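A sketch of what the READ_ONCE() change amounts to for the af_unix branch, assuming GCC or Clang: `MODEL_READ_ONCE()`/`MODEL_WRITE_ONCE()` are simplified volatile-cast approximations of the kernel macros (the real ones are more elaborate), and `struct model_sock`/`model_unix_state_allowed()` are hypothetical. The point is that a lockless reader of a field that the writer updates with WRITE_ONCE() should use a marked load too.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified approximations of the kernel macros (GCC/Clang __typeof__).
 * The volatile access makes the compiler emit exactly one load/store
 * and stops it from caching, merging or re-reading the value. */
#define MODEL_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

enum { MODEL_ESTABLISHED = 1, MODEL_CLOSE = 7 };

struct model_sock {
	unsigned char sk_state;   /* models sk->sk_state */
};

/* Models only the sk_is_stream_unix() branch of the state check,
 * now reading sk_state through a marked load. */
static bool model_unix_state_allowed(const struct model_sock *sk)
{
	return MODEL_READ_ONCE(sk->sk_state) == MODEL_ESTABLISHED;
}
```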

> If unix_stream_connect does not hold lock_sock, can unix_state_lock be 
> used here? lock_sock has already been taken, update_elem should not be 
> the hot path.

Yes, it can be used; it was proposed in the old thread. In fact, the critical
section can be empty: the lock would only be taken to wait for
unix_stream_connect() to release it, which would guarantee unix_peer(sk) != NULL by then.

        if (!psock->sk_pair) {
+               unix_state_lock(sk);
+               unix_state_unlock(sk);
                sk_pair = unix_peer(sk);
                sock_hold(sk_pair);
                psock->sk_pair = sk_pair;
        }
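The empty critical section can be modelled with a pthread mutex standing in for unix_sock::lock. This is a hypothetical sketch that assumes the connect side assigns both sk_state and the peer while holding that lock: once the update side has observed ESTABLISHED, it can only acquire the lock after the connector has dropped it, so the peer read afterwards is non-NULL. `struct model_sock`, `connector()`, `update_side()` and `run_race()` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { MODEL_CLOSE, MODEL_ESTABLISHED };

struct model_sock {
	pthread_mutex_t lock;      /* models unix_sock::lock */
	atomic_int state;          /* models sk->sk_state (read locklessly) */
	struct model_sock *peer;   /* models unix_peer(sk) */
};

static struct model_sock sk = { .lock = PTHREAD_MUTEX_INITIALIZER };
static struct model_sock newsk = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Models unix_stream_connect(): state first, peer second, both under the lock. */
static void *connector(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sk.lock);
	atomic_store_explicit(&sk.state, MODEL_ESTABLISHED,
			      memory_order_relaxed);
	sk.peer = &newsk;
	pthread_mutex_unlock(&sk.lock);
	return NULL;
}

/* Models the update path with the proposed lock/unlock pair. */
static struct model_sock *update_side(void)
{
	/* Wait until the lockless state check would pass. */
	while (atomic_load_explicit(&sk.state, memory_order_relaxed) !=
	       MODEL_ESTABLISHED)
		;
	/* Empty critical section: the state we saw is only set inside the
	 * connector's critical section, so we get the lock after it is
	 * released, i.e. after the peer was assigned. */
	pthread_mutex_lock(&sk.lock);
	pthread_mutex_unlock(&sk.lock);
	return sk.peer;
}

static struct model_sock *run_race(void)
{
	pthread_t t;
	struct model_sock *seen;

	if (pthread_create(&t, NULL, connector, NULL) != 0)
		return NULL;
	seen = update_side();
	pthread_join(t, NULL);
	return seen;
}
```

Unlike the -EINVAL check, this variant makes the racing update succeed rather than fail, at the cost of taking one more lock on the (non-hot) update path.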

>> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
>> Suggested-by: Kuniyuki Iwashima <kuniyu@...gle.com>
>> Signed-off-by: Michal Luczaj <mhal@...x.co>
>> ---
>> Re-triggered while working on an unrelated selftest:
>> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
>> ---
>>   net/unix/unix_bpf.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
>> index e0d30d6d22ac..57f3124c9d8d 100644
>> --- a/net/unix/unix_bpf.c
>> +++ b/net/unix/unix_bpf.c
>> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
>>   	 */
>>   	if (!psock->sk_pair) {
>>   		sk_pair = unix_peer(sk);
>> +		if (unlikely(!sk_pair))
>> +			return -EINVAL;
>> +
>>   		sock_hold(sk_pair);
>>   		psock->sk_pair = sk_pair;
>>   	}
>>
>> ---
>> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8
>>
>> Best regards,
> 

