[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co>
Date: Sun, 9 Jun 2024 13:28:34 +0200
From: Michal Luczaj <mhal@...x.co>
To: Kuniyuki Iwashima <kuniyu@...zon.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH v2 net 01/15] af_unix: Set sk->sk_state under
unix_state_lock() for truly disconencted peer.
On 6/4/24 18:52, Kuniyuki Iwashima wrote:
> When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
> sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
> to BPF SOCKMAP. (...)
Speaking of af_unix and sockmap, SOCK_STREAM has a tiny window for
bpf(BPF_MAP_UPDATE_ELEM) and unix_stream_connect() to race: when
sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED), but
unix_peer(sk) in unix_stream_bpf_update_proto() _still_ returns NULL:
T0 bpf T1 connect
====== ==========
WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
sock_hold(newsk)
smp_mb__after_atomic()
unix_peer(sk) = newsk
unix_state_unlock(sk)
With mdelay(1) stuffed in unix_stream_connect():
[ 902.277593] BUG: kernel NULL pointer dereference, address: 0000000000000080
[ 902.277633] #PF: supervisor write access in kernel mode
[ 902.277661] #PF: error_code(0x0002) - not-present page
[ 902.277688] PGD 107191067 P4D 107191067 PUD 10f63c067 PMD 0
[ 902.277716] Oops: Oops: 0002 [#23] PREEMPT SMP NOPTI
[ 902.277742] CPU: 2 PID: 1505 Comm: a.out Tainted: G D 6.10.0-rc1+ #130
[ 902.277769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 902.277793] RIP: 0010:unix_stream_bpf_update_proto+0xa1/0x150
Setting TCP_ESTABLISHED _after_ unix_peer() fixes the issue, so how about
something like
@@ -1631,12 +1631,13 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
/* Set credentials */
copy_peercred(sk, other);
- sock->state = SS_CONNECTED;
- WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
sock_hold(newsk);
+ smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */
+ WRITE_ONCE(unix_peer(sk), newsk);
+ smp_wmb(); /* ensure peer is set before sk_state */
- smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */
- unix_peer(sk) = newsk;
+ sock->state = SS_CONNECTED;
+ WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
unix_state_unlock(sk);
@@ -180,7 +180,8 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
* be a single matching destroy operation.
*/
if (!psock->sk_pair) {
- sk_pair = unix_peer(sk);
+ smp_rmb();
+ sk_pair = READ_ONCE(unix_peer(sk));
sock_hold(sk_pair);
psock->sk_pair = sk_pair;
}
This should keep things ordered and lockless... I hope.
Alternatively, maybe it would be better just to make BPF respect the unix
state lock?
@@ -180,6 +180,8 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
* be a single matching destroy operation.
*/
if (!psock->sk_pair) {
+ unix_state_lock(sk);
sk_pair = unix_peer(sk);
+ unix_state_unlock(sk);
sock_hold(sk_pair);
psock->sk_pair = sk_pair;
What do you think?
Thanks,
Michal
Powered by blists - more mailing lists