Message-ID: <17997c8f-bba1-4597-85c7-5d76de63a7a7@rbox.co>
Date: Wed, 19 Jun 2024 20:14:48 +0200
From: Michal Luczaj <mhal@...x.co>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: cong.wang@...edance.com, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, kuni1840@...il.com, netdev@...r.kernel.org,
 pabeni@...hat.com
Subject: Re: [PATCH v2 net 01/15] af_unix: Set sk->sk_state under
 unix_state_lock() for truly disconencted peer.

On 6/17/24 20:21, Kuniyuki Iwashima wrote:
> From: Michal Luczaj <mhal@...x.co>
> Date: Mon, 17 Jun 2024 01:28:52 +0200
>> (...)
>> Another AF_UNIX sockmap issue is with OOB. When OOB packet is sent, skb is
>> added to recv queue, but also u->oob_skb is set. Here's the problem: when
>> this skb goes through bpf_sk_redirect_map() and is moved between socks,
>> oob_skb remains set on the original sock.
> 
> Good catch!
> 
>>
>> [   23.688994] WARNING: CPU: 2 PID: 993 at net/unix/garbage.c:351 unix_collect_queue+0x6c/0xb0
>> [   23.689019] CPU: 2 PID: 993 Comm: kworker/u32:13 Not tainted 6.10.0-rc2+ #137
>> [   23.689021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> [   23.689024] Workqueue: events_unbound __unix_gc
>> [   23.689027] RIP: 0010:unix_collect_queue+0x6c/0xb0
>>
>> I wanted to write a patch, but then I realized I'm not sure what's the
>> expected behaviour. Should the oob_skb setting follow to the skb's new sock
>> or should it be dropped (similarly to what is happening today with
>> scm_fp_list, i.e. redirect strips inflights)?
> 
> The former will require large refactoring as we need to check if the
> redirect happens for BPF_F_INGRESS and if the redirected sk is also
> SOCK_STREAM etc.
> 
> So, I'd go with the latter.  Probably we can check if skb is u->oob_skb
> and drop OOB data and retry next in unix_stream_read_skb(), and forbid
> MSG_OOB in unix_bpf_recvmsg().
> (...)

Yeah, sounds reasonable. I'm just not sure I understand the retry part. For
each skb_queue_tail() there's one ->sk_data_ready() call (which in turn
calls ->read_skb()), so every queued skb already gets its own read attempt.
Why bother with a retry?

This is what I was thinking:

 static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
+	struct unix_sock *u = unix_sk(sk);
+	struct sk_buff *skb;
+	int err;
+
 	if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED))
 		return -ENOTCONN;
 
-	return unix_read_skb(sk, recv_actor);
+	mutex_lock(&u->iolock);
+	skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
+
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+	if (skb) {
+		bool drop = false;
+
+		spin_lock(&sk->sk_receive_queue.lock);
+		if (skb == u->oob_skb) {
+			WRITE_ONCE(u->oob_skb, NULL);
+			drop = true;
+		}
+		spin_unlock(&sk->sk_receive_queue.lock);
+
+		if (drop) {
+			WARN_ON_ONCE(skb_unref(skb));
+			kfree_skb(skb);
+			skb = NULL;
+			err = 0;
+		}
+	}
+#endif
+
+	mutex_unlock(&u->iolock);
+	return skb ? recv_actor(sk, skb) : err;
 }
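
To make the intended behaviour easier to reason about, here is a rough
userspace model of the drop logic above. It is only a sketch: toy_skb,
toy_sock and the toy_*() helpers are made-up stand-ins, not kernel types or
APIs, and locking and skb refcounting are deliberately left out. It only
shows that an skb matching the sock's OOB marker is dropped and the marker
cleared, while ordinary skbs are passed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff / struct unix_sock. */
struct toy_skb {
	struct toy_skb *next;
};

struct toy_sock {
	struct toy_skb *head;    /* models sk_receive_queue */
	struct toy_skb *tail;
	struct toy_skb *oob_skb; /* models u->oob_skb */
};

/* Send side: queue the skb and, for an OOB send, also remember it. */
static void toy_queue_tail(struct toy_sock *sk, struct toy_skb *skb, bool oob)
{
	skb->next = NULL;
	if (sk->tail)
		sk->tail->next = skb;
	else
		sk->head = skb;
	sk->tail = skb;
	if (oob)
		sk->oob_skb = skb;
}

/* Remove and return the first queued skb, or NULL if the queue is empty. */
static struct toy_skb *toy_dequeue(struct toy_sock *sk)
{
	struct toy_skb *skb = sk->head;

	if (skb) {
		sk->head = skb->next;
		if (!sk->head)
			sk->tail = NULL;
		skb->next = NULL;
	}
	return skb;
}

/*
 * Read side, mirroring the shape of the patch: dequeue one skb; if it is
 * the OOB skb, clear the marker and drop it instead of handing it on.
 * Returns the skb to pass to the actor, or NULL (empty queue or dropped).
 */
static struct toy_skb *toy_read_skb(struct toy_sock *sk, bool *dropped)
{
	struct toy_skb *skb = toy_dequeue(sk);

	*dropped = false;
	if (skb && skb == sk->oob_skb) {
		sk->oob_skb = NULL;
		*dropped = true;
		return NULL;
	}
	return skb;
}
```

With one regular skb queued followed by one OOB skb, the first
toy_read_skb() call returns the regular skb and the second returns NULL
with *dropped set, leaving oob_skb cleared so nothing stale stays behind
on the original sock.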