Message-ID: <20240408211830.99829-1-kuniyu@amazon.com>
Date: Mon, 8 Apr 2024 14:18:30 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <mhal@...x.co>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<kuniyu@...zon.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>
Subject: Re: [PATCH net 1/2] af_unix: Fix garbage collector racing against connect()

From: Michal Luczaj <mhal@...x.co>
Date: Mon,  8 Apr 2024 17:58:45 +0200
> The garbage collector does not take into account the risk of an embryo
> getting enqueued during garbage collection. If such an embryo has a peer
> that carries SCM_RIGHTS, two consecutive passes of scan_children() may see
> different sets of children. This leads to an incorrectly elevated inflight
> count, and then to a dangling pointer within the gc_inflight_list.
> 
> All sockets are AF_UNIX/SOCK_STREAM.
> S is an unconnected socket.
> L is a listening in-flight socket bound to addr, not in the fdtable.
> V's fd will be passed via sendmsg(), which bumps its inflight count.
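For readers less familiar with SCM_RIGHTS, the "V's fd will be passed via sendmsg()" step looks roughly like this from user space. This is a minimal sketch: send_fd() and recv_fd() are hypothetical helper names, while socketpair(), sendmsg(), recvmsg() and the CMSG_* macros are the standard socket API. Once send_fd() returns and the sender closes its own copy of V, the reference held by the queued skb is V's only one, so its total refs equal its inflight count, which is the GC-candidate condition used in the trace.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over the AF_UNIX socket sock as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
	char byte = 'x';                 /* at least one byte of real data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {                          /* union guarantees cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
	};

	memset(ctrl.buf, 0, sizeof(ctrl.buf));
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/* From here until it is received, the fd is "in flight". */
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
	};

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	int fd;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Closing the sender's copy right after send_fd() (as in "sendmsg(S, [V]); close(V)") does not destroy V; the received descriptor still refers to the same socket.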
> 
> connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
> ----------------	-------------------------	-----------
> 
> NS = unix_create1()
> skb1 = sock_wmalloc(NS)
> L = unix_find_other(addr)
> unix_state_lock(L)
> unix_peer(S) = NS
> 			// V count=1 inflight=0
> 
>  			NS = unix_peer(S)
>  			skb2 = sock_alloc()
> 			skb_queue_tail(NS, skb2[V])
> 
> 			// V became in-flight
> 			// V count=2 inflight=1
> 
> 			close(V)
> 
> 			// V count=1 inflight=1
> 			// GC candidate condition met
> 
> 						for u in gc_inflight_list:
> 						  if (total_refs == inflight_refs)
> 						    add u to gc_candidates
> 
> 						// gc_candidates={L, V}
> 
> 						for u in gc_candidates:
> 						  scan_children(u, dec_inflight)
> 
> 						// embryo (skb1) was not
> 						// reachable from L yet, so V's
> 						// inflight remains unchanged
> __skb_queue_tail(L, skb1)
> unix_state_unlock(L)
> 						for u in gc_candidates:
> 						  if (u.inflight)
> 						    scan_children(u, inc_inflight_move_tail)
> 
> 						// V count=1 inflight=2 (!)
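To see why V ends up with inflight=2, here is a heavily simplified user-space model of the first two GC passes (decrement, then increment for candidates that still have references). It is a hypothetical sketch, not kernel code: SOCK_L, SOCK_V, model_gc() and the rule that L reaches V only through the queued embryo are assumptions standing in for __unix_gc(), scan_children() and the listener's embryo queue; later passes are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the trace above; not kernel code. */
enum { SOCK_L, SOCK_V, NSOCKS };

/* When connect() enqueues the embryo skb1 (which carries V) on L. */
enum embryo_timing { EMBRYO_NEVER, EMBRYO_BEFORE_GC, EMBRYO_BETWEEN_PASSES };

struct model_sock {
	long total_refs; /* stands in for file_count() */
	long inflight;   /* in-flight SCM_RIGHTS references */
	bool candidate;
};

struct model {
	struct model_sock s[NSOCKS];
	bool embryo_enqueued;
};

/* L reaches V only through the embryo queued on it; V has no children. */
static bool reaches(const struct model *m, int from, int to)
{
	return from == SOCK_L && to == SOCK_V && m->embryo_enqueued;
}

/* Simplified scan_children(): adjust inflight of each reachable candidate. */
static void scan_children(struct model *m, int u, long delta)
{
	for (int c = 0; c < NSOCKS; c++)
		if (m->s[c].candidate && reaches(m, u, c))
			m->s[c].inflight += delta;
}

/* Returns V's inflight count after the decrement and increment passes. */
long model_gc(enum embryo_timing timing)
{
	struct model m = {
		.s = {
			[SOCK_L] = { .total_refs = 1, .inflight = 1 },
			[SOCK_V] = { .total_refs = 1, .inflight = 1 },
		},
		.embryo_enqueued = (timing == EMBRYO_BEFORE_GC),
	};

	/* Candidate selection: every reference is an in-flight one. */
	for (int u = 0; u < NSOCKS; u++) {
		assert(m.s[u].total_refs >= m.s[u].inflight); /* cf. WARN_ON_ONCE */
		m.s[u].candidate = (m.s[u].total_refs == m.s[u].inflight);
	}

	/* Pass 1: drop references that candidates hold on each other. */
	for (int u = 0; u < NSOCKS; u++)
		scan_children(&m, u, -1);

	/* The race window: connect() finishes enqueueing skb1 on L here. */
	if (timing == EMBRYO_BETWEEN_PASSES)
		m.embryo_enqueued = true;

	/* Pass 2: restore references from candidates that are still alive. */
	for (int u = 0; u < NSOCKS; u++)
		if (m.s[u].inflight)
			scan_children(&m, u, +1);

	return m.s[SOCK_V].inflight;
}
```

Only EMBRYO_BETWEEN_PASSES lets the increment pass see a child that the decrement pass did not, leaving V with two in-flight references but a single file reference.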
> 
> If there is a GC-candidate listening socket, lock/unlock its state. This
> makes GC wait until the end of any ongoing connect() to that socket. After
> the lock has been flipped, any possibly SCM-laden embryo is already
> enqueued. And if another connect() comes along, its embryo won't carry
> SCM_RIGHTS, as we already hold the unix_gc_lock.
> 
> Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
> Signed-off-by: Michal Luczaj <mhal@...x.co>
> ---
>  net/unix/garbage.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index fa39b6265238..cd3e8585ceb2 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -274,11 +274,20 @@ static void __unix_gc(struct work_struct *work)
>  	 * receive queues.  Other, non candidate sockets _can_ be
>  	 * added to queue, so we must make sure only to touch
>  	 * candidates.
> +	 *
> +	 * Embryos, though never candidates themselves, affect which
> +	 * candidates are reachable by the garbage collector.  Before
> +	 * being added to a listener's queue, an embryo may already
> +	 * receive data carrying SCM_RIGHTS, potentially making the
> +	 * passed socket a candidate that is not yet reachable by the
> +	 * collector.  It becomes reachable once the embryo is
> +	 * enqueued.  Therefore, we must ensure that no SCM-laden
> +	 * embryo appears in a (candidate) listener's queue between
> +	 * consecutive scan_children() calls.
>  	 */
>  	list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
> -		long total_refs;
> -
> -		total_refs = file_count(u->sk.sk_socket->file);
> +		struct sock *sk = &u->sk;
> +		long total_refs = file_count(sk->sk_socket->file);
>  
>  		WARN_ON_ONCE(!u->inflight);
>  		WARN_ON_ONCE(total_refs < u->inflight);
> @@ -286,6 +295,11 @@ static void __unix_gc(struct work_struct *work)
>  			list_move_tail(&u->link, &gc_candidates);
>  			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
>  			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
> +
> +			if (sk->sk_state == TCP_LISTEN) {
> +				unix_state_lock(sk);
> +				unix_state_unlock(sk);

Less likely, though: what if the same connect() happens after this?

connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
----------------	-------------------------	-----------
NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
						for u in gc_inflight_list:
						  if (total_refs == inflight_refs)
						    add u to gc_candidates
						    // L was already traversed
						    // in a previous iteration.
unix_state_lock(L)
unix_peer(S) = NS

						// gc_candidates={L, V}

						for u in gc_candidates:
						  scan_children(u, dec_inflight)

						// embryo (skb1) was not
						// reachable from L yet, so V's
						// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
						for u in gc_candidates:
						  if (u.inflight)
						    scan_children(u, inc_inflight_move_tail)

						// V count=1 inflight=2 (!)


As you pointed out, this GC's assumption is basically wrong; the GC
works correctly only when the set of traversed sockets does not change
across the three scan_children() calls.

That's why I reworked the GC so that it does not rely on the receive queue.
https://lore.kernel.org/netdev/20240325202425.60930-1-kuniyu@amazon.com/
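The general idea of not depending on receive queues can be illustrated by running cycle detection on a fixed snapshot of the in-flight graph, so nothing can appear between passes. The sketch below is only an illustration under stated assumptions, not the code from the linked series: struct snapshot, find_garbage() and the edge-count matrix are hypothetical; it uses Tarjan's strongly-connected-components algorithm; and it treats a component as garbage when every member's references are all in-flight ones held by members of that same component.

```c
#include <stdbool.h>

#define MAXV 16

/* Snapshot of the in-flight graph: edge[u][v] is the number of fds
 * referring to socket v that are queued on socket u. Hypothetical. */
struct snapshot {
	int n;
	long total_refs[MAXV];  /* file_count() at snapshot time */
	int edge[MAXV][MAXV];
};

struct tarjan {
	const struct snapshot *g;
	int index[MAXV], low[MAXV], comp[MAXV];
	bool on_stack[MAXV];
	int stack[MAXV], sp, next_index, ncomp;
};

static void strongconnect(struct tarjan *t, int v)
{
	t->index[v] = t->low[v] = t->next_index++;
	t->stack[t->sp++] = v;
	t->on_stack[v] = true;

	for (int w = 0; w < t->g->n; w++) {
		if (!t->g->edge[v][w])
			continue;
		if (t->index[w] < 0) {           /* tree edge: recurse */
			strongconnect(t, w);
			if (t->low[w] < t->low[v])
				t->low[v] = t->low[w];
		} else if (t->on_stack[w] && t->index[w] < t->low[v]) {
			t->low[v] = t->index[w]; /* back edge into current SCC */
		}
	}

	if (t->low[v] == t->index[v]) {          /* v is the root of an SCC */
		int w;
		do {
			w = t->stack[--t->sp];
			t->on_stack[w] = false;
			t->comp[w] = t->ncomp;
		} while (w != v);
		t->ncomp++;
	}
}

/* Sets dead[v] for sockets whose component nothing outside references. */
void find_garbage(const struct snapshot *g, bool dead[MAXV])
{
	struct tarjan t = { .g = g };
	bool garbage[MAXV];

	for (int v = 0; v < g->n; v++)
		t.index[v] = -1;
	for (int v = 0; v < g->n; v++)
		if (t.index[v] < 0)
			strongconnect(&t, v);

	for (int c = 0; c < t.ncomp; c++)
		garbage[c] = true;

	for (int v = 0; v < g->n; v++) {
		long inflight = 0;
		for (int u = 0; u < g->n; u++) {
			inflight += g->edge[u][v];
			/* A reference from another component keeps v alive. */
			if (g->edge[u][v] && t.comp[u] != t.comp[v])
				garbage[t.comp[v]] = false;
		}
		/* Some reference is not in flight, e.g. an fd in a process. */
		if (inflight != g->total_refs[v])
			garbage[t.comp[v]] = false;
	}

	for (int v = 0; v < g->n; v++)
		dead[v] = garbage[t.comp[v]];
}
```

For example, if sockets 0 and 1 carry each other's fds with one file reference each, both are marked dead; a socket that still has an fd in some process, and anything it carries, is left alone. Garbage reachable only from another garbage component is not reclaimed by this simplified rule.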


> +			}
>  		}
>  	}
>  
> -- 
> 2.44.0
> 
