netdev - RE: [PATCH v1 net] udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230418015442.89242-1-kuniyu@amazon.com>
Date:   Mon, 17 Apr 2023 18:54:42 -0700
From:   Kuniyuki Iwashima <kuniyu@...zon.com>
To:     <willemdebruijn.kernel@...il.com>
CC:     <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
        <kuni1840@...il.com>, <kuniyu@...zon.com>,
        <netdev@...r.kernel.org>, <pabeni@...hat.com>,
        <syzkaller@...glegroups.com>, <willemb@...gle.com>
Subject: RE: [PATCH v1 net] udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.

From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
Date:   Mon, 17 Apr 2023 21:38:23 -0400
> Kuniyuki Iwashima wrote:
[...]
> > > > So, we need to make sure TX tstamp is not queued if SOCK_DEAD is
> > > > flagged and we purge the queue only after marking SOCK_DEAD ?
> > > 
> > > Exactly. Thanks for the sketch.
> > > 
> > > Ideally without having to take an extra lock in the common path.
> > > sk_commmon_release calls sk_prot->destroy == udp_destroy_sock,
> > > which already sets SOCK_DEAD.
> > > 
> > > Could we move the skb_queue_purge in there? That is also what
> > > calls udp_flush_pending_frames.
> > 
> > Yes, that makes sense.
> > 
> > I was thinking if we need a memory barrier for SOCK_DEAD to sync
> > with TX, which reads it locklessly.  Maybe we should check SOCK_DEAD
> > with sk->sk_error_queue.lock held ?
> 
> the flag write needs the lock (which is held). The test_bit in
> sock_flag is atomic.

I was concerning this race:

					if (!sock_flag(sk, SOCK_DEAD)) {
	sock_flag(sk, SOCK_DEAD)
	skb_queue_purge()
						skb_queue_tail()
					}

and thought we can avoid it by checking SOCK_DEAD under sk_error_queue.lock.

					spin_lock_irqsave(sk_error_queue.lock
					if (!sock_flag(SOCK_DEAD)) {
	sock_flag(SOCK_DEAD)			__skb_queue_tail()
					}
					spin_unlock_irqrestore()
	skb_queue_purge()

What do you think ?

>  
> > And I forgot to return error from sock_queue_err_skb() to free skb
> > in __skb_complete_tx_timestamp().
> >
> > ---8<---
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 4c0879798eb8..287b834df9c8 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4979,6 +4979,8 @@ static void skb_set_err_queue(struct sk_buff *skb)
> >   */
> >  int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> > +	unsigned long flags;
> > +
> >  	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> >  	    (unsigned int)READ_ONCE(sk->sk_rcvbuf))
> >  		return -ENOMEM;
> > @@ -4992,9 +4994,16 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
> >  	/* before exiting rcu section, make sure dst is refcounted */
> >  	skb_dst_force(skb);
> >  
> > -	skb_queue_tail(&sk->sk_error_queue, skb);
> > -	if (!sock_flag(sk, SOCK_DEAD))
> > -		sk_error_report(sk);
> > +	spin_lock_irqsave(&sk->sk_error_queue.lock, flags);
> > +	if (sock_flag(sk, SOCK_DEAD)) {
> > +		spin_unlock_irqrestore(&sk->sk_error_queue.lock, flags);
> > +		return -EINVAL;
> > +	}
> > +	__skb_queue_tail(&sk->sk_error_queue, skb);
> > +	spin_unlock_irqrestore(&sk->sk_error_queue.lock, flags);
> > +
> > +	sk_error_report(sk);
> > +
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(sock_queue_err_skb);
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index c605d171eb2d..7060a5cda711 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2674,6 +2674,11 @@ void udp_destroy_sock(struct sock *sk)
> >  		if (up->encap_enabled)
> >  			static_branch_dec(&udp_encap_needed_key);
> >  	}
> > +
> > +	/* A zerocopy skb has a refcnt of sk and may be
> > +	 * put into sk_error_queue with TX timestamp
> > +	 */
> > +	skb_queue_purge(&sk->sk_error_queue);
> >  }
> >  
> >  /*
> > ---8<---