netdev - Re: [PATCH 1/8] net: add limit for socket backlog

Date:	Wed, 03 Mar 2010 07:54:29 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Zhu Yi <yi.zhu@...el.com>
Cc:	netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	"Pekka Savola (ipv6)" <pekkas@...core.fi>,
	Patrick McHardy <kaber@...sh.net>,
	Vlad Yasevich <vladislav.yasevich@...com>,
	Sridhar Samudrala <sri@...ibm.com>,
	Per Liden <per.liden@...csson.com>,
	Jon Maloy <jon.maloy@...csson.com>,
	Allan Stephens <allan.stephens@...driver.com>,
	Andrew Hendry <andrew.hendry@...il.com>
Subject: Re: [PATCH 1/8] net: add limit for socket backlog

On Wednesday, 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
> We hit a system OOM while running UDP netperf tests on the loopback
> device. In this case, multiple senders stream UDP packets to a single
> receiver via loopback on the local host. Of course, the receiver is not
> able to handle all the packets in time. But we were surprised to find that
> these packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
> Instead, they kept queuing to sk->sk_backlog and finally ate up all
> the memory. We believe this is a security hole that lets an unprivileged
> user crash the system.
> 
> The root cause of this problem is that when the receiver is doing
> __release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
> skb_free_datagram_locked -> release_sock), it moves skbs from the backlog
> to sk_receive_queue with softirqs enabled. In the above case, multiple
> busy senders can make this an almost endless loop. The skbs in the
> backlog end up eating all the system memory.
> 
> The issue is not limited to UDP. Any protocol using the socket backlog is
> potentially affected. The patch adds a limit for the socket backlog so
> that the backlog cannot grow without bound.
> 
> Reported-by: Alex Shi <alex.shi@...el.com>
> Cc: David Miller <davem@...emloft.net>
> Cc: Arnaldo Carvalho de Melo <acme@...stprotocols.net>
> Cc: Alexey Kuznetsov <kuznet@....inr.ac.ru>
> Cc: "Pekka Savola (ipv6)" <pekkas@...core.fi>
> Cc: Patrick McHardy <kaber@...sh.net>
> Cc: Vlad Yasevich <vladislav.yasevich@...com>
> Cc: Sridhar Samudrala <sri@...ibm.com>
> Cc: Per Liden <per.liden@...csson.com>
> Cc: Jon Maloy <jon.maloy@...csson.com>
> Cc: Allan Stephens <allan.stephens@...driver.com>
> Cc: Andrew Hendry <andrew.hendry@...il.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> Signed-off-by: Zhu Yi <yi.zhu@...el.com>

Your SOB should come before mine: you are the main author of this patch,
and I am a contributor.

> ---
>  include/net/sock.h |   17 +++++++++++++++--
>  net/core/sock.c    |   15 +++++++++++++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 6cb1676..847119a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -253,6 +253,8 @@ struct sock {
>  	struct {
>  		struct sk_buff *head;
>  		struct sk_buff *tail;
> +		int len;
> +		int limit;

This new limit field is really not needed.

>  	} sk_backlog;
>  	wait_queue_head_t	*sk_sleep;
>  	struct dst_entry	*sk_dst_cache;
> @@ -589,8 +591,8 @@ static inline int sk_stream_memory_free(struct sock *sk)
>  	return sk->sk_wmem_queued < sk->sk_sndbuf;
>  }
>  
> -/* The per-socket spinlock must be held here. */
> -static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> +/* OOB backlog add */
> +static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>  	if (!sk->sk_backlog.tail) {
>  		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
> @@ -601,6 +603,17 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	skb->next = NULL;
>  }
>  
> +/* The per-socket spinlock must be held here. */
> +static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> +{
> +	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf >> 1))
> +		return -ENOBUFS;
> +
> +	__sk_add_backlog(sk, skb);
> +	sk->sk_backlog.len += skb->truesize;
> +	return 0;
> +}
> +

Ouch, this patch breaks bisection: all protocols currently ignore the
-ENOBUFS return value and don't free the skb.

If you split your work into several patches, each resulting kernel still
has to be usable.
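
To illustrate what each caller needs once sk_add_backlog() can fail, here is
a minimal userspace sketch. The struct layouts are simplified stand-ins, not
the real kernel definitions, and proto_rcv_busy() is a hypothetical caller
name; free() stands in for kfree_skb().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-ins; NOT the real kernel layouts. */
struct sk_buff {
	struct sk_buff *next;
	unsigned int truesize;
};

struct sock {
	struct {
		struct sk_buff *head;
		struct sk_buff *tail;
		int len;
		int limit;
	} sk_backlog;
	int sk_rcvbuf;
	int sk_drops;
};

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Same check as the quoted sk_add_backlog(): refuse once the backlog is full. */
static int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
{
	if (sk->sk_backlog.len >= max_int(sk->sk_backlog.limit, sk->sk_rcvbuf >> 1))
		return -ENOBUFS;

	if (!sk->sk_backlog.tail) {
		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
	} else {
		sk->sk_backlog.tail->next = skb;
		sk->sk_backlog.tail = skb;
	}
	skb->next = NULL;
	sk->sk_backlog.len += skb->truesize;
	return 0;
}

/*
 * Hypothetical protocol receive path for the "socket owned by user" case.
 * On failure the skb was not queued, so the caller still owns it and must
 * free it; otherwise it leaks.
 */
static int proto_rcv_busy(struct sock *sk, struct sk_buff *skb)
{
	int rc;

	assert(sk != NULL && skb != NULL);
	rc = sk_add_backlog(sk, skb);
	if (rc) {
		sk->sk_drops++;	/* account for the dropped packet */
		free(skb);	/* stands in for kfree_skb() */
	}
	return rc;
}
```

A caller that ignores the return value keeps running as before, but every
rejected skb is leaked, which is why the conversion of each protocol has to
land together with the new return code.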

>  static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
>  {
>  	return sk->sk_backlog_rcv(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index fcd397a..fa042bc 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -340,8 +340,12 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>  		rc = sk_backlog_rcv(sk, skb);
>  
>  		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
> -	} else
> -		sk_add_backlog(sk, skb);
> +	} else if (sk_add_backlog(sk, skb)) {
> +		bh_unlock_sock(sk);
> +		atomic_inc(&sk->sk_drops);
> +		goto discard_and_relse;
> +	}
> +
>  	bh_unlock_sock(sk);
>  out:
>  	sock_put(sk);
> @@ -1139,6 +1142,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
>  		sock_lock_init(newsk);
>  		bh_lock_sock(newsk);
>  		newsk->sk_backlog.head	= newsk->sk_backlog.tail = NULL;
> +		newsk->sk_backlog.len = 0;
>  
>  		atomic_set(&newsk->sk_rmem_alloc, 0);
>  		/*
> @@ -1542,6 +1546,12 @@ static void __release_sock(struct sock *sk)
>  
>  		bh_lock_sock(sk);
>  	} while ((skb = sk->sk_backlog.head) != NULL);
> +
> +	/*
> +	 * Doing the zeroing here guarantees we cannot loop forever
> +	 * while a wild producer attempts to flood us.
> +	 */
> +	sk->sk_backlog.len = 0;
>  }
>  
>  /**
> @@ -1874,6 +1884,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
>  	sk->sk_allocation	=	GFP_KERNEL;
>  	sk->sk_rcvbuf		=	sysctl_rmem_default;
>  	sk->sk_sndbuf		=	sysctl_wmem_default;
> +	sk->sk_backlog.limit	=	sk->sk_rcvbuf >> 1;

Didn't we agree to use sk->sk_rcvbuf << 1 in the previous round?
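
For comparison, a small sketch of the two thresholds, computed directly from
sk_rcvbuf with no stored limit field. The helper names are hypothetical and
only illustrate the arithmetic; this is not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: backlog counts as full once the queued bytes reach
 * twice the receive buffer size (sk_rcvbuf << 1).
 * Assumes sk_rcvbuf is non-negative and small enough that doubling it
 * does not overflow an int.
 */
static bool backlog_full_double(int backlog_len, int sk_rcvbuf)
{
	assert(sk_rcvbuf >= 0);
	return backlog_len >= (sk_rcvbuf << 1);
}

/* Variant matching the quoted patch: half the receive buffer (sk_rcvbuf >> 1). */
static bool backlog_full_half(int backlog_len, int sk_rcvbuf)
{
	assert(sk_rcvbuf >= 0);
	return backlog_len >= (sk_rcvbuf >> 1);
}
```

With an illustrative sk_rcvbuf of 65536 bytes, the >> 1 variant starts
dropping at 32768 queued bytes, while the << 1 variant allows 131072.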

>  	sk->sk_state		=	TCP_CLOSE;
>  	sk_set_socket(sk, sock);
>  


