Date: Wed, 28 Apr 2010 11:41:12 -0400
From: Brian Bloniarz <bmb@...enacr.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>, therbert@...gle.com,
    netdev@...r.kernel.org, rick.jones2@...com
Subject: Re: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account

Eric Dumazet wrote:
> On Tuesday, 27 April 2010 at 19:37 +0200, Eric Dumazet wrote:
>
>> We might use the ticket spinlock paradigm to let writers go in parallel
>> and let the user take the socket lock.
>>
>> Instead of having bh_lock_sock() protect the receive_queue *and* the
>> backlog, writers get a unique slot in a table that 'user' can handle
>> later.
>>
>> Or serialize writers (before they try to bh_lock_sock()) with a
>> dedicated lock, so that the user has a 50% chance to get the sock lock,
>> contending with at most one writer.
>
> The following patch fixes the issue for me, with little performance hit on
> the fast path.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on my dev machine.
>
> Thanks !
>
> [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
>
> The current socket backlog limit is not enough to really stop DDOS attacks,
> because the user thread spends a long time processing a full backlog each
> round, and the user might spin madly on the socket lock.
>
> We should add the backlog size and the receive_queue size (aka rmem_alloc)
> to pace writers, and let the user run without being slowed down too much.
>
> Introduce a sk_rcvqueues_full() helper, to avoid taking the socket lock in
> stress situations.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on an 8 core machine.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>

Wow, that was awesome.

> ---
>  include/net/sock.h |   13 +++++++++++--
>  net/core/sock.c    |    5 ++++-
>  net/ipv4/udp.c     |    4 ++++
>  net/ipv6/udp.c     |    8 ++++++++
>  4 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..4b0097d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -255,7 +255,6 @@ struct sock {
>  		struct sk_buff *head;
>  		struct sk_buff *tail;
>  		int len;
> -		int limit;
>  	} sk_backlog;
>  	wait_queue_head_t *sk_sleep;
>  	struct dst_entry *sk_dst_cache;
> @@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	skb->next = NULL;
>  }
>
> +/*
> + * Take into account size of receive queue and backlog queue
> + */
> +static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
> +{
> +	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
> +
> +	return qsize + skb->truesize > sk->sk_rcvbuf;
> +}
> +

Reading sk_backlog.len without the socket lock held seems to contradict
the comment in sock.h:

 * @sk_backlog: always used with the per-socket spinlock held
...
struct sock {
	...
	/*
	 * The backlog queue is special, it is always used with
	 * the per-socket spinlock held and requires low latency
	 * access. Therefore we special case it's implementation.
	 */
	struct {
		...
	} sk_backlog;

Is this just a doc mismatch, or does sk_backlog.len need to use
atomic_read & atomic_set?
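To make the question concrete, a minimal sketch of what the atomic variant
would look like; this is hypothetical, not part of Eric's patch, and the
plain int may well be fine if an occasionally stale read is acceptable here:

/*
 * Hypothetical sketch, NOT part of the posted patch: make the lockless
 * cross-CPU read of the backlog length explicit by turning len into an
 * atomic_t.  Whether this is needed, or a slightly stale plain-int read
 * is tolerable, is exactly the question above.
 */
	/* inside struct sock: */
	struct {
		struct sk_buff	*head;
		struct sk_buff	*tail;
		atomic_t	len;	/* was: int len; */
	} sk_backlog;

static inline bool sk_rcvqueues_full(const struct sock *sk,
				     const struct sk_buff *skb)
{
	unsigned int qsize = atomic_read(&sk->sk_backlog.len) +
			     atomic_read(&sk->sk_rmem_alloc);

	return qsize + skb->truesize > sk->sk_rcvbuf;
}

The writers, which already hold the per-socket spinlock, would then update
the field with atomic_set()/atomic_add() style accessors instead of plain
assignments.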
>  /* The per-socket spinlock must be held here. */
>  static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
> -	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
> +	if (sk_rcvqueues_full(sk, skb))
>  		return -ENOBUFS;
>
>  	__sk_add_backlog(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 58ebd14..5104175 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>
>  	skb->dev = NULL;
>
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		atomic_inc(&sk->sk_drops);
> +		goto discard_and_relse;
> +	}
>  	if (nested)
>  		bh_lock_sock_nested(sk);
>  	else
> @@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
>  	sk->sk_allocation = GFP_KERNEL;
>  	sk->sk_rcvbuf = sysctl_rmem_default;
>  	sk->sk_sndbuf = sysctl_wmem_default;
> -	sk->sk_backlog.limit = sk->sk_rcvbuf << 1;
>  	sk->sk_state = TCP_CLOSE;
>  	sk_set_socket(sk, sock);
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1e18f9c..776c844 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  		goto drop;
>  	}
>
> +
> +	if (sk_rcvqueues_full(sk, skb))
> +		goto drop;
> +
>  	rc = 0;
>
>  	bh_lock_sock(sk);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 2850e35..3ead20a 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
>
>  		sk = stack[i];
>  		if (skb1) {
> +			if (sk_rcvqueues_full(sk, skb)) {
> +				kfree_skb(skb1);
> +				goto drop;
> +			}
>  			bh_lock_sock(sk);
>  			if (!sock_owned_by_user(sk))
>  				udpv6_queue_rcv_skb(sk, skb1);
> @@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>
>  	/* deliver */
>
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		sock_put(sk);
> +		goto discard;
> +	}
>  	bh_lock_sock(sk);
>  	if (!sock_owned_by_user(sk))
>  		udpv6_queue_rcv_skb(sk, skb);
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
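For comparison, the "serialize writers with a dedicated lock" alternative
that Eric floated at the top of the thread might look roughly like the
sketch below. This is only an illustration of that idea, not what the
merged patch does; sk_receive_skb_serialized and the extra writer lock are
invented for the example, and the posted sk_rcvqueues_full() approach
avoids the extra lock entirely.

#include <linux/spinlock.h>
#include <net/sock.h>

/*
 * Hypothetical sketch: funnel softirq writers through one dedicated lock
 * before they contend on the socket lock, so the user thread races with
 * at most one writer at a time.
 */
static int sk_receive_skb_serialized(struct sock *sk, struct sk_buff *skb,
				     spinlock_t *writer_lock)
{
	int rc;

	spin_lock(writer_lock);			/* at most one writer past here */
	bh_lock_sock(sk);
	if (!sock_owned_by_user(sk))
		rc = sk_backlog_rcv(sk, skb);	/* deliver directly */
	else
		rc = sk_add_backlog(sk, skb);	/* defer to the user thread */
	bh_unlock_sock(sk);
	spin_unlock(writer_lock);

	return rc;
}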