Message-Id: <20210621231307.1917413-1-kuba@kernel.org>
Date:   Mon, 21 Jun 2021 16:13:07 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     davem@...emloft.net
Cc:     netdev@...r.kernel.org, willemb@...gle.com, eric.dumazet@...il.com,
        dsahern@...il.com, yoshfuji@...ux-ipv6.org,
        Jakub Kicinski <kuba@...nel.org>, Dave Jones <dsj@...com>
Subject: [PATCH net-next] ip: avoid OOM kills with large UDP sends over loopback

Dave observed a number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has an MTU of 64k, the kernel will try
to allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse, if the
message length is <32k the allocation may trigger the OOM killer.
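
For reference, a minimal userspace sketch of the kind of workload
described above (hypothetical, not taken from Dave's report): one
large UDP datagram sent over loopback, which makes the send path
allocate an skb with a correspondingly large linear head.

/* Illustrative only: send a single ~60kB UDP datagram to loopback.
 * The payload fits the 64k loopback MTU, so the send path ends up
 * allocating a multi-page skb head.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	char buf[60000];	/* well over PAGE_SIZE */
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port = htons(9999),	/* arbitrary port */
	};
	int fd;

	memset(buf, 'x', sizeof(buf));
	inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}
	if (sendto(fd, buf, sizeof(buf), 0,
		   (struct sockaddr *)&dst, sizeof(dst)) < 0)
		perror("sendto");
	close(fd);
	return 0;
}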

This is entirely avoidable; we can use an skb with frags.

The scenario is unlikely, and always using frags requires an
extra allocation, so opt for a fallback rather than always
using a frag'ed/paged skb when the payload is large.

Note that the size heuristic (header_len > PAGE_SIZE) is not
entirely accurate; __alloc_skb() will add ~400B to the size.
The occasional order-1 allocation should be fine, though; we
are primarily concerned with order-3.
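
To illustrate the heuristic's slack, here is a rough userspace
sketch (not kernel code; assumes 4k pages, the ~400B overhead
figure quoted above, and power-of-two allocation buckets) of the
order a given header_len ends up with:

/* Approximate the skb head allocation order for a few header_len
 * values and show which ones the header_len > PAGE_SIZE check
 * would tag with __GFP_NORETRY. Numbers are illustrative only.
 */
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define SKB_OVERHEAD	400UL	/* approximate, per the note above */

static unsigned long bucket(unsigned long sz)
{
	unsigned long b = PAGE_SIZE;

	while (b < sz)
		b <<= 1;
	return b;
}

static int order(unsigned long sz)
{
	int o = 0;

	while ((PAGE_SIZE << o) < bucket(sz))
		o++;
	return o;
}

int main(void)
{
	unsigned long header_len[] = { 4000, 5000, 32000, 65536 };
	unsigned int i;

	for (i = 0; i < sizeof(header_len) / sizeof(header_len[0]); i++) {
		unsigned long sz = header_len[i] + SKB_OVERHEAD;

		printf("header_len %5lu -> alloc %6lu -> order-%d%s\n",
		       header_len[i], bucket(sz), order(sz),
		       header_len[i] > PAGE_SIZE ? " (NORETRY)" : "");
	}
	return 0;
}

In this model a header_len just under PAGE_SIZE (e.g. 4000) still
lands in an order-1 bucket without __GFP_NORETRY; that is the
"occasional order-1 allocation" mentioned above.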

Reported-by: Dave Jones <dsj@...com>
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
 include/net/sock.h    | 11 +++++++++++
 net/ipv4/ip_output.c  | 19 +++++++++++++++++--
 net/ipv6/ip6_output.c | 19 +++++++++++++++++--
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7a7058f4f265..4134fb718b97 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -924,6 +924,17 @@ static inline gfp_t sk_gfp_mask(const struct sock *sk, gfp_t gfp_mask)
 	return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC);
 }
 
+static inline void sk_allocation_push(struct sock *sk, gfp_t flag, gfp_t *old)
+{
+	*old = sk->sk_allocation;
+	sk->sk_allocation |= flag;
+}
+
+static inline void sk_allocation_pop(struct sock *sk, gfp_t old)
+{
+	sk->sk_allocation = old;
+}
+
 static inline void sk_acceptq_removed(struct sock *sk)
 {
 	WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog - 1);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c3efc7d658f6..a300c2c65d57 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1095,9 +1095,24 @@ static int __ip_append_data(struct sock *sk,
 				alloclen += rt->dst.trailer_len;
 
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len + 15,
+				size_t header_len = alloclen + hh_len + 15;
+				gfp_t sk_allocation;
+
+				if (header_len > PAGE_SIZE)
+					sk_allocation_push(sk, __GFP_NORETRY,
+							   &sk_allocation);
+				skb = sock_alloc_send_skb(sk, header_len,
 						(flags & MSG_DONTWAIT), &err);
+				if (header_len > PAGE_SIZE) {
+					BUILD_BUG_ON(MAX_HEADER >= PAGE_SIZE);
+
+					sk_allocation_pop(sk, sk_allocation);
+					if (unlikely(!skb) && !paged &&
+					    rt->dst.dev->features & NETIF_F_SG) {
+						paged = true;
+						goto alloc_new_skb;
+					}
+				}
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..9fd167db07e4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1618,9 +1618,24 @@ static int __ip6_append_data(struct sock *sk,
 				goto error;
 			}
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len,
+				size_t header_len = alloclen + hh_len;
+				gfp_t sk_allocation;
+
+				if (header_len > PAGE_SIZE)
+					sk_allocation_push(sk, __GFP_NORETRY,
+							   &sk_allocation);
+				skb = sock_alloc_send_skb(sk, header_len,
 						(flags & MSG_DONTWAIT), &err);
+				if (header_len > PAGE_SIZE) {
+					BUILD_BUG_ON(MAX_HEADER >= PAGE_SIZE);
+
+					sk_allocation_pop(sk, sk_allocation);
+					if (unlikely(!skb) && !paged &&
+					    rt->dst.dev->features & NETIF_F_SG) {
+						paged = true;
+						goto alloc_new_skb;
+					}
+				}
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
-- 
2.31.1
