Message-Id: <20230508020801.10702-2-cathy.zhang@intel.com>
Date: Sun, 7 May 2023 19:08:00 -0700
From: Cathy Zhang <cathy.zhang@...el.com>
To: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
	pabeni@...hat.com
Cc: jesse.brandeburg@...el.com, suresh.srinivas@...el.com,
	tim.c.chen@...el.com, lizhen.you@...el.com, cathy.zhang@...el.com,
	eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
possible"), each TCP socket could forward allocate up to 2 MB of memory,
and tcp_memory_allocated might hit the TCP memory limit quite soon. To
reduce the memory pressure, that commit keeps sk->sk_forward_alloc as
small as possible, which will be less than 1 page size if SO_RESERVE_MEM
is not specified.

However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
small as possible"), memcg charge hot paths are observed while the system
is stressed with a large number of connections. That is because
sk->sk_forward_alloc is too small and is always less than skb->truesize,
so network handlers like tcp_rcv_established() have to jump to the slow
path more frequently to increase sk->sk_forward_alloc. Each memory
allocation triggers a memcg charge, and perf top shows the following
contention paths on the busy system.

    16.77%  [kernel]  [k] page_counter_try_charge
    16.56%  [kernel]  [k] page_counter_cancel
    15.65%  [kernel]  [k] try_charge_memcg

In order to avoid the memcg overhead and performance penalty,
sk->sk_forward_alloc should be kept at a proper size instead of as small
as possible. Keep up to 64KB of memory from being reclaimed when
uncharging sk_buff memory, which is closer to the maximum size of an
sk_buff. This helps reduce the frequency of memory allocation during a
TCP connection. The original reclaim threshold for reserved memory
per-socket is 2MB, so the extraneous memory reserved now is about 32
times less than before commit 4890b686f408 ("net: keep
sk->sk_forward_alloc as small as possible").

Run memcached with memtier_benchmark to verify the optimization fix. 8
server-client pairs are created with a bridge network on localhost; the
server and client of the same pair share 28 logical CPUs.

Results (average of 5 runs)
RPS (with/without patch)	+2.07x

Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")

Signed-off-by: Cathy Zhang <cathy.zhang@...el.com>
Signed-off-by: Lizhen You <lizhen.you@...el.com>
Tested-by: Long Tao <tao.long@...el.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
Reviewed-by: Suresh Srinivas <suresh.srinivas@...el.com>
---
 include/net/sock.h | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8b7ed7167243..6d2960479a80 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1657,12 +1657,33 @@ static inline void sk_mem_charge(struct sock *sk, int size)
 	sk->sk_forward_alloc -= size;
 }
 
+/* The following macro controls memory reclaiming in sk_mem_uncharge().
+ */
+#define SK_RECLAIM_THRESHOLD	(1 << 16)
 static inline void sk_mem_uncharge(struct sock *sk, int size)
 {
+	int reclaimable;
+
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
-	sk_mem_reclaim(sk);
+
+	reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
+
+	/* Reclaim memory to reduce memory pressure when multiple sockets
+	 * run in parallel. However, if we reclaim all pages and keep
+	 * sk->sk_forward_alloc as small as possible, it will cause
+	 * paths like tcp_rcv_established() going to the slow path with
+	 * much higher rate for forwarded memory expansion, which leads
+	 * to contention hot points and performance drop.
+	 *
+	 * In order to avoid the above issue, it's necessary to keep
+	 * sk->sk_forward_alloc with a proper size while doing reclaim.
+	 */
+	if (reclaimable > SK_RECLAIM_THRESHOLD) {
+		reclaimable -= SK_RECLAIM_THRESHOLD;
+		__sk_mem_reclaim(sk, reclaimable);
+	}
 }
 
 /*
-- 
2.34.1
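
For readers who want to experiment with the threshold outside the kernel, the following is a minimal userspace sketch of the uncharge logic, not the kernel implementation. Everything prefixed with mock_ or MOCK_ is hypothetical: struct mock_sock stands in for struct sock, the 4096-byte page size is an assumption, and mock_reclaim() assumes that reclaim returns whole pages only. mock_uncharge_old() approximates the behaviour being replaced (reclaim as soon as a full page is free), and mock_uncharge_new() mirrors the hunk above.

```c
#include <assert.h>

/* Same value as SK_RECLAIM_THRESHOLD in the patch: 64 KB. */
#define MOCK_RECLAIM_THRESHOLD (1 << 16)
/* Assumption: 4 KB pages. */
#define MOCK_PAGE_SIZE 4096

/* Hypothetical stand-in for struct sock; only what the sketch needs. */
struct mock_sock {
	int forward_alloc;   /* models sk->sk_forward_alloc, in bytes */
	int unused_reserved; /* models the result of sk_unused_reserved_mem(sk) */
	int reclaimed;       /* bytes handed back so far */
	int reclaim_calls;   /* how often the reclaim path was taken */
};

/* Assumed model of reclaim: give back whole pages only. */
static void mock_reclaim(struct mock_sock *sk, int amount)
{
	int bytes = (amount / MOCK_PAGE_SIZE) * MOCK_PAGE_SIZE;

	sk->forward_alloc -= bytes;
	sk->reclaimed += bytes;
	sk->reclaim_calls++;
}

/* Approximation of the old behaviour: reclaim once a full page is free. */
static void mock_uncharge_old(struct mock_sock *sk, int size)
{
	int reclaimable;

	assert(size >= 0);
	sk->forward_alloc += size;
	reclaimable = sk->forward_alloc - sk->unused_reserved;
	if (reclaimable >= MOCK_PAGE_SIZE)
		mock_reclaim(sk, reclaimable);
}

/* Mirrors the patched logic: keep up to 64 KB, reclaim only the excess. */
static void mock_uncharge_new(struct mock_sock *sk, int size)
{
	int reclaimable;

	assert(size >= 0);
	sk->forward_alloc += size;
	reclaimable = sk->forward_alloc - sk->unused_reserved;
	if (reclaimable > MOCK_RECLAIM_THRESHOLD) {
		reclaimable -= MOCK_RECLAIM_THRESHOLD;
		mock_reclaim(sk, reclaimable);
	}
}
```

In this model, uncharging a single 4096-byte buffer leaves the memory with the socket under mock_uncharge_new(), while mock_uncharge_old() returns it immediately, so the next charge would have to go back to the accounting slow path. The "about 32 times less" figure in the commit message is the ratio of the old 2 MB limit to the new threshold: (2 * 1024 * 1024) / (1 << 16) = 32.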