netdev - Re: [PATCH net-next 1/2] net: Keep sk->sk_forward

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iJBStfwej_SO3xZq32P0+jHcaPgxY9ZBk-y8p6ZbHeWZA@mail.gmail.com>
Date: Tue, 9 May 2023 10:48:57 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Cathy Zhang <cathy.zhang@...el.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	jesse.brandeburg@...el.com, suresh.srinivas@...el.com, tim.c.chen@...el.com, 
	lizhen.you@...el.com, eric.dumazet@...il.com, netdev@...r.kernel.org, 
	Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

On Mon, May 8, 2023 at 4:08 AM Cathy Zhang <cathy.zhang@...el.com> wrote:
>
> Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
> possible"), each TCP can forward allocate up to 2 MB of memory and
> tcp_memory_allocated might hit tcp memory limitation quite soon. To
> reduce the memory pressure, that commit keeps sk->sk_forward_alloc as
> small as possible, which will be less than 1 page size if SO_RESERVE_MEM
> is not specified.
>
> However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
> small as possible"), memcg charge hot paths are observed while system is
> stressed with a large amount of connections. That is because
> sk->sk_forward_alloc is too small and it's always less than
> sk->truesize, network handlers like tcp_rcv_established() should jump to
> slow path more frequently to increase sk->sk_forward_alloc. Each memory
> allocation will trigger memcg charge, then perf top shows the following
> contention paths on the busy system.
>
>     16.77%  [kernel]            [k] page_counter_try_charge
>     16.56%  [kernel]            [k] page_counter_cancel
>     15.65%  [kernel]            [k] try_charge_memcg
>
> In order to avoid the memcg overhead and performance penalty,
> sk->sk_forward_alloc should be kept with a proper size instead of as
> small as possible. Keep memory up to 64KB from reclaims when uncharging
> sk_buff memory, which is closer to the maximum size of sk_buff. It will
> help reduce the frequency of allocating memory during TCP connection.
> The original reclaim threshold for reserved memory per-socket is 2MB, so
> the extraneous memory reserved now is about 32 times less than before
> commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
> possible").
>
> Run memcached with memtier_benchamrk to verify the optimization fix. 8
> server-client pairs are created with bridge network on localhost, server
> and client of the same pair share 28 logical CPUs.
>
> Results (Average for 5 run)
> RPS (with/without patch)        +2.07x
>
> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
>
> Signed-off-by: Cathy Zhang <cathy.zhang@...el.com>
> Signed-off-by: Lizhen You <lizhen.you@...el.com>
> Tested-by: Long Tao <tao.long@...el.com>
> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Reviewed-by: Suresh Srinivas <suresh.srinivas@...el.com>
> ---


I disagree.

Memcg folks have solved this issue already.

CC Shakeel

64KB per socket is too much, some of us have 10 million tcp sockets per host.


>  include/net/sock.h | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8b7ed7167243..6d2960479a80 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1657,12 +1657,33 @@ static inline void sk_mem_charge(struct sock *sk, int size)
>         sk->sk_forward_alloc -= size;
>  }
>
> +/* The following macro controls memory reclaiming in sk_mem_uncharge().
> + */
> +#define SK_RECLAIM_THRESHOLD   (1 << 16)
>  static inline void sk_mem_uncharge(struct sock *sk, int size)
>  {
> +       int reclaimable;
> +
>         if (!sk_has_account(sk))
>                 return;
>         sk->sk_forward_alloc += size;
> -       sk_mem_reclaim(sk);
> +
> +       reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
> +
> +       /* Reclaim memory to reduce memory pressure when multiple sockets
> +        * run in parallel. However, if we reclaim all pages and keep
> +        * sk->sk_forward_alloc as small as possible, it will cause
> +        * paths like tcp_rcv_established() going to the slow path with
> +        * much higher rate for forwarded memory expansion, which leads
> +        * to contention hot points and performance drop.
> +        *
> +        * In order to avoid the above issue, it's necessary to keep
> +        * sk->sk_forward_alloc with a proper size while doing reclaim.
> +        */
> +       if (reclaimable > SK_RECLAIM_THRESHOLD) {
> +               reclaimable -= SK_RECLAIM_THRESHOLD;
> +               __sk_mem_reclaim(sk, reclaimable);
> +       }
>  }
>
>  /*
> --
> 2.34.1
>