[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CH3PR11MB7345F029F83269D6B1A66231FC769@CH3PR11MB7345.namprd11.prod.outlook.com>
Date: Tue, 9 May 2023 09:33:52 +0000
From: "Zhang, Cathy" <cathy.zhang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "kuba@...nel.org"
<kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>, "Brandeburg,
Jesse" <jesse.brandeburg@...el.com>, "Srinivas, Suresh"
<suresh.srinivas@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>, "You,
Lizhen" <lizhen.you@...el.com>, "eric.dumazet@...il.com"
<eric.dumazet@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Shakeel Butt <shakeelb@...gle.com>
Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
size
> -----Original Message-----
> From: Eric Dumazet <edumazet@...gle.com>
> Sent: Tuesday, May 9, 2023 4:49 PM
> To: Zhang, Cathy <cathy.zhang@...el.com>
> Cc: davem@...emloft.net; kuba@...nel.org; pabeni@...hat.com;
> Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> Lizhen <lizhen.you@...el.com>; eric.dumazet@...il.com;
> netdev@...r.kernel.org; Shakeel Butt <shakeelb@...gle.com>
> Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> size
>
> On Mon, May 8, 2023 at 4:08 AM Cathy Zhang <cathy.zhang@...el.com>
> wrote:
> >
> > Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small
> > as possible"), each TCP can forward allocate up to 2 MB of memory and
> > tcp_memory_allocated might hit tcp memory limitation quite soon. To
> > reduce the memory pressure, that commit keeps sk->sk_forward_alloc as
> > small as possible, which will be less than 1 page size if
> > SO_RESERVE_MEM is not specified.
> >
> > However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
> > small as possible"), memcg charge hot paths are observed while system
> > is stressed with a large amount of connections. That is because
> > sk->sk_forward_alloc is too small and it's always less than truesize,
> > sk->network handlers like tcp_rcv_established() should jump to
> > slow path more frequently to increase sk->sk_forward_alloc. Each
> > memory allocation will trigger memcg charge, then perf top shows the
> > following contention paths on the busy system.
> >
> > 16.77% [kernel] [k] page_counter_try_charge
> > 16.56% [kernel] [k] page_counter_cancel
> > 15.65% [kernel] [k] try_charge_memcg
> >
> > In order to avoid the memcg overhead and performance penalty,
> > sk->sk_forward_alloc should be kept with a proper size instead of as
> > small as possible. Keep memory up to 64KB from reclaims when
> > uncharging sk_buff memory, which is closer to the maximum size of
> > sk_buff. It will help reduce the frequency of allocating memory during TCP
> connection.
> > The original reclaim threshold for reserved memory per-socket is 2MB,
> > so the extraneous memory reserved now is about 32 times less than
> > before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small
> > as possible").
> >
> > Run memcached with memtier_benchamrk to verify the optimization fix. 8
> > server-client pairs are created with bridge network on localhost,
> > server and client of the same pair share 28 logical CPUs.
> >
> > Results (Average for 5 run)
> > RPS (with/without patch) +2.07x
> >
> > Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
> > possible")
> >
> > Signed-off-by: Cathy Zhang <cathy.zhang@...el.com>
> > Signed-off-by: Lizhen You <lizhen.you@...el.com>
> > Tested-by: Long Tao <tao.long@...el.com>
> > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
> > Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> > Reviewed-by: Suresh Srinivas <suresh.srinivas@...el.com>
> > ---
>
>
> I disagree.
>
> Memcg folks have solved this issue already.
Hi Eric,
Do you mean the series - "memcg: optimize charge codepath" by Shakeel?
https://lore.kernel.org/lkml/20220825000506.239406-1-shakeelb@google.com/T/
Our running is gains v6.3 which includes this series, we do not see the hot path
like " propagate_protected_usage", but see the hot paths shown in the commit
log.
>
> CC Shakeel
>
> 64KB per socket is too much, some of us have 10 million tcp sockets per host.
64KB is selected because it's the maximum packet size of IPV4, please correct me
if I'm wrong. If the reserved memory size is too high, it might bring memory pressure,
but if it's too low, memcg overhead will happen in scenarios like ours. That's why we
provide a tunable ABI in next patch, it's allowed Admin or someone else to tune it
according to the system running status. Regarding the default value, another try is to
set it as the default value of /proc/sys/net/ipv4/tcp_wmem which is 16KB.
>
>
> > include/net/sock.h | 23 ++++++++++++++++++++++-
> > 1 file changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h index
> > 8b7ed7167243..6d2960479a80 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1657,12 +1657,33 @@ static inline void sk_mem_charge(struct sock
> *sk, int size)
> > sk->sk_forward_alloc -= size;
> > }
> >
> > +/* The following macro controls memory reclaiming in
> sk_mem_uncharge().
> > + */
> > +#define SK_RECLAIM_THRESHOLD (1 << 16)
> > static inline void sk_mem_uncharge(struct sock *sk, int size) {
> > + int reclaimable;
> > +
> > if (!sk_has_account(sk))
> > return;
> > sk->sk_forward_alloc += size;
> > - sk_mem_reclaim(sk);
> > +
> > + reclaimable = sk->sk_forward_alloc -
> > + sk_unused_reserved_mem(sk);
> > +
> > + /* Reclaim memory to reduce memory pressure when multiple
> sockets
> > + * run in parallel. However, if we reclaim all pages and keep
> > + * sk->sk_forward_alloc as small as possible, it will cause
> > + * paths like tcp_rcv_established() going to the slow path with
> > + * much higher rate for forwarded memory expansion, which leads
> > + * to contention hot points and performance drop.
> > + *
> > + * In order to avoid the above issue, it's necessary to keep
> > + * sk->sk_forward_alloc with a proper size while doing reclaim.
> > + */
> > + if (reclaimable > SK_RECLAIM_THRESHOLD) {
> > + reclaimable -= SK_RECLAIM_THRESHOLD;
> > + __sk_mem_reclaim(sk, reclaimable);
> > + }
> > }
> >
> > /*
> > --
> > 2.34.1
> >
Powered by blists - more mailing lists