Message-ID: <CANn89iJvpgXTwGEiXAkFwY3j3RqVhNzJ_6_zmuRb4w7rUA_8Ug@mail.gmail.com>
Date: Tue, 9 May 2023 17:43:21 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Zhang, Cathy" <cathy.zhang@...el.com>
Cc: Paolo Abeni <pabeni@...hat.com>, "davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>, "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Srinivas, Suresh" <suresh.srinivas@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>,
"You, Lizhen" <lizhen.you@...el.com>, "eric.dumazet@...il.com" <eric.dumazet@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

On Tue, May 9, 2023 at 5:07 PM Zhang, Cathy <cathy.zhang@...el.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Eric Dumazet <edumazet@...gle.com>
> > Sent: Tuesday, May 9, 2023 7:59 PM
> > To: Zhang, Cathy <cathy.zhang@...el.com>
> > Cc: Paolo Abeni <pabeni@...hat.com>; davem@...emloft.net;
> > kuba@...nel.org; Brandeburg, Jesse <jesse.brandeburg@...el.com>;
> > Srinivas, Suresh <suresh.srinivas@...el.com>; Chen, Tim C
> > <tim.c.chen@...el.com>; You, Lizhen <lizhen.you@...el.com>;
> > eric.dumazet@...il.com; netdev@...r.kernel.org; Shakeel Butt
> > <shakeelb@...gle.com>
> > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> > size
> >
> > On Tue, May 9, 2023 at 1:01 PM Zhang, Cathy <cathy.zhang@...el.com>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Cathy
> > > > Sent: Tuesday, May 9, 2023 6:40 PM
> > > > To: Paolo Abeni <pabeni@...hat.com>; edumazet@...gle.com;
> > > > davem@...emloft.net; kuba@...nel.org
> > > > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > > > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>;
> > > > You, Lizhen <Lizhen.You@...el.com>; eric.dumazet@...il.com;
> > > > netdev@...r.kernel.org
> > > > Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as
> > > > a proper size
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Paolo Abeni <pabeni@...hat.com>
> > > > > Sent: Tuesday, May 9, 2023 5:51 PM
> > > > > To: Zhang, Cathy <cathy.zhang@...el.com>; edumazet@...gle.com;
> > > > > davem@...emloft.net; kuba@...nel.org
> > > > > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas,
> > > > > Suresh <suresh.srinivas@...el.com>; Chen, Tim C
> > > > > <tim.c.chen@...el.com>; You, Lizhen <lizhen.you@...el.com>;
> > > > > eric.dumazet@...il.com; netdev@...r.kernel.org
> > > > > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc
> > > > > as a proper size
> > > > >
> > > > > On Sun, 2023-05-07 at 19:08 -0700, Cathy Zhang wrote:
> > > > > > > Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
> > > > > > > small as possible"), each TCP can forward allocate up to 2 MB of
> > > > > > > memory and tcp_memory_allocated might hit tcp memory limitation
> > > > > > > quite soon.
> > > > > > > To reduce the memory pressure, that commit keeps
> > > > > > > sk->sk_forward_alloc as small as possible, which will be less
> > > > > > > than 1 page size if SO_RESERVE_MEM is not specified.
> > > > > >
> > > > > > However, with commit 4890b686f408 ("net: keep
> > > > > > sk->sk_forward_alloc as small as possible"), memcg charge hot
> > > > > > paths are observed while system is stressed with a large amount
> > > > > > > of connections. That is because sk->sk_forward_alloc is too
> > > > > > > small and it's always less than sk->truesize, network handlers
> > > > > > > like tcp_rcv_established() should jump to slow path more
> > > > > > > frequently to increase sk->sk_forward_alloc. Each memory
> > > > > > > allocation will trigger memcg charge, then perf top shows the
> > > > > > > following contention paths on the busy system.
> > > > > >
> > > > > > 16.77% [kernel] [k] page_counter_try_charge
> > > > > > 16.56% [kernel] [k] page_counter_cancel
> > > > > > 15.65% [kernel] [k] try_charge_memcg
> > > > >
> > > > > > I'm guessing you hit memcg limits frequently. I'm wondering if
> > > > > > it's just a matter of tuning/reducing tcp limits in
> > > > > > /proc/sys/net/ipv4/tcp_mem.
> > > >
> > > > Hi Paolo,
> > > >
> > > > > Do you mean hitting the limit of "--memory" which set when start
> > > > > container?
> > > > > If the memory option is not specified when init a container, cgroup2
> > > > > will create a memcg without memory limitation on the system, right?
> > > > > We've run test without this setting, and the memcg charge hot paths
> > > > > also exist.
> > > >
> > > > It seems that /proc/sys/net/ipv4/tcp_[wr]mem is not allowed to be
> > > > changed by a simple echo writing, but requires a change to
> > > > /etc/sys.conf, I'm not sure if it could be changed without stopping
> > > > the running application. Additionally, will this type of change
> > > > bring more deeper and complex impact of network stack, compared to
> > > > reclaim_threshold which is assumed to mostly affect of the memory
> > > > > allocation paths? Considering about this, it's decided to add the
> > > > > reclaim_threshold directly.
> > > >
> > >
> > > BTW, there is a SK_RECLAIM_THRESHOLD in sk_mem_uncharge previously, we
> > > add it back with a smaller but sensible setting.
> >
> > The only sensible setting is as close as possible from 0 really.
> >
> > Per-socket caches do not scale.
> > Sure, they make some benchmarks really look nice.
>
> Benchmark aims to help get better performance in reality I think :-)

Sure, but system stability comes first.

>
> >
> > Something must be wrong in your setup, because the only small issue that
> > was noticed was the memcg one that Shakeel solved last year.
>
> As mentioned in commit log, the test is to create 8 memcached-memtier pairs
> on the same host, when server and client of the same pair connect to the same
> CPU socket and share the same CPU set (28 CPUs), the memcg overhead is
> obviously high as shown in commit log. If they are set with different CPU set from
> separate CPU socket, the overhead is not so high but still observed. Here is the
> server/client command in our test:
> server:
> memcached -p ${port_i} -t ${threads_i} -c 10240
> client:
> memtier_benchmark --server=${memcached_id} --port=${port_i} \
> --protocol=memcache_text --test-time=20 --threads=${threads_i} \
> -c 1 --pipeline=16 --ratio=1:100 --run-count=5
>
> So, is there anything wrong you see?

Please post your /proc/sys/net/ipv4/tcp_[rw]mem settings, and the output of
"cat /proc/net/sockstat" while the test is running.

Some mm experts should chime in, this is not a networking issue.

I suspect some kind of accidental false sharing.

Can you post the output of this, run against your .config?

grep RANDSTRUCT .config
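
For convenience, the three items requested above could be gathered in one go
with something like the sketch below. The function name collect_net_mem_info
and the "==" section markers are made up for illustration; the /proc paths and
the grep are the ones named above. It assumes a POSIX shell and takes the path
to the kernel .config as an optional argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the TCP memory settings,
# the socket statistics and the RANDSTRUCT config lines asked for above.
# $1: path to the kernel .config (defaults to ./.config).
collect_net_mem_info() {
    config=${1:-.config}

    # Files named in the request; skipped gracefully if not readable.
    for f in /proc/sys/net/ipv4/tcp_rmem \
             /proc/sys/net/ipv4/tcp_wmem \
             /proc/net/sockstat; do
        echo "== $f"
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable)"
        fi
    done

    # Same grep as in the request, with a note if nothing matches.
    echo "== grep RANDSTRUCT $config"
    if [ -r "$config" ]; then
        grep RANDSTRUCT "$config" || echo "(no RANDSTRUCT lines)"
    else
        echo "(not readable)"
    fi
}

# Example (run while the memtier_benchmark test is in progress):
#   collect_net_mem_info /path/to/.config
```

The sockstat numbers only make sense if sampled while the load is running, so
the call should be made during the 20 second test window rather than after it.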