Message-ID: <CH3PR11MB73458BB403D537CFA96FD8DDFC769@CH3PR11MB7345.namprd11.prod.outlook.com>
Date: Tue, 9 May 2023 15:07:44 +0000
From: "Zhang, Cathy" <cathy.zhang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Paolo Abeni <pabeni@...hat.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "kuba@...nel.org" <kuba@...nel.org>, "Brandeburg,
 Jesse" <jesse.brandeburg@...el.com>, "Srinivas, Suresh"
	<suresh.srinivas@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>, "You,
 Lizhen" <lizhen.you@...el.com>, "eric.dumazet@...il.com"
	<eric.dumazet@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Shakeel Butt <shakeelb@...gle.com>
Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
 size



> -----Original Message-----
> From: Eric Dumazet <edumazet@...gle.com>
> Sent: Tuesday, May 9, 2023 7:59 PM
> To: Zhang, Cathy <cathy.zhang@...el.com>
> Cc: Paolo Abeni <pabeni@...hat.com>; davem@...emloft.net;
> kuba@...nel.org; Brandeburg, Jesse <jesse.brandeburg@...el.com>;
> Srinivas, Suresh <suresh.srinivas@...el.com>; Chen, Tim C
> <tim.c.chen@...el.com>; You, Lizhen <lizhen.you@...el.com>;
> eric.dumazet@...il.com; netdev@...r.kernel.org; Shakeel Butt
> <shakeelb@...gle.com>
> Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> size
> 
> On Tue, May 9, 2023 at 1:01 PM Zhang, Cathy <cathy.zhang@...el.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Cathy
> > > Sent: Tuesday, May 9, 2023 6:40 PM
> > > To: Paolo Abeni <pabeni@...hat.com>; edumazet@...gle.com;
> > > davem@...emloft.net; kuba@...nel.org
> > > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>;
> > > You, Lizhen <Lizhen.You@...el.com>; eric.dumazet@...il.com;
> > > netdev@...r.kernel.org
> > > Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as
> > > a proper size
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Paolo Abeni <pabeni@...hat.com>
> > > > Sent: Tuesday, May 9, 2023 5:51 PM
> > > > To: Zhang, Cathy <cathy.zhang@...el.com>; edumazet@...gle.com;
> > > > davem@...emloft.net; kuba@...nel.org
> > > > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas,
> > > > Suresh <suresh.srinivas@...el.com>; Chen, Tim C
> > > > <tim.c.chen@...el.com>; You, Lizhen <lizhen.you@...el.com>;
> > > > eric.dumazet@...il.com; netdev@...r.kernel.org
> > > > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc
> > > > as a proper size
> > > >
> > > > On Sun, 2023-05-07 at 19:08 -0700, Cathy Zhang wrote:
> > > > > Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
> > > > > small as possible"), each TCP socket could forward-allocate up to
> > > > > 2 MB of memory, and tcp_memory_allocated might hit the tcp memory
> > > > > limitation quite soon. To reduce the memory pressure, that commit
> > > > > keeps sk->sk_forward_alloc as small as possible, which will be
> > > > > less than 1 page size if SO_RESERVE_MEM is not specified.
> > > > >
> > > > > However, with commit 4890b686f408 ("net: keep
> > > > > sk->sk_forward_alloc as small as possible"), memcg charge hot
> > > > > paths are observed while the system is stressed with a large
> > > > > number of connections. That is because sk->sk_forward_alloc is
> > > > > too small and always less than sk->truesize, so network handlers
> > > > > like tcp_rcv_established() have to jump to the slow path more
> > > > > frequently to increase sk->sk_forward_alloc. Each memory
> > > > > allocation triggers a memcg charge, and perf top shows the
> > > > > following contention paths on the busy system.
> > > > >
> > > > >     16.77%  [kernel]            [k] page_counter_try_charge
> > > > >     16.56%  [kernel]            [k] page_counter_cancel
> > > > >     15.65%  [kernel]            [k] try_charge_memcg
> > > >
> > > > I'm guessing you hit memcg limits frequently. I'm wondering if
> > > > it's just a matter of tuning/reducing tcp limits in
> > > > /proc/sys/net/ipv4/tcp_mem.
> > >
> > > Hi Paolo,
> > >
> > > Do you mean hitting the limit of "--memory" that is set when starting
> > > a container? If the memory option is not specified when initializing a
> > > container, cgroup2 will create a memcg without a memory limit on the
> > > system, right? We've run the test without that setting, and the memcg
> > > charge hot paths also exist.
> > >
> > > It seems that /proc/sys/net/ipv4/tcp_[wr]mem is not meant to be changed
> > > by a simple echo write but rather via /etc/sysctl.conf, and I'm not
> > > sure it could be changed without stopping the running application.
> > > Additionally, would this kind of change have a deeper and more complex
> > > impact on the network stack, compared to a reclaim_threshold which is
> > > assumed to mostly affect the memory allocation paths? Considering this,
> > > we decided to add the reclaim_threshold directly.
> > >
> >
> > BTW, there was a SK_RECLAIM_THRESHOLD in sk_mem_uncharge() previously;
> > we add it back with a smaller but sensible setting.
> 
> The only sensible setting is as close to 0 as possible, really.
> 
> Per-socket caches do not scale.
> Sure, they make some benchmarks really look nice.

Benchmarks are meant to help us get better performance in real workloads, I think :-)

> 
> Something must be wrong in your setup, because the only small issue that
> was noticed was the memcg one that Shakeel solved last year.

As mentioned in the commit log, the test creates 8 memcached-memtier pairs
on the same host. When the server and client of a pair are bound to the same
CPU socket and share the same CPU set (28 CPUs), the memcg overhead is
obviously high, as shown in the commit log. If they are given different CPU
sets from separate CPU sockets, the overhead is not as high but still
observed. Here are the server/client commands in our test:
server:
memcached -p ${port_i} -t ${threads_i} -c 10240
client:
memtier_benchmark --server=${memcached_id} --port=${port_i} \
--protocol=memcache_text --test-time=20 --threads=${threads_i} \
-c 1 --pipeline=16 --ratio=1:100 --run-count=5

So, do you see anything wrong with this setup?
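
For reference, the same-socket placement described above could be approximated
with numactl; the node number and binding below are illustrative only, not the
exact values from our runs:

# Pin both server and client of one pair to the same CPU socket
# (node 0 here is illustrative; adjust to the actual topology).
numactl --cpunodebind=0 --membind=0 \
    memcached -p ${port_i} -t ${threads_i} -c 10240 &

numactl --cpunodebind=0 --membind=0 \
    memtier_benchmark --server=${memcached_id} --port=${port_i} \
    --protocol=memcache_text --test-time=20 --threads=${threads_i} \
    -c 1 --pipeline=16 --ratio=1:100 --run-count=5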

> 
> If under pressure, then memory allocations are going to be slow.
> Having per-socket caches is going to be unfair to sockets with empty caches.

Yeah, if the system is under memory pressure or even reaches OOM, it should
release memory so the workload can keep running. But if the system is not
under memory pressure, we would rather chase better performance.
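
As an aside on the tcp_mem limits Paolo mentioned earlier: they can be read
and written at runtime with sysctl. A rough sketch (the three values are in
pages and purely illustrative, not a recommendation):

# Show the current global TCP memory limits: min, pressure, max (in pages).
sysctl net.ipv4.tcp_mem

# Lower the limits at runtime; the values here are illustrative only.
sysctl -w net.ipv4.tcp_mem="262144 349525 524288"

# Persist the setting across reboots.
echo 'net.ipv4.tcp_mem = 262144 349525 524288' > /etc/sysctl.d/90-tcp-mem.conf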
