Message-ID: <CH3PR11MB73456D792EC6E7614E2EF14DFC769@CH3PR11MB7345.namprd11.prod.outlook.com>
Date: Tue, 9 May 2023 11:01:17 +0000
From: "Zhang, Cathy" <cathy.zhang@...el.com>
To: Paolo Abeni <pabeni@...hat.com>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"kuba@...nel.org" <kuba@...nel.org>
CC: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>, "Srinivas, Suresh"
	<suresh.srinivas@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>, "You,
 Lizhen" <lizhen.you@...el.com>, "eric.dumazet@...il.com"
	<eric.dumazet@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
 size



> -----Original Message-----
> From: Zhang, Cathy
> Sent: Tuesday, May 9, 2023 6:40 PM
> To: Paolo Abeni <pabeni@...hat.com>; edumazet@...gle.com;
> davem@...emloft.net; kuba@...nel.org
> Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> Lizhen <Lizhen.You@...el.com>; eric.dumazet@...il.com;
> netdev@...r.kernel.org
> Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> size
> 
> 
> 
> > -----Original Message-----
> > From: Paolo Abeni <pabeni@...hat.com>
> > Sent: Tuesday, May 9, 2023 5:51 PM
> > To: Zhang, Cathy <cathy.zhang@...el.com>; edumazet@...gle.com;
> > davem@...emloft.net; kuba@...nel.org
> > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> > Lizhen <lizhen.you@...el.com>; eric.dumazet@...il.com;
> > netdev@...r.kernel.org
> > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a
> > proper size
> >
> > On Sun, 2023-05-07 at 19:08 -0700, Cathy Zhang wrote:
> > > > Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small
> > > > as possible"), each TCP socket could forward allocate up to 2 MB of
> > > > memory, and tcp_memory_allocated might hit the TCP memory limit
> > > > quite soon. To reduce the memory pressure, that commit keeps
> > > > sk->sk_forward_alloc as small as possible, which will be less than 1
> > > > page size if SO_RESERVE_MEM is not specified.
> > >
> > > > However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc
> > > > as small as possible"), memcg charge hot paths are observed while
> > > > the system is stressed with a large number of connections. That is
> > > > because sk->sk_forward_alloc is too small and is always less than
> > > > skb->truesize, so network handlers like tcp_rcv_established() have
> > > > to jump to the slow path more frequently to increase
> > > > sk->sk_forward_alloc. Each memory allocation triggers a memcg
> > > > charge, and perf top shows the following contention paths on the
> > > > busy system.
> > >
> > >     16.77%  [kernel]            [k] page_counter_try_charge
> > >     16.56%  [kernel]            [k] page_counter_cancel
> > >     15.65%  [kernel]            [k] try_charge_memcg
> >
> > I'm guessing you hit memcg limits frequently. I'm wondering if it's
> > just a matter of tuning/reducing tcp limits in /proc/sys/net/ipv4/tcp_mem.
> 
> Hi Paolo,
> 
> Do you mean hitting the "--memory" limit that is set when starting a
> container? If the memory option is not specified when a container is
> created, cgroup2 creates a memcg without a memory limit, right? We've run
> the test without this setting, and the memcg charge hot paths still exist.
> 
> It seems that /proc/sys/net/ipv4/tcp_[wr]mem cannot be changed by a
> simple echo write, but requires a change to /etc/sysctl.conf, and I'm not
> sure whether it can be changed without stopping the running application.
> Additionally, would this type of change have a deeper and more complex
> impact on the network stack, compared to reclaim_threshold, which is
> assumed to mostly affect the memory allocation paths? Considering this,
> we decided to add the reclaim_threshold directly.
> 

BTW, sk_mem_uncharge() previously had an SK_RECLAIM_THRESHOLD; we add it
back with a smaller but sensible value.
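
To make the effect concrete, here is a toy userspace model of the
accounting, not the kernel code. All names (model_sock, model_charge,
model_uncharge, model_simulate, MODEL_PAGE_SIZE) are hypothetical, and the
reclaim policy (hand back every whole page once forward_alloc reaches the
threshold) is a simplifying assumption. It only counts how often the
shared counter would be touched for a given threshold.

```c
#include <assert.h>

/* Toy userspace model, not kernel code; every name here is hypothetical. */
#define MODEL_PAGE_SIZE 4096L

struct model_sock {
	long forward_alloc; /* bytes charged to the shared pool but not in use */
	long counter_ops;   /* times the shared (memcg-like) counter was touched */
};

/* Take `size` bytes for a buffer, charging whole pages when short. */
static void model_charge(struct model_sock *sk, long size)
{
	if (sk->forward_alloc < size) {
		long short_by = size - sk->forward_alloc;
		long pages = (short_by + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

		sk->forward_alloc += pages * MODEL_PAGE_SIZE;
		sk->counter_ops++;
	}
	sk->forward_alloc -= size;
}

/* Return `size` bytes; hand whole pages back once `threshold` is reached. */
static void model_uncharge(struct model_sock *sk, long size, long threshold)
{
	sk->forward_alloc += size;
	if (sk->forward_alloc >= threshold) {
		long pages = sk->forward_alloc / MODEL_PAGE_SIZE;

		sk->forward_alloc -= pages * MODEL_PAGE_SIZE;
		sk->counter_ops++;
	}
}

/* Alternate charge/uncharge `rounds` times and count shared-counter updates. */
static long model_simulate(long threshold, int rounds, long size)
{
	struct model_sock sk = { 0, 0 };
	int i;

	assert(size > 0 && threshold >= MODEL_PAGE_SIZE);
	for (i = 0; i < rounds; i++) {
		model_charge(&sk, size);
		model_uncharge(&sk, size, threshold);
	}
	return sk.counter_ops;
}
```

Under these assumptions, model_simulate(MODEL_PAGE_SIZE, 1000, 2000)
touches the shared counter 2000 times (one charge and one release per
round), while model_simulate(16 * MODEL_PAGE_SIZE, 1000, 2000) touches it
once, because the first charged page keeps being reused.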

> >
> > Cheers,
> >
> > Paolo
