Message-ID: <CANn89iL6Ckuu9vOEvc7A9CBLGuh-EpbwFRxRAchV-6VFyhTUpg@mail.gmail.com>
Date: Tue, 9 May 2023 13:58:59 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Zhang, Cathy" <cathy.zhang@...el.com>
Cc: Paolo Abeni <pabeni@...hat.com>, "davem@...emloft.net" <davem@...emloft.net>, 
	"kuba@...nel.org" <kuba@...nel.org>, "Brandeburg, Jesse" <jesse.brandeburg@...el.com>, 
	"Srinivas, Suresh" <suresh.srinivas@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>, 
	"You, Lizhen" <lizhen.you@...el.com>, "eric.dumazet@...il.com" <eric.dumazet@...il.com>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

On Tue, May 9, 2023 at 1:01 PM Zhang, Cathy <cathy.zhang@...el.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Zhang, Cathy
> > Sent: Tuesday, May 9, 2023 6:40 PM
> > To: Paolo Abeni <pabeni@...hat.com>; edumazet@...gle.com;
> > davem@...emloft.net; kuba@...nel.org
> > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> > Lizhen <Lizhen.You@...el.com>; eric.dumazet@...il.com;
> > netdev@...r.kernel.org
> > Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> > size
> >
> >
> >
> > > -----Original Message-----
> > > From: Paolo Abeni <pabeni@...hat.com>
> > > Sent: Tuesday, May 9, 2023 5:51 PM
> > > To: Zhang, Cathy <cathy.zhang@...el.com>; edumazet@...gle.com;
> > > davem@...emloft.net; kuba@...nel.org
> > > Cc: Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> > > Lizhen <lizhen.you@...el.com>; eric.dumazet@...il.com;
> > > netdev@...r.kernel.org
> > > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a
> > > proper size
> > >
> > > On Sun, 2023-05-07 at 19:08 -0700, Cathy Zhang wrote:
> > > > Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small
> > > > as possible"), each TCP socket could forward allocate up to 2 MB of
> > > > memory, and tcp_memory_allocated might hit the tcp memory limit quite
> > > > soon. To reduce the memory pressure, that commit keeps
> > > > sk->sk_forward_alloc as small as possible, which will be less than
> > > > one page if SO_RESERVE_MEM is not specified.
> > > >
> > > > However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc
> > > > as small as possible"), memcg charge hot paths are observed while
> > > > the system is stressed with a large number of connections. That is
> > > > because sk->sk_forward_alloc is too small and almost always less
> > > > than skb->truesize, so network handlers like tcp_rcv_established()
> > > > must jump to the slow path more frequently to increase
> > > > sk->sk_forward_alloc. Each such allocation triggers a memcg charge,
> > > > and perf top shows the following contention paths on the busy
> > > > system.
> > > >
> > > >     16.77%  [kernel]            [k] page_counter_try_charge
> > > >     16.56%  [kernel]            [k] page_counter_cancel
> > > >     15.65%  [kernel]            [k] try_charge_memcg
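
For reference, the fast-path/slow-path split described in the commit
message boils down to a check of roughly this shape; a condensed sketch
of sk_rmem_schedule() from include/net/sock.h (simplified, not the
verbatim kernel source; the skb_pfmemalloc() fallback is omitted):

    static inline bool
    sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
    {
            int delta;

            if (!sk_has_account(sk))  /* protocol has no accounting */
                    return true;
            delta = size - sk->sk_forward_alloc;
            /* Fast path: the pre-paid pot already covers this skb. */
            if (delta <= 0)
                    return true;
            /* Slow path: charge tcp_memory_allocated and the memcg. */
            return __sk_mem_schedule(sk, delta, SK_MEM_RECV);
    }

With sk_forward_alloc kept below one page, delta ends up positive for
almost every incoming skb, so nearly every packet takes the charging
slow path.
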
> > >
> > > I'm guessing you hit memcg limits frequently. I'm wondering if it's
> > > just a matter of tuning/reducing tcp limits in /proc/sys/net/ipv4/tcp_mem.
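
For reference, tcp_mem holds three page counts (min, pressure, max)
that gate every slow-path charge; a condensed sketch of the logic in
__sk_mem_raise_allocated() from net/core/sock.c (the helper name below
is hypothetical, and error handling is omitted):

    /* Hypothetical condensed helper, not the verbatim kernel function. */
    static bool mem_raise_sketch(struct sock *sk, int amt)
    {
            long allocated = sk_memory_allocated_add(sk, amt);

            if (allocated <= sk_prot_mem_limits(sk, 0)) {  /* under tcp_mem[0] */
                    sk_leave_memory_pressure(sk);
                    return true;
            }
            if (allocated > sk_prot_mem_limits(sk, 1))     /* over tcp_mem[1] */
                    sk_enter_memory_pressure(sk);
            if (allocated > sk_prot_mem_limits(sk, 2))     /* over tcp_mem[2] */
                    return false;                          /* charge refused */
            return true;
    }
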
> >
> > Hi Paolo,
> >
> > Do you mean hitting the limit of "--memory" that is set when starting the
> > container? If the memory option is not specified when initializing a
> > container, cgroup2 will create a memcg without a memory limit on the
> > system, right? We've run the test without this setting, and the memcg
> > charge hot paths still show up.
> >
> > It seems that /proc/sys/net/ipv4/tcp_[wr]mem cannot be changed by a
> > simple echo write, but requires a change to /etc/sysctl.conf, and I'm
> > not sure it could be changed without stopping the running application.
> > Additionally, would this type of change have a deeper and more complex
> > impact on the network stack, compared to reclaim_threshold, which is
> > assumed to mostly affect the memory allocation paths? Considering
> > this, we decided to add the reclaim_threshold directly.
> >
>
> BTW, there was previously a SK_RECLAIM_THRESHOLD in sk_mem_uncharge; we
> add it back with a smaller but sensible setting.
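
For context, the pre-4890b686f408 helper had roughly this shape (a
sketch of the old tree's logic, not verbatim; the historical constants
were 2 MB and 1 MB):

    /* Sketch of the old behavior: cache uncharged bytes in
     * sk_forward_alloc and only give memory back past a big threshold. */
    #define SK_RECLAIM_THRESHOLD    (1 << 21)       /* 2 MB */
    #define SK_RECLAIM_CHUNK        (1 << 20)       /* 1 MB */

    static inline void sk_mem_uncharge(struct sock *sk, int size)
    {
            if (!sk_has_account(sk))
                    return;
            sk->sk_forward_alloc += size;
            if (unlikely(sk->sk_forward_alloc >= SK_RECLAIM_THRESHOLD))
                    __sk_mem_reclaim(sk, SK_RECLAIM_CHUNK);
    }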

The only sensible setting is as close as possible to 0, really.

Per-socket caches do not scale.
Sure, they make some benchmarks look really nice.

Something must be wrong in your setup, because the only small issue that
was noticed was the memcg one that Shakeel solved last year.

If the system is under pressure, then memory allocations are going to be slow.
Having per-socket caches is going to be unfair to sockets with empty caches.
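
For comparison, the post-4890b686f408 behavior argued for here keeps the
per-socket cache below roughly one page; a simplified sketch of
sk_mem_reclaim() from include/net/sock.h (not the verbatim source):

    static inline void sk_mem_reclaim(struct sock *sk)
    {
            int reclaimable;

            if (!sk_has_account(sk))
                    return;
            /* Anything beyond SO_RESERVE_MEM reservations is reclaimable. */
            reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
            /* Hand everything above one page back immediately, so no
             * socket hoards forward allocations. */
            if (reclaimable >= (int)PAGE_SIZE)
                    __sk_mem_reclaim(sk, reclaimable);
    }

Because any surplus is returned right away, no socket sits on memory
that other sockets would then have to charge for from scratch.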
