netdev - Re: [PATCH net-next 1/2] net: Keep sk->sk_forward

Message-ID: <CANn89iJbAGnZd42SVZEYWFLYVbmHM3p2UDawUKxUBhVDH5A2=A@mail.gmail.com>
Date: Thu, 11 May 2023 09:50:30 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Zhang, Cathy" <cathy.zhang@...el.com>
Cc: Shakeel Butt <shakeelb@...gle.com>, Linux MM <linux-mm@...ck.org>, 
	Cgroups <cgroups@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>, 
	"davem@...emloft.net" <davem@...emloft.net>, "kuba@...nel.org" <kuba@...nel.org>, 
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>, "Srinivas, Suresh" <suresh.srinivas@...el.com>, 
	"Chen, Tim C" <tim.c.chen@...el.com>, "You, Lizhen" <lizhen.you@...el.com>, 
	"eric.dumazet@...il.com" <eric.dumazet@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

On Thu, May 11, 2023 at 9:00 AM Zhang, Cathy <cathy.zhang@...el.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Zhang, Cathy
> > Sent: Thursday, May 11, 2023 8:53 AM
> > To: Shakeel Butt <shakeelb@...gle.com>
> > Cc: Eric Dumazet <edumazet@...gle.com>; Linux MM <linux-
> > mm@...ck.org>; Cgroups <cgroups@...r.kernel.org>; Paolo Abeni
> > <pabeni@...hat.com>; davem@...emloft.net; kuba@...nel.org;
> > Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> > Lizhen <Lizhen.You@...el.com>; eric.dumazet@...il.com;
> > netdev@...r.kernel.org
> > Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> > size
> >
> >
> >
> > > -----Original Message-----
> > > From: Shakeel Butt <shakeelb@...gle.com>
> > > Sent: Thursday, May 11, 2023 3:00 AM
> > > To: Zhang, Cathy <cathy.zhang@...el.com>
> > > Cc: Eric Dumazet <edumazet@...gle.com>; Linux MM <linux-
> > > mm@...ck.org>; Cgroups <cgroups@...r.kernel.org>; Paolo Abeni
> > > <pabeni@...hat.com>; davem@...emloft.net; kuba@...nel.org;
> > > > Brandeburg, Jesse <jesse.brandeburg@...el.com>; Srinivas, Suresh
> > > <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> > > Lizhen <lizhen.you@...el.com>; eric.dumazet@...il.com;
> > > netdev@...r.kernel.org
> > > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a
> > > proper size
> > >
> > > On Wed, May 10, 2023 at 9:09 AM Zhang, Cathy <cathy.zhang@...el.com>
> > > wrote:
> > > >
> > > >
> > > [...]
> > > > > > >
> > > > > > > Have you tried to increase batch sizes ?
> > > > > >
> > > > > > > > I just picked 256 and 1024 for a try, but it did not help;
> > > > > > > > the overhead still exists.
> > > > >
> > > > > This makes no sense at all.
> > > >
> > > > Eric,
> > > >
> > > > > I added a pr_info in try_charge_memcg() to print nr_pages if
> > > > > nr_pages >= MEMCG_CHARGE_BATCH. Except for printing 64 during the
> > > > > initialization of the instances, there is no other output while
> > > > > the test is running. That means nr_pages is not over 64, and I
> > > > > guess that might be why increasing MEMCG_CHARGE_BATCH doesn't
> > > > > affect this case.
> > > >
> > >
> > > > I am assuming you increased MEMCG_CHARGE_BATCH to 256 and 1024 but
> > > > that did not help. To me that just means there is a different
> > > bottleneck in the memcg charging codepath. Can you please share the
> > > perf profile? Please note that memcg charging does a lot of other
> > > things as well like updating memcg stats and checking (and enforcing)
> > > memory.high even if you have not set memory.high.
> >
> > Thanks Shakeel! I will check more details on what you mentioned. We use
> > "sudo perf top -p $(docker inspect -f '{{.State.Pid}}' memcached_2)" to
> > monitor one of those instances, and also use "sudo perf top" to check the
> > > system-wide overhead.
>
> Here is the annotate output of perf top for the three memcg hot paths:
>
> Showing cycles for page_counter_try_charge
>   Events  Pcnt (>=5%)
>  Percent |      Source code & Disassembly of elf for cycles (543288 samples, percent: local period)
> ---------------------------------------------------------------------------------------------------
>     0.00 :   ffffffff8141388d:       mov    %r12,%rax
>    76.82 :   ffffffff81413890:       lock xadd %rax,(%rbx)
>    22.10 :   ffffffff81413895:       lea    (%r12,%rax,1),%r15
>
>
> Showing cycles for page_counter_cancel
>   Events  Pcnt (>=5%)
>  Percent |      Source code & Disassembly of elf for cycles (1004744 samples, percent: local period)
> ----------------------------------------------------------------------------------------------------
>          : 160              return i + xadd(&v->counter, i);
>    77.42 :   ffffffff81413759:       lock xadd %rax,(%rdi)
>    22.34 :   ffffffff8141375e:       sub    %rsi,%rax
>
>
> Showing cycles for try_charge_memcg
>   Events  Pcnt (>=5%)
>  Percent |      Source code & Disassembly of elf for cycles (256531 samples, percent: local period)
> ---------------------------------------------------------------------------------------------------
>          : 22               return __READ_ONCE((v)->counter);
>    77.53 :   ffffffff8141df86:       mov    0x100(%r13),%rdx
>          : 2826             READ_ONCE(memcg->memory.high);
>    19.45 :   ffffffff8141df8d:       mov    0x190(%r13),%rcx

This is rephrasing the info you gave earlier ?

  16.77%  [kernel]            [k] page_counter_try_charge
  16.56%  [kernel]            [k] page_counter_cancel
  15.65%  [kernel]            [k] try_charge_memcg

What matters here is a call graph.

perf record -a -g sleep 5 # While the test is running
perf report --no-children --stdio

What precise kernel are you using btw ?
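
For readers following the MEMCG_CHARGE_BATCH discussion quoted above, here is a toy userspace model of batched charging. It is only a sketch under stated assumptions, not kernel code: the class and function names (SharedCounter, CpuStock, shared_ops) are hypothetical, and it ignores uncharge, stock draining, and several cgroups sharing one CPU. It counts how often the shared counter (the target of the "lock xadd" in the annotate output) would be touched when single pages are charged through a per-CPU cache that refills in batches.

```python
class SharedCounter:
    """Hypothetical stand-in for the shared page counter.

    Every call to charge() represents one atomic update of a cache
    line shared by all CPUs.
    """

    def __init__(self):
        self.usage = 0
        self.atomic_ops = 0

    def charge(self, nr_pages):
        self.usage += nr_pages
        self.atomic_ops += 1


class CpuStock:
    """Hypothetical per-CPU cache of pages that are already charged."""

    def __init__(self, counter, batch):
        self.counter = counter
        self.batch = batch
        self.cached = 0

    def charge(self, nr_pages):
        # Fast path: serve the request from the local cache,
        # without touching the shared counter.
        if self.cached >= nr_pages:
            self.cached -= nr_pages
            return
        # Slow path: charge a whole batch to the shared counter
        # and keep the surplus in the local cache.
        refill = max(self.batch, nr_pages)
        self.counter.charge(refill)
        self.cached += refill - nr_pages


def shared_ops(nr_charges, batch):
    """Return how many shared-counter updates nr_charges
    single-page charges cause for a given batch size."""
    counter = SharedCounter()
    stock = CpuStock(counter, batch)
    for _ in range(nr_charges):
        stock.charge(1)
    return counter.atomic_ops


if __name__ == "__main__":
    for batch in (1, 64, 256, 1024):
        print(batch, shared_ops(1024, batch))
```

In this simplified model a larger batch always means fewer shared-counter updates. A workload whose profile does not change with the batch size is therefore probably spending its time on a path the model leaves out, and a call graph would show which one.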