Date:   Mon, 12 Oct 2020 00:00:24 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Mike Rapoport <rppt@...nel.org>
Cc:     Greg KH <gregkh@...uxfoundation.org>, rafael@...nel.org,
        mst@...hat.com, jasowang@...hat.com,
        David Miller <davem@...emloft.net>, kuba@...nel.org,
        adobriyan@...il.com, Andrew Morton <akpm@...ux-foundation.org>,
        Eric Dumazet <edumazet@...gle.com>, kuznet@....inr.ac.ru,
        yoshfuji@...ux-ipv6.org, steffen.klassert@...unet.com,
        herbert@...dor.apana.org.au, Shakeel Butt <shakeelb@...gle.com>,
        Will Deacon <will@...nel.org>, Michal Hocko <mhocko@...e.com>,
        Roman Gushchin <guro@...com>, neilb@...e.de,
        Sami Tolvanen <samitolvanen@...gle.com>,
        kirill.shutemov@...ux.intel.com, feng.tang@...el.com,
        pabeni@...hat.com, Willem de Bruijn <willemb@...gle.com>,
        Randy Dunlap <rdunlap@...radead.org>, fw@...len.de,
        gustavoars@...nel.org, pablo@...filter.org, decui@...rosoft.com,
        jakub@...udflare.com, Peter Zijlstra <peterz@...radead.org>,
        christian.brauner@...ntu.com, ebiederm@...ssion.com,
        Thomas Gleixner <tglx@...utronix.de>, dave@...olabs.net,
        Michel Lespinasse <walken@...gle.com>,
        Jann Horn <jannh@...gle.com>, chenqiwu@...omi.com,
        christophe.leroy@....fr, minchan@...nel.org,
        Martin KaFai Lau <kafai@...com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>, linmiaohe@...wei.com,
        Kees Cook <keescook@...omium.org>,
        LKML <linux-kernel@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org,
        Networking <netdev@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org,
        Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo

On Sun, Oct 11, 2020 at 9:53 PM Mike Rapoport <rppt@...nel.org> wrote:
>
> On Sat, Oct 10, 2020 at 06:38:54PM +0800, Muchun Song wrote:
> > The amount of memory allocated to socket buffers can become significant.
> > However, we do not display the amount of memory consumed by socket
> > buffers, which makes it very difficult to know where the kernel's memory
> > is going. On our server with 500GB of RAM, we sometimes see about 25GB
> > go unaccounted for in /proc/meminfo. After analysis with page_owner
> > enabled, we found that the following allocation path consumes the
> > memory.
>
> I have a high-level question.
> There is accounting of the socket memory for memcg that gets called from
> the networking layer. Did you check if the same call sites can be used
> for the system-wide accounting as well?

I thought about this too. But the memcg socket accounting API is not passed
the `struct page`, so we do not know which NUMA node the socket buffer
memory was allocated on and cannot do node-level statistics. There is
also a second problem: if the user sends a 4096-byte message, we charge
only one page to the memcg, but the system allocates 8 pages (an order-3
allocation). So if we reused the same call sites for system-wide
accounting, the count we got would always be smaller than the actual
usage.
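To make the undercount concrete, here is a minimal C sketch of the
arithmetic. It assumes 4 KiB base pages, a charge made in whole pages
rounded up from the message size, and an order-3 allocation behind the
page frag as in the page_owner trace. The helper names are hypothetical
and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages (PAGE_SIZE on x86-64). */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: pages charged for a message of `bytes` bytes when
 * the charge is made in whole pages, rounding up. */
unsigned long pages_charged(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: base pages backing one allocation of `order`;
 * an order-n block spans 2^n pages. */
unsigned long pages_allocated(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long)); /* keep the shift defined */
	return 1UL << order;
}

/* Pages allocated for a single send that a per-message charge misses. */
unsigned long pages_uncounted(size_t bytes, unsigned int order)
{
	unsigned long charged = pages_charged(bytes);
	unsigned long allocated = pages_allocated(order);

	return allocated > charged ? allocated - charged : 0;
}
```

For a single 4096-byte send, pages_charged(4096) is 1 and
pages_allocated(3) is 8, so pages_uncounted(4096, 3) is 7. The real frag
is reused by later sends, so this only models the single-message snapshot
described above.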

>
> >   849698 times:
> >   Page allocated via order 3, mask 0x4052c0(GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
> >    __alloc_pages_nodemask+0x11d/0x290
> >    skb_page_frag_refill+0x68/0xf0
> >    sk_page_frag_refill+0x19/0x70
> >    tcp_sendmsg_locked+0x2f4/0xd10
> >    tcp_sendmsg+0x29/0xa0
> >    sock_sendmsg+0x30/0x40
> >    sock_write_iter+0x8f/0x100
> >    __vfs_write+0x10b/0x190
> >    vfs_write+0xb0/0x190
> >    ksys_write+0x5a/0xd0
> >    do_syscall_64+0x5d/0x110
> >    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
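As a rough cross-check of the ~25GB figure, here is a C sketch of the
arithmetic. It assumes that each of the 849698 page_owner records above is
one still-allocated order-3 block and that base pages are 4 KiB. The
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_BYTES 4096ULL

/* Hypothetical helper: total bytes held by `count` live allocations of
 * the given order (each spans 2^order base pages). */
uint64_t bytes_held(uint64_t count, unsigned int order)
{
	assert(order < 52); /* keep 2^order * page size within 64 bits */
	return count * (UINT64_C(1) << order) * SKETCH_PAGE_BYTES;
}
```

Under those assumptions, bytes_held(849698, 3) returns 27842904064 bytes,
roughly 25.9 GiB, which is in line with the ~25GB reported missing from
/proc/meminfo.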
> > Signed-off-by: Muchun Song <songmuchun@...edance.com>
> > ---
> >  drivers/base/node.c      |  2 ++
> >  drivers/net/virtio_net.c |  3 +--
>
> Is virtio-net the only driver that required an update?

Yeah, only virtio-net needs an update, because it is the only driver that
uses the skb_page_frag_refill() API.

>
> >  fs/proc/meminfo.c        |  1 +
> >  include/linux/mmzone.h   |  1 +
> >  include/linux/skbuff.h   | 43 ++++++++++++++++++++++++++++++++++++++--
> >  kernel/exit.c            |  3 +--
> >  mm/page_alloc.c          |  7 +++++--
> >  mm/vmstat.c              |  1 +
> >  net/core/sock.c          |  8 ++++----
> >  net/ipv4/tcp.c           |  3 +--
> >  net/xfrm/xfrm_state.c    |  3 +--
> >  11 files changed, 59 insertions(+), 16 deletions(-)
> >



-- 
Yours,
Muchun
