Message-ID: <CALvZod7pdOx0a1v4oX5-7ZfCykM8iwRwPkW-+gbO1B4+j1SXqw@mail.gmail.com>
Date: Wed, 19 Jun 2019 16:48:09 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: Waiman Long <longman@...hat.com>
Cc: Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <guro@...com>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file

Hi Waiman,

On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@...hat.com> wrote:
>
> There are concerns about memory leaks from extensive use of memory
> cgroups as each memory cgroup creates its own set of kmem caches. There
> is a possibility that the memcg kmem caches may remain even after the
> memory cgroups have been offlined. Therefore, it will be useful to show
> the status of each memcg kmem cache.
>
> This patch introduces a new <debugfs>/memcg_slabinfo file which is
> somewhat similar to /proc/slabinfo in format, but lists only information
> about kmem caches that have child memcg kmem caches. Information
> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>
> A portion of a sample output of the file:
>
> # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
> rpc_inode_cache root 13 51 1 1
> rpc_inode_cache 48 0 0 0 0
> fat_inode_cache root 1 45 1 1
> fat_inode_cache 41 2 45 1 1
> xfs_inode root 770 816 24 24
> xfs_inode 92 22 34 1 1
> xfs_inode 88:dead 1 34 1 1
> xfs_inode 89:dead 23 34 1 1
> xfs_inode 85 4 34 1 1
> xfs_inode 84 9 34 1 1
>
> The css id of the memcg is also listed. If a memcg is not online,
> the tag ":dead" will be attached as shown above.
>
> Suggested-by: Shakeel Butt <shakeelb@...gle.com>
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 57 insertions(+)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..2bca1558a722 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -17,6 +17,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
>  #include <linux/proc_fs.h>
> +#include <linux/debugfs.h>
>  #include <asm/cacheflush.h>
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>  	return 0;
>  }
>  module_init(slab_proc_init);
> +
> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
> +/*
> + * Display information about kmem caches that have child memcg caches.
> + */
> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
> +{
> +	struct kmem_cache *s, *c;
> +	struct slabinfo sinfo;
> +
> +	mutex_lock(&slab_mutex);
On large machines there can be thousands of memcgs, and each memcg can
potentially have hundreds of kmem caches, so slab_mutex can end up
being held here for a very long time.

Our internal implementation instead traverses the memcg tree and walks
each memcg's 'memcg->kmem_caches' list under slab_mutex, dropping the
mutex and doing a cond_resched() between memcgs.
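
Roughly something like the below -- completely untested, just to show
the locking shape. Note that for_each_mem_cgroup() is currently local
to mm/memcontrol.c, so it would have to be exported (or open-coded with
mem_cgroup_iter()), and the root caches would still need to be printed
separately as in your patch:

/*
 * Untested sketch: walk the memcg tree and take slab_mutex only while
 * dumping one memcg's kmem_caches list, rescheduling between memcgs,
 * so the mutex is never held across the whole traversal.
 */
static int memcg_slabinfo_show(struct seq_file *m, void *unused)
{
	struct mem_cgroup *memcg;
	struct kmem_cache *c;
	struct slabinfo sinfo;

	seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
	seq_puts(m, " <active_slabs> <num_slabs>\n");

	for_each_mem_cgroup(memcg) {
		if (mem_cgroup_is_root(memcg))
			continue;

		mutex_lock(&slab_mutex);
		list_for_each_entry(c, &memcg->kmem_caches,
				    memcg_params.kmem_caches_node) {
			memset(&sinfo, 0, sizeof(sinfo));
			get_slabinfo(c, &sinfo);
			seq_printf(m, "%-17s %4d %6lu %6lu %6lu %6lu\n",
				   cache_name(c), mem_cgroup_css(memcg)->id,
				   sinfo.active_objs, sinfo.num_objs,
				   sinfo.active_slabs, sinfo.num_slabs);
		}
		mutex_unlock(&slab_mutex);
		cond_resched();
	}
	return 0;
}
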
> + seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
> + seq_puts(m, " <active_slabs> <num_slabs>\n");
> + list_for_each_entry(s, &slab_root_caches, root_caches_node) {
> + /*
> + * Skip kmem caches that don't have any memcg children.
> + */
> + if (list_empty(&s->memcg_params.children))
> + continue;
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> + get_slabinfo(s, &sinfo);
> + seq_printf(m, "%-17s root %6lu %6lu %6lu %6lu\n",
> + cache_name(s), sinfo.active_objs, sinfo.num_objs,
> + sinfo.active_slabs, sinfo.num_slabs);
> +
> + for_each_memcg_cache(c, s) {
> + struct cgroup_subsys_state *css;
> + char *dead = "";
> +
> + css = &c->memcg_params.memcg->css;
> + if (!(css->flags & CSS_ONLINE))
> + dead = ":dead";
Please note that Roman's kmem cache reparenting patch series has made
kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
memcg kmem caches are reparented and the css->id can get recycled. So,
we want to know both that a kmem cache has been reparented and which
memcg it originally belonged to. To determine whether a kmem cache has
been reparented we can store a flag on the kmem cache, and for the
previous memcg we can use an fhandle. However, to keep things simple
for now, we can just report that the kmem cache was reparented, i.e.
that it belongs to an offlined memcg.
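
For example (untested, and the 'reparented' field below is made up, it
does not exist in any tree), the inner loop in your patch could report
that instead of trusting a possibly recycled css id:

		for_each_memcg_cache(c, s) {
			struct cgroup_subsys_state *css;
			char *status = "";

			css = &c->memcg_params.memcg->css;
			/*
			 * Hypothetical flag, set when the cache is moved to
			 * its parent memcg on offlining; css->id may already
			 * belong to a different, recycled cgroup by then.
			 */
			if (c->memcg_params.reparented)
				status = ":reparented";
			else if (!(css->flags & CSS_ONLINE))
				status = ":dead";

			memset(&sinfo, 0, sizeof(sinfo));
			get_slabinfo(c, &sinfo);
			seq_printf(m, "%-17s %4d%s %6lu %6lu %6lu %6lu\n",
				   cache_name(c), css->id, status,
				   sinfo.active_objs, sinfo.num_objs,
				   sinfo.active_slabs, sinfo.num_slabs);
		}
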
> +
> +			memset(&sinfo, 0, sizeof(sinfo));
> +			get_slabinfo(c, &sinfo);
> +			seq_printf(m, "%-17s %4d%5s %6lu %6lu %6lu %6lu\n",
> +				   cache_name(c), css->id, dead,
> +				   sinfo.active_objs, sinfo.num_objs,
> +				   sinfo.active_slabs, sinfo.num_slabs);
> +		}
> +	}
> +	mutex_unlock(&slab_mutex);
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo);
> +
> +static int __init memcg_slabinfo_init(void)
> +{
> +	debugfs_create_file("memcg_slabinfo", S_IFREG | S_IRUGO,
> +			    NULL, NULL, &memcg_slabinfo_fops);
> +	return 0;
> +}
> +
> +late_initcall(memcg_slabinfo_init);
> +#endif /* CONFIG_DEBUG_FS && CONFIG_MEMCG_KMEM */
>  #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
>
>  static __always_inline void *__do_krealloc(const void *p, size_t new_size,
> --
> 2.18.1
>