Message-ID: <CALvZod7pdOx0a1v4oX5-7ZfCykM8iwRwPkW-+gbO1B4+j1SXqw@mail.gmail.com>
Date: Wed, 19 Jun 2019 16:48:09 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: Waiman Long <longman@...hat.com>
Cc: Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <guro@...com>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file

Hi Waiman,

On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@...hat.com> wrote:
>
> There are concerns about memory leaks from extensive use of memory
> cgroups as each memory cgroup creates its own set of kmem caches. There
> is a possibility that the memcg kmem caches may remain even after the
> memory cgroups have been offlined. Therefore, it will be useful to show
> the status of each memcg kmem cache.
>
> This patch introduces a new <debugfs>/memcg_slabinfo file which is
> somewhat similar to /proc/slabinfo in format, but lists only information
> about kmem caches that have child memcg kmem caches. Information
> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>
> A portion of a sample output of the file:
>
> # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
> rpc_inode_cache root 13 51 1 1
> rpc_inode_cache 48 0 0 0 0
> fat_inode_cache root 1 45 1 1
> fat_inode_cache 41 2 45 1 1
> xfs_inode root 770 816 24 24
> xfs_inode 92 22 34 1 1
> xfs_inode 88:dead 1 34 1 1
> xfs_inode 89:dead 23 34 1 1
> xfs_inode 85 4 34 1 1
> xfs_inode 84 9 34 1 1
>
> The css id of the memcg is also listed. If a memcg is not online,
> the tag ":dead" will be attached as shown above.
>
> Suggested-by: Shakeel Butt <shakeelb@...gle.com>
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 57 insertions(+)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..2bca1558a722 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -17,6 +17,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
>  #include <linux/proc_fs.h>
> +#include <linux/debugfs.h>
>  #include <asm/cacheflush.h>
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>  	return 0;
>  }
>  module_init(slab_proc_init);
> +
> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
> +/*
> + * Display information about kmem caches that have child memcg caches.
> + */
> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
> +{
> +	struct kmem_cache *s, *c;
> +	struct slabinfo sinfo;
> +
> +	mutex_lock(&slab_mutex);
On large machines there can be thousands of memcgs, and each memcg can
potentially have hundreds of kmem caches, so slab_mutex can end up
being held here for a very long time.

Our internal implementation instead traverses the memcg tree and walks
each memcg's 'memcg->kmem_caches' list under slab_mutex, dropping the
mutex and doing a cond_resched() between memcgs.
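
Roughly something like the below -- completely untested, just to show
the locking shape. Note that for_each_mem_cgroup() is currently local
to mm/memcontrol.c, so it would have to be exported (or open-coded with
mem_cgroup_iter()), and the root caches would still need to be printed
separately as in your patch:

/*
 * Untested sketch: walk the memcg tree and take slab_mutex only while
 * dumping one memcg's kmem_caches list, rescheduling between memcgs,
 * so the mutex is never held across the whole traversal.
 */
static int memcg_slabinfo_show(struct seq_file *m, void *unused)
{
	struct mem_cgroup *memcg;
	struct kmem_cache *c;
	struct slabinfo sinfo;

	seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
	seq_puts(m, " <active_slabs> <num_slabs>\n");

	for_each_mem_cgroup(memcg) {
		if (mem_cgroup_is_root(memcg))
			continue;

		mutex_lock(&slab_mutex);
		list_for_each_entry(c, &memcg->kmem_caches,
				    memcg_params.kmem_caches_node) {
			memset(&sinfo, 0, sizeof(sinfo));
			get_slabinfo(c, &sinfo);
			seq_printf(m, "%-17s %4d %6lu %6lu %6lu %6lu\n",
				   cache_name(c), mem_cgroup_css(memcg)->id,
				   sinfo.active_objs, sinfo.num_objs,
				   sinfo.active_slabs, sinfo.num_slabs);
		}
		mutex_unlock(&slab_mutex);
		cond_resched();
	}
	return 0;
}
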
> + seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
> + seq_puts(m, " <active_slabs> <num_slabs>\n");
> + list_for_each_entry(s, &slab_root_caches, root_caches_node) {
> + /*
> + * Skip kmem caches that don't have any memcg children.
> + */
> + if (list_empty(&s->memcg_params.children))
> + continue;
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> + get_slabinfo(s, &sinfo);
> + seq_printf(m, "%-17s root %6lu %6lu %6lu %6lu\n",
> + cache_name(s), sinfo.active_objs, sinfo.num_objs,
> + sinfo.active_slabs, sinfo.num_slabs);
> +
> + for_each_memcg_cache(c, s) {
> + struct cgroup_subsys_state *css;
> + char *dead = "";
> +
> + css = &c->memcg_params.memcg->css;
> + if (!(css->flags & CSS_ONLINE))
> + dead = ":dead";
Please note that Roman's kmem cache reparenting patch series has made
kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
memcg kmem caches are reparented and the css->id can get recycled. So,
we want to know both that a kmem cache has been reparented and which
memcg it originally belonged to. To determine whether a kmem cache has
been reparented we can store a flag on the kmem cache, and for the
previous memcg we can use an fhandle. However, to keep things simple
for now, we can just report that the kmem cache was reparented, i.e.
that it belongs to an offlined memcg.
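
For example (untested, and the 'reparented' field below is made up, it
does not exist in any tree), the inner loop in your patch could report
that instead of trusting a possibly recycled css id:

		for_each_memcg_cache(c, s) {
			struct cgroup_subsys_state *css;
			char *status = "";

			css = &c->memcg_params.memcg->css;
			/*
			 * Hypothetical flag, set when the cache is moved to
			 * its parent memcg on offlining; css->id may already
			 * belong to a different, recycled cgroup by then.
			 */
			if (c->memcg_params.reparented)
				status = ":reparented";
			else if (!(css->flags & CSS_ONLINE))
				status = ":dead";

			memset(&sinfo, 0, sizeof(sinfo));
			get_slabinfo(c, &sinfo);
			seq_printf(m, "%-17s %4d%s %6lu %6lu %6lu %6lu\n",
				   cache_name(c), css->id, status,
				   sinfo.active_objs, sinfo.num_objs,
				   sinfo.active_slabs, sinfo.num_slabs);
		}
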
> +
> +			memset(&sinfo, 0, sizeof(sinfo));
> +			get_slabinfo(c, &sinfo);
> +			seq_printf(m, "%-17s %4d%5s %6lu %6lu %6lu %6lu\n",
> +				   cache_name(c), css->id, dead,
> +				   sinfo.active_objs, sinfo.num_objs,
> +				   sinfo.active_slabs, sinfo.num_slabs);
> +		}
> +	}
> +	mutex_unlock(&slab_mutex);
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo);
> +
> +static int __init memcg_slabinfo_init(void)
> +{
> +	debugfs_create_file("memcg_slabinfo", S_IFREG | S_IRUGO,
> +			    NULL, NULL, &memcg_slabinfo_fops);
> +	return 0;
> +}
> +
> +late_initcall(memcg_slabinfo_init);
> +#endif /* CONFIG_DEBUG_FS && CONFIG_MEMCG_KMEM */
>  #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
>
>  static __always_inline void *__do_krealloc(const void *p, size_t new_size,
> --
> 2.18.1
>