linux-kernel - Re: [PATCH v5 3/4] mm/page_owner: Print memcg information

Message-ID: <YgJeWth50eP9L0PK@dhcp22.suse.cz>
Date:   Tue, 8 Feb 2022 13:13:14 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Waiman Long <longman@...hat.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Petr Mladek <pmladek@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
        linux-mm@...ck.org, Ira Weiny <ira.weiny@...el.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Roman Gushchin <guro@...com>,
        Rafael Aquini <aquini@...hat.com>,
        Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCH v5 3/4] mm/page_owner: Print memcg information

On Mon 07-02-22 19:05:31, Waiman Long wrote:
> It was found that a number of dying memcgs were not freed because
> they were pinned by some charged pages that were present. Even "echo 1 >
> /proc/sys/vm/drop_caches" wasn't able to free those pages. These dying
> but not freed memcgs tend to increase in number over time with the side
> effect that percpu memory consumption as shown in /proc/meminfo also
> increases over time.

I still believe that this is a very suboptimal way to debug offline
memcgs, but memcg information can be useful in other contexts and it
doesn't cost us anything except for some additional output, so I am fine
with this.
 
> In order to find out more information about those pages that pin
> dying memcgs, the page_owner feature is extended to print memory
> cgroup information especially whether the cgroup is dying or not.
> RCU read lock is taken when memcg is being accessed to make sure
> that it won't be freed.
> 
> Signed-off-by: Waiman Long <longman@...hat.com>
> Acked-by: David Rientjes <rientjes@...gle.com>
> Acked-by: Roman Gushchin <guro@...com>
> Acked-by: Mike Rapoport <rppt@...ux.ibm.com>

With a few comments/questions below.

> ---
>  mm/page_owner.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 28dac73e0542..d4c311455753 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -10,6 +10,7 @@
>  #include <linux/migrate.h>
>  #include <linux/stackdepot.h>
>  #include <linux/seq_file.h>
> +#include <linux/memcontrol.h>
>  #include <linux/sched/clock.h>
>  
>  #include "internal.h"
> @@ -325,6 +326,47 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
>  	seq_putc(m, '\n');
>  }
>  
> +/*
> + * Looking for memcg information and print it out
> + */

I am not sure this is a particularly useful comment.

> +static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
> +					 struct page *page)
> +{
> +#ifdef CONFIG_MEMCG
> +	unsigned long memcg_data;
> +	struct mem_cgroup *memcg;
> +	bool dying;
> +
> +	rcu_read_lock();
> +	memcg_data = READ_ONCE(page->memcg_data);
> +	if (!memcg_data)
> +		goto out_unlock;
> +
> +	if (memcg_data & MEMCG_DATA_OBJCGS)
> +		ret += scnprintf(kbuf + ret, count - ret,
> +				"Slab cache page\n");
> +
> +	memcg = page_memcg_check(page);
> +	if (!memcg)
> +		goto out_unlock;
> +
> +	dying = (memcg->css.flags & CSS_DYING);

Is there any specific reason why you haven't used mem_cgroup_online()?

> +	ret += scnprintf(kbuf + ret, count - ret,
> +			"Charged %sto %smemcg ",
> +			PageMemcgKmem(page) ? "(via objcg) " : "",
> +			dying ? "dying " : "");
> +
> +	/* Write cgroup name directly into kbuf */
> +	cgroup_name(memcg->css.cgroup, kbuf + ret, count - ret);
> +	ret += strlen(kbuf + ret);

cgroup_name() should already return the length of the name added to the
buffer, so the extra strlen() should not be needed.
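
To illustrate the point, here is a minimal userspace sketch, not kernel
code. name_copy() is a hypothetical stand-in for cgroup_name() and is
assumed to behave like strlcpy(): it returns the full length of the
source name, which may be larger than what actually fit. append_name()
is likewise a made-up helper. The return value replaces the second
strlen(), but it has to be clamped when the name was truncated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for cgroup_name(): copies src into buf with
 * strlcpy()-like semantics (an assumption) and returns the full length
 * of src, which may exceed what fit in buf.
 */
static size_t name_copy(const char *src, char *buf, size_t buflen)
{
	size_t len = strlen(src);

	if (buflen) {
		size_t n = len < buflen - 1 ? len : buflen - 1;

		memcpy(buf, src, n);
		buf[n] = '\0';
	}
	return len;
}

/*
 * Append name at offset ret in kbuf (capacity count) and return the new
 * offset, using the copy's return value instead of a second strlen().
 * The length is clamped so the offset never passes the trailing NUL.
 */
static size_t append_name(char *kbuf, size_t count, size_t ret,
			  const char *name)
{
	size_t avail, len;

	if (ret >= count)
		return ret;	/* no room left, leave the buffer alone */

	avail = count - ret;
	len = name_copy(name, kbuf + ret, avail);
	if (len >= avail)
		len = avail - 1;	/* name was truncated */
	return ret + len;
}
```

Whether the clamp is needed in the real code depends on the exact
return semantics of cgroup_name() in the tree the patch is based on.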

> +	ret += scnprintf(kbuf + ret, count - ret, "\n");

I do not see any overflow prevention here. I believe you really need to
check ret >= count after each scnprintf/cgroup_name.
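
Something along these lines is what I mean, as a rough userspace sketch
rather than a proposed patch. It uses the standard snprintf(), whose
return value is the length the complete output would have had, so
ret >= count signals truncation. The kernel's scnprintf() returns what
was actually stored instead, so the exact check there may need to look
different. print_memcg_line() and its parameters are made up for the
example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the memcg line into kbuf starting at
 * offset ret. Returns the new offset, or -1 if the output would not fit
 * in count bytes (including the trailing NUL).
 */
static int print_memcg_line(char *kbuf, size_t count, int ret,
			    int dying, const char *name)
{
	int n;

	if (ret < 0 || (size_t)ret >= count)
		return -1;

	/* snprintf() returns the length the full output would have had. */
	n = snprintf(kbuf + ret, count - ret, "Charged to %smemcg ",
		     dying ? "dying " : "");
	if (n < 0)
		return -1;
	ret += n;
	if ((size_t)ret >= count)
		return -1;	/* first piece was truncated */

	n = snprintf(kbuf + ret, count - ret, "%s\n", name);
	if (n < 0)
		return -1;
	ret += n;
	if ((size_t)ret >= count)
		return -1;	/* name did not fit */

	return ret;
}
```

The check after every append keeps count - ret from being computed with
an offset that is already past the end of the buffer.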
-- 
Michal Hocko
SUSE Labs