Message-ID: <YElCxDzVgBBLAQhJ@cmpxchg.org>
Date:   Wed, 10 Mar 2021 17:05:56 -0500
From:   Johannes Weiner <hannes@...xchg.org>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     guro@...com, mhocko@...nel.org, akpm@...ux-foundation.org,
        shakeelb@...gle.com, vdavydov.dev@...il.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        duanxiongchun@...edance.com
Subject: Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge
 kmem pages

Hello Muchun,

On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
>  static void uncharge_batch(const struct uncharge_gather *ug)
>  {
>  	unsigned long flags;
> +	unsigned long nr_pages;
>  
> -	if (!mem_cgroup_is_root(ug->memcg)) {
> -		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> +	/*
> +	 * The kmem pages can be reparented to the root memcg, in
> +	 * order to prevent the memory counter of root memcg from
> +	 * increasing indefinitely. We should decrease the memory
> +	 * counter when uncharging.
> +	 */
> +	if (mem_cgroup_is_root(ug->memcg))
> +		nr_pages = ug->nr_kmem;
> +	else
> +		nr_pages = ug->nr_pages;

Correct or not, I find this unreadable. We're uncharging nr_kmem on
the root, and nr_pages against leaf groups?

It implies several things that might not be immediately obvious to the
reader of this function. Namely, that nr_kmem is a subset of nr_pages.
Or that we don't *want* to account LRU pages for the root cgroup.

The old code followed a very simple pattern: the root memcg's page
counters aren't touched.

This is no longer true: we modify them depending on very specific
circumstances. But that's too clever for the stupid uncharge_batch(),
which is only supposed to flush a number of accumulators into their
corresponding page counters.

This distinction really needs to be moved down to uncharge_page() now.
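
To illustrate, here is roughly where I'd like this to end up - an
untested sketch, assuming uncharge_page() only ever accumulates pages
that should actually be uncharged from the respective counters:

	static void uncharge_batch(const struct uncharge_gather *ug)
	{
		unsigned long flags;

		/* Dumb flush of the accumulators into the page counters */
		if (ug->nr_pages) {
			page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
			if (do_memsw_account())
				page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
			if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
				page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
			memcg_oom_recover(ug->memcg);
		}

		local_irq_save(flags);
		__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
		__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
		memcg_check_events(ug->memcg, ug->dummy_page);
		local_irq_restore(flags);

		/* drop reference from uncharge_page */
		css_put(&ug->memcg->css);
	}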

> @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
> -	unsigned long nr_pages;
> +	unsigned long nr_pages, nr_kmem;
>  	struct mem_cgroup *memcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
> @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  	if (!page_memcg_charged(page))
>  		return;
>  
> +	nr_pages = compound_nr(page);
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page memcg at this point, we have fully exclusive
> -	 * access to the page.
> +	 * page memcg or objcg at this point, we have fully
> +	 * exclusive access to the page.
>  	 */
> -	memcg = page_memcg_check(page);
> +	if (PageMemcgKmem(page)) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = page_objcg(page);
> +		memcg = obj_cgroup_memcg_get(objcg);
> +
> +		page->memcg_data = 0;
> +		obj_cgroup_put(objcg);
> +		nr_kmem = nr_pages;
> +	} else {
> +		memcg = page_memcg(page);
> +		page->memcg_data = 0;
> +		nr_kmem = 0;
> +	}

Why is all this moved above the uncharge_batch() call?

It separates the pointer manipulations from the refcounting, which
makes the code very difficult to follow.

> +
>  	if (ug->memcg != memcg) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
>  		}
>  		ug->memcg = memcg;
> +		ug->dummy_page = page;

Why this change?

>  		/* pairs with css_put in uncharge_batch */
>  		css_get(&ug->memcg->css);
>  	}
>  
> -	nr_pages = compound_nr(page);
>  	ug->nr_pages += nr_pages;
> +	ug->nr_kmem += nr_kmem;
> +	ug->pgpgout += !nr_kmem;

Oof.

Yes, this pgpgout line is an equivalent transformation for counting
LRU compound pages. But unless you already know that, it's completely
impossible to understand what the intent here is.

Please avoid clever tricks like this. If you need to check whether the
page is kmem, test PageMemcgKmem() instead of abusing the counters as
boolean flags. This is supposed to be read by human beings, too.
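
I.e. spell it out:

	if (!PageMemcgKmem(page))
		ug->pgpgout++;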

> -	if (PageMemcgKmem(page))
> -		ug->nr_kmem += nr_pages;
> -	else
> -		ug->pgpgout++;
> -
> -	ug->dummy_page = page;
> -	page->memcg_data = 0;
> -	css_put(&ug->memcg->css);
> +	css_put(&memcg->css);

Sorry, these two functions are no longer readable after your changes.

Please retain the following sequence as discrete steps:

1. look up memcg from the page
2. flush existing batch if memcg changed
3. add page's various counts to the current batch
4. clear page->memcg and decrease the reference count of whatever it was pointing to

And as per above, step 3 is where we should check whether to uncharge
the memcg's page counter at all:

	if (PageMemcgKmem(page)) {
		ug->nr_pages += nr_pages;
		ug->nr_kmem += nr_pages;
	} else {
		/* LRU pages aren't accounted at the root level */
		if (!mem_cgroup_is_root(memcg))
			ug->nr_pages += nr_pages;
		ug->pgpgout++;
	}

In fact, it might be a good idea to rename ug->nr_pages to
ug->nr_memory to highlight how it maps to the page_counter.
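
Putting the above together, uncharge_page() would then read something
like this - again untested, just to show the structure, reusing the
helpers your series introduces (page_memcg_charged(),
obj_cgroup_memcg_get()) and the nr_memory rename:

	static void uncharge_page(struct page *page, struct uncharge_gather *ug)
	{
		unsigned long nr_pages = compound_nr(page);
		struct obj_cgroup *objcg = NULL;
		struct mem_cgroup *memcg;

		VM_BUG_ON_PAGE(PageLRU(page), page);

		if (!page_memcg_charged(page))
			return;

		/*
		 * 1. Look up the memcg. Nobody should be changing or
		 * seriously looking at page memcg or objcg at this
		 * point, we have fully exclusive access to the page.
		 */
		if (PageMemcgKmem(page)) {
			objcg = page_objcg(page);
			/* this memcg ref is put by the css_put at the bottom */
			memcg = obj_cgroup_memcg_get(objcg);
		} else {
			memcg = page_memcg(page);
		}

		/* 2. Flush the existing batch if the memcg changed */
		if (ug->memcg != memcg) {
			if (ug->memcg) {
				uncharge_batch(ug);
				uncharge_gather_clear(ug);
			}
			ug->memcg = memcg;
			ug->dummy_page = page;

			/* pairs with css_put in uncharge_batch */
			css_get(&memcg->css);
		}

		/* 3. Add the page's various counts to the current batch */
		if (objcg) {
			ug->nr_memory += nr_pages;
			ug->nr_kmem += nr_pages;
		} else {
			/* LRU pages aren't accounted at the root level */
			if (!mem_cgroup_is_root(memcg))
				ug->nr_memory += nr_pages;
			ug->pgpgout++;
		}

		/*
		 * 4. Clear page->memcg and put the references: for
		 * kmem, the page's objcg ref and the memcg ref taken
		 * in step 1; for LRU, the css ref the page held.
		 */
		page->memcg_data = 0;
		if (objcg)
			obj_cgroup_put(objcg);
		css_put(&memcg->css);
	}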
