linux-kernel - Re: [PATCH v3 6/9] mm/workingset: handle the page without memcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200318195911.GF154135@cmpxchg.org>
Date:   Wed, 18 Mar 2020 15:59:11 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     js1304@...il.com
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Michal Hocko <mhocko@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>, kernel-team@....com,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH v3 6/9] mm/workingset: handle the page without memcg

On Tue, Mar 17, 2020 at 02:41:54PM +0900, js1304@...il.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@....com>
> 
> When implementing workingset detection for anonymous page, I found
> some swapcache pages with NULL memcg. From the code reading, I found
> two reasons.
> 
> One is the case that swap-in readahead happens. The other is the
> corner case related to the shmem cache. These two problems should be
> fixed, but, it's not straight-forward to fix. For example, when swap-off,
> all swapped-out pages are read into swapcache. In this case, who's the
> owner of the swapcache page?
> 
> Since this problem doesn't look trivial, I decide to leave the issue and
> handles this corner case on the place where the error occurs.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>

It wouldn't be hard to find out who owns this page. The code in
mem_cgroup_try_charge() is only a few lines:

			swp_entry_t ent = { .val = page_private(page), };
			unsigned short id = lookup_swap_cgroup_id(ent);

			rcu_read_lock();
			memcg = mem_cgroup_from_id(id);
			if (memcg && !css_tryget_online(&memcg->css))
				memcg = NULL;
			rcu_read_unlock();

THAT BEING SAID, I don't think we actually *want* to know the original
cgroup for readahead pages. Because before they are accessed and
charged to the original owner, the pages are sitting on the root
cgroup LRU list and follow the root group's aging speed and LRU order.

Eviction and refault tracking is about the LRU that hosts the pages.

So IMO your patch is much less of a hack than you might think.

> diff --git a/mm/workingset.c b/mm/workingset.c
> index a9f474a..8d2e83a 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -257,6 +257,10 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
>  	VM_BUG_ON_PAGE(page_count(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  
> +	/* page_memcg() can be NULL if swap-in readahead happens */
> +	if (!page_memcg(page))
> +		return NULL;
> +
>  	advance_inactive_age(page_memcg(page), pgdat, is_file);
>  
>  	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);

This means a readahead page that hasn't been accessed will actively
not be tracked as an eviction and later as a refault.

I think that's the right thing to do, but I would expand the comment:

/*
 * A page can be without a cgroup here when it was brought in by swap
 * readahead and nobody has touched it since.
 *
 * The idea behind the workingset code is to tell on page fault time
 * whether pages have been previously used or not. Since this page
 * hasn't been used, don't store a shadow entry for it; when it later
 * faults back in, we treat it as the new page that it is.
 */