Message-ID: <20160208142835.GB13379@esperanza>
Date:	Mon, 8 Feb 2016 17:28:35 +0300
From:	Vladimir Davydov <vdavydov@...tuozzo.com>
To:	Johannes Weiner <hannes@...xchg.org>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...nel.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware

On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> > Workingset code was recently made memcg aware, but shadow node shrinker
> > is still global. As a result, one small cgroup can consume all memory
> > available for shadow nodes, possibly hurting other cgroups by reclaiming
> > their shadow nodes, even though reclaim distances stored in its shadow
> > nodes have no effect. To avoid this, we need to make shadow node
> > shrinker memcg aware.
> > 
> > Signed-off-by: Vladimir Davydov <vdavydov@...tuozzo.com>
> 
> This patch is straightforward, but there is one tiny thing that bugs
> me about it, and that is switching from available memory to the size
> of the active list, because the active list can shrink drastically at
> runtime.

Yeah, the active file lru is a volatile thing indeed. Not only can it
shrink rapidly, it can also grow in an instant (e.g. due to
mark_page_accessed), so you're right - sizing the shadow node lru based
solely on the active lru size would be too unpredictable.
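
For reference, the promotion path looks roughly like this (a simplified
sketch of mark_page_accessed() from mm/swap.c, with the pagevec and
workingset details elided):

	void mark_page_accessed(struct page *page)
	{
		if (!PageActive(page) && !PageUnevictable(page) &&
		    PageReferenced(page)) {
			/* second touch: promote to the active lru */
			if (PageLRU(page))
				activate_page(page);
			else
				__lru_cache_activate_page(page);
			ClearPageReferenced(page);
		} else if (!PageReferenced(page)) {
			/* first touch: just mark the page referenced */
			SetPageReferenced(page);
		}
	}

so a single pass of second touches over a large file set is enough to
inflate the active lru almost instantly.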

> 
> It's true that both the shrinking of the active list and subsequent
> activations to regrow it will reduce the number of actionable
> refaults, and so it wouldn't be unreasonable to also shrink shadow
> nodes when the active list shrinks.
> 
> However, I think these are too many assumptions to encode in the
> shrinker, because it is only meant to prevent a worst-case explosion
> of radix tree nodes. I'd prefer it to be dumb and conservative.
> 
> Could we instead go with the current usage of the memcg? Whether
> reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory available to the
> group, while making fewer assumptions about the internal composition
> of the memcg's memory and the consequences associated with that.

But that would likely result in wasting a considerable chunk of memory
on stale shadow nodes if file caches constitute only a small part of
the memcg's memory consumption, which isn't good IMHO.
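
(For concreteness, I guess the usage-based sizing would look something
like this in count_shadow_nodes() - just a sketch, assuming
page_counter_read() on the memcg's page counter is usable here:

	if (sc->memcg)
		pages = page_counter_read(&sc->memcg->memory);
	else
		pages = node_present_pages(sc->nid);

- which is not what the patch below does.)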

Maybe we'd better use LRU_ALL_FILE / 2 instead?

diff --git a/mm/workingset.c b/mm/workingset.c
index 8c07cd8af15e..8a75f8d2916a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -351,9 +351,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
 	if (memcg_kmem_enabled())
 		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
-						     BIT(LRU_ACTIVE_FILE));
+						     LRU_ALL_FILE);
 	else
-		pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+		pages = node_page_state(sc->nid, NR_ACTIVE_FILE) +
+			node_page_state(sc->nid, NR_INACTIVE_FILE);
 
 	/*
 	 * Active cache pages are limited to 50% of memory, and shadow
@@ -369,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	 *
 	 * PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
 	 */
-	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+	max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
 
 	if (shadow_nodes <= max_nodes)
 		return 0;
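
To spell out the shift arithmetic (assuming the common
RADIX_TREE_MAP_SHIFT == 6, i.e. 64 slots per radix tree node):

	before: max_nodes = active_file >> (6 - 3)
	                  = active_file / 8
	after:  max_nodes = (active_file + inactive_file) >> (1 + 6 - 3)
	                  = file_pages / 16

Since active file pages make up about half of the file lru, the cap
stays roughly where it was, but it is now derived from the total file
lru size, which is much more stable.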
