Message-ID: <20160616144752.GI1868@techsingularity.net>
Date: Thu, 16 Jun 2016 15:47:52 +0100
From: Mel Gorman <mgorman@...hsingularity.net>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>, Rik van Riel <riel@...riel.com>,
Johannes Weiner <hannes@...xchg.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 12/27] mm, vmscan: Make shrink_node decisions more
 node-centric

On Thu, Jun 16, 2016 at 03:35:15PM +0200, Vlastimil Babka wrote:
> On 06/09/2016 08:04 PM, Mel Gorman wrote:
> >Earlier patches focused on having direct reclaim and kswapd use data that
> >is node-centric for reclaiming but shrink_node() itself still uses too much
> >zone information. This patch removes unnecessary zone-based information
> >with the most important decision being whether to continue reclaim or
> >not. Some memcg APIs are adjusted as a result even though memcg itself
> >still uses some zone information.
> >
> >Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
>
> [...]
>
> >@@ -2372,21 +2374,27 @@ static inline bool should_continue_reclaim(struct zone *zone,
> > * inactive lists are large enough, continue reclaiming
> > */
> > pages_for_compaction = (2UL << sc->order);
> >- inactive_lru_pages = node_page_state(zone->zone_pgdat, NR_INACTIVE_FILE);
> >+ inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> > if (get_nr_swap_pages() > 0)
> >- inactive_lru_pages += node_page_state(zone->zone_pgdat, NR_INACTIVE_ANON);
> >+ inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> > if (sc->nr_reclaimed < pages_for_compaction &&
> > inactive_lru_pages > pages_for_compaction)
> > return true;
> >
> > /* If compaction would go ahead or the allocation would succeed, stop */
> >- switch (compaction_suitable(zone, sc->order, 0, 0)) {
> >- case COMPACT_PARTIAL:
> >- case COMPACT_CONTINUE:
> >- return false;
> >- default:
> >- return true;
> >+ for (z = 0; z <= sc->reclaim_idx; z++) {
> >+ struct zone *zone = &pgdat->node_zones[z];
> >+
> >+ switch (compaction_suitable(zone, sc->order, 0, 0)) {
>
> Using 0 for classzone_idx here was sort of OK when each zone was reclaimed
> separately, as a Normal allocation not passing the appropriate classzone_idx
> (and thus not having the lowmem reserve subtracted from free pages) means
> that a false COMPACT_PARTIAL (or COMPACT_CONTINUE) could be returned for
> e.g. the DMA zone.
> It means a premature end of reclaim for this single zone, which is
> relatively small anyway, so no big deal (and we might avoid useless
> over-reclaim, when even reclaiming everything wouldn't get us above the
> lowmem_reserve).
>
> But in node-centric reclaim, such premature "return false" from a DMA zone
> stops reclaiming the whole node. So I think we should involve the real
> classzone_idx here.
>
Fair point, although for compaction this only occurs in a marginal corner
case. Prematurely allowing compaction for ZONE_DMA is unfortunate, but it
would be bizarre for a high-order allocation to be restricted to just that
zone.
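
For anyone following along: the reason classzone_idx matters is that the
watermark check adds the lowmem reserve of the allocation's classzone to
the watermark before comparing against free pages. Roughly, simplified
from __zone_watermark_ok() rather than the exact code:

	/*
	 * Sketch only: lowmem_reserve[0] is 0 for the zone itself, so
	 * classzone_idx == 0 ignores the reserve entirely and a small
	 * zone like ZONE_DMA can pass the check even though a Normal
	 * allocation could never use those pages.
	 */
	static bool watermark_ok_sketch(struct zone *z, unsigned long mark,
					int classzone_idx, long free_pages)
	{
		return free_pages > mark + z->lowmem_reserve[classzone_idx];
	}
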
I'll pass in sc->reclaim_idx as it represents the highest zone index usable
by the allocation context, which is what the classzone_idx should be. Doing
so also highlighted that direct reclaim was not setting reclaim_idx, but
that has since been corrected.
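
Concretely, the compaction check in should_continue_reclaim() would become
something like the following (illustrative sketch, the posted patch may
differ slightly):

	/* If compaction would go ahead or the allocation would succeed, stop */
	for (z = 0; z <= sc->reclaim_idx; z++) {
		struct zone *zone = &pgdat->node_zones[z];

		switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
		case COMPACT_PARTIAL:
		case COMPACT_CONTINUE:
			/* Compaction can go ahead in this zone; stop reclaiming */
			return false;
		default:
			/* Not suitable with the real classzone; try the next zone */
			break;
		}
	}
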
--
Mel Gorman
SUSE Labs