[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160819135515.hft4t5q27za6eui2@redhat.com>
Date: Fri, 19 Aug 2016 15:55:15 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Mel Gorman <mgorman@...hsingularity.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>, Rik van Riel <riel@...riel.com>,
Johannes Weiner <hannes@...xchg.org>,
Minchan Kim <minchan@...nel.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/34] Move LRU page reclaim from zones to nodes v9
On Fri, Aug 19, 2016 at 03:23:20PM +0200, Vlastimil Babka wrote:
> What's that? Never head of this before, but sounds scary :) I thought
> that zone_reclaim itself was rather discouraged nowadays, not a big
> candidate for further improvement.,,
It's some fix that I tried to push upstream but wasn't merged. I kept
maintaining it because I got customers bugreport about THP causing
regressions to node_reclaim.
Hard NUMA bindings would solve that but apparently there are apps that
prefers no memory binding to allow flexible spillover, and they only
use CPU bindings only but with a strong NUMA bias provided by
node_reclaim, by shrinking the cache (and only the cache).
In any case it was a regression caused by THP because compaction
wasn't invoked. Note zone_reclaim has a synchronous more aggressive
option that blocks for write back if needed, so invoking direct
compaction there is sure ok, if it's asked on demand.
As usual it's always a tradeoff between long live and short lived
allocation so if you reserve a system for computations and you know
your allocation are very long lived it make perfect sense to be
aggressive if you tune for it.
zone_reclaim or synchronous direct compaction are obviously bad
defaults for general purpose default settings, it doesn't mean it
should be impossible to tune a system for a certain workload to run
optimal.
> Hm I'm not so sure. Are all movable allocations highmem? For example
> Joonsoo mentions in his ZONE_CMA patchset "blockdev file cache page
> [...] usually has __GFP_MOVABLE but not __GFP_HIGHMEM and __GFP_USER".
> Now we also have Minchan's infrastructure for arbitrary driver
> compaction, so those will be movable, but potentially still restricted
> to e.g. DMA32...
One option is to forbid such corner cases... and VM_WARN_ON (not a
typo :) available in my tree) if __GFP_MOVABLE is passed on lower
classzones.
The other option would be to have a per-classzone lowpfn, highpnf scan
pointers. That has some cons but hey this whole thing is a tradeoff
isn't it?
It's about the fact we're optimizing for less frequent lowmem
allocations so we can as well provide a worse compaction for lowmem
(by reducing the MOVABLE memory restricted to lower classzones like
mentioned above), but leverage the node model to have a more powerful
that crosses all zone boundaries, when the GFP_HIGHUSER is used.
I don't see why the tradeoff is valid when it comes to the LRU but not
valid when it comes to compaction and then I've to do a blind loop of
(for-each-zone-in-the-node-in-reverse { compact_zone_order(zone) })
which works worse than before and works worse than a
zone-boundary-less compaction based on the node model.
Powered by blists - more mailing lists