Message-ID: <0e042569-eb95-623f-242c-9cf9c87c5223@suse.cz>
Date: Tue, 27 Nov 2018 10:23:21 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Mel Gorman <mgorman@...hsingularity.net>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: David Rientjes <rientjes@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Zi Yan <zi.yan@...rutgers.edu>,
Michal Hocko <mhocko@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH 4/5] mm: Reclaim small amounts of memory when an external
fragmentation event occurs
On 11/23/18 12:45 PM, Mel Gorman wrote:
> An external fragmentation event was previously described as
>
> When the page allocator fragments memory, it records the event using
> the mm_page_alloc_extfrag event. If the fallback_order is smaller
> than a pageblock order (order-9 on 64-bit x86) then it's considered
> an event that will cause external fragmentation issues in the future.
>
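For anyone skimming the changelog, the classification above boils down to
roughly the check below. This is only my sketch of the idea (the function
name is made up), not the actual tracepoint code:

        /*
         * Sketch only: a fallback to another migratetype counts as a
         * fragmentation-causing event when it happens at less than
         * pageblock granularity, i.e. inside a partially used pageblock.
         */
        static bool fragmenting_fallback(unsigned int fallback_order)
        {
                /* pageblock_order is 9 on 64-bit x86 (2MB pageblocks) */
                return fallback_order < pageblock_order;
        }
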
> The kernel reduces the probability of such events by increasing the
> watermark sizes by calling set_recommended_min_free_kbytes early in the
> lifetime of the system. This works reasonably well in general but if there
> are enough sparsely populated pageblocks then the problem can still occur
> as enough memory is free overall and kswapd stays asleep.
>
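The catch, as I read it, is that the watermark checks only look at how many
pages are free, not whether they are contiguous, so a zone full of sparsely
populated pageblocks passes them easily. Toy illustration (names invented,
not kernel code):

        /*
         * Toy illustration: this is satisfied even when every free page
         * sits inside a partially used pageblock, so kswapd stays asleep
         * and nothing works to defragment the zone.
         */
        static bool zone_looks_healthy(unsigned long nr_free_pages,
                                       unsigned long high_watermark)
        {
                return nr_free_pages > high_watermark;
        }
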
> This patch introduces a watermark_boost_factor sysctl that allows a zone
> watermark to be temporarily boosted when an external fragmentation-causing
> event occurs. The boosting will stall allocations that would decrease
> free memory below the boosted low watermark, and kswapd is woken, if the
> calling context allows it, to reclaim an amount of memory relative to the
> size of the high watermark and the watermark_boost_factor until the boost
> is cleared. When kswapd finishes, it wakes kcompactd at the pageblock
> order to clean some of the pageblocks that may have been affected by
> the fragmentation event. kswapd avoids any writeback, slab shrinkage and
> swap from reclaim context during this operation to avoid excessive system
> disruption in the name of fragmentation avoidance. Care is taken so that
> kswapd will do normal reclaim work if the system is really low on memory.
>
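If I read the boost sizing correctly, it amounts to something like the
sketch below (simplified, with illustrative function and field names, and
assuming watermark_boost_factor is expressed in fractions of 10,000 of the
high watermark): each fragmentation event bumps the zone's boost by one
pageblock worth of pages, capped at the configured fraction of the high
watermark, and kswapd reclaims against the boosted watermark until the
boost is cleared.

        /* Simplified sketch of the boost sizing, not the patch itself. */
        static void boost_zone_watermark(struct zone *zone)
        {
                unsigned long max_boost;

                if (!watermark_boost_factor)
                        return;

                /* Cap the boost at a fraction of the high watermark. */
                max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
                                      watermark_boost_factor, 10000);

                /* Each fragmentation event adds one pageblock of boost. */
                zone->watermark_boost = min(zone->watermark_boost +
                                            pageblock_nr_pages, max_boost);
        }
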
> This was evaluated using the same workloads as "mm, page_alloc: Spread
> allocations across zones before introducing fragmentation".
>
> 1-socket Skylake machine
> config-global-dhp__workload_thpfioscale XFS (no special madvise)
> 4 fio threads, 1 THP allocating thread
> --------------------------------------
>
> 4.20-rc3 extfrag events < order 9: 804694
> 4.20-rc3+patch: 408912 (49% reduction)
> 4.20-rc3+patch1-4: 18421 (98% reduction)
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Amean     fault-base-1       653.58 (   0.00%)      652.71 (   0.13%)
> Amean     fault-huge-1         0.00 (   0.00%)      178.93 * -99.00%*
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Percentage huge-1              0.00 (   0.00%)        5.12 ( 100.00%)
>
> Note that external fragmentation-causing events are massively reduced
> by this patch, whether compared with the previous kernel or with the
> vanilla kernel. The fault latency for huge pages appears to increase,
> but only because THP allocations actually succeed with the patch applied;
> the baseline had no successful THP faults to measure against.
>
> 1-socket Skylake machine
> global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
> -----------------------------------------------------------------
>
> 4.20-rc3 extfrag events < order 9: 291392
> 4.20-rc3+patch: 191187 (34% reduction)
> 4.20-rc3+patch1-4: 13464 (95% reduction)
>
> thpfioscale Fault Latencies
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Min       fault-base-1       912.00 (   0.00%)      905.00 (   0.77%)
> Min       fault-huge-1       127.00 (   0.00%)      135.00 (  -6.30%)
> Amean     fault-base-1      1467.55 (   0.00%)     1481.67 (  -0.96%)
> Amean     fault-huge-1      1127.11 (   0.00%)     1063.88 *   5.61%*
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Percentage huge-1             77.64 (   0.00%)       83.46 (   7.49%)
>
> As before, there is a massive reduction in external fragmentation events,
> some jitter in the latencies, and an increase in the THP allocation
> success rate.
>
> 2-socket Haswell machine
> config-global-dhp__workload_thpfioscale XFS (no special madvise)
> 4 fio threads, 5 THP allocating threads
> ----------------------------------------------------------------
>
> 4.20-rc3 extfrag events < order 9: 215698
> 4.20-rc3+patch: 200210 (7% reduction)
> 4.20-rc3+patch1-4: 14263 (93% reduction)
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Amean     fault-base-5      1346.45 (   0.00%)     1306.87 (   2.94%)
> Amean     fault-huge-5      3418.60 (   0.00%)     1348.94 (  60.54%)
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Percentage huge-5              0.78 (   0.00%)        7.91 ( 910.64%)
>
> There is a 93% reduction in fragmentation-causing events, a big
> reduction in the huge page fault latency, and a higher allocation
> success rate.
>
> 2-socket Haswell machine
> global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
> -----------------------------------------------------------------
>
> 4.20-rc3 extfrag events < order 9: 166352
> 4.20-rc3+patch: 147463 (11% reduction)
> 4.20-rc3+patch1-4: 11095 (93% reduction)
>
> thpfioscale Fault Latencies
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Amean     fault-base-5      6217.43 (   0.00%)     7419.67 * -19.34%*
> Amean     fault-huge-5      3163.33 (   0.00%)     3263.80 (  -3.18%)
>
>                                   4.20.0-rc3             4.20.0-rc3
>                                 lowzone-v5r8             boost-v5r8
> Percentage huge-5             95.14 (   0.00%)       87.98 (  -7.53%)
>
> There is a large reduction in fragmentation events with some jitter around
> the latencies and success rates. As before, the high THP allocation
> success rate does mean the system is under a lot of pressure. However,
> as the fragmentation events are reduced, it would be expected that the
> long-term allocation success rate would be higher.
>
> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
Acked-by: Vlastimil Babka <vbabka@...e.cz>