[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6685fe19-753d-7d76-aced-3bb071d7c81d@suse.cz>
Date: Wed, 29 Jun 2016 08:53:22 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@....com>,
Mel Gorman <mgorman@...hsingularity.net>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [patch] mm, compaction: make sure freeing scanner isn't
persistently expensive
On 06/29/2016 03:39 AM, David Rientjes wrote:
> It's possible that the freeing scanner can be consistently expensive if
> memory is well compacted toward the end of the zone with few free pages
> available in that area.
>
> If all zone memory is synchronously compacted, say with
> /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> iterate a massive amount of memory even with the per-zone cached free
> position.
>
> For example, after compacting all memory and faulting thp for heap, it
> was observed that compact_free_scanned increased as much as 892518911 4KB
> pages while compact_stall only increased by 171. The freeing scanner
> iterated ~20GB of memory for each compaction stall.
>
> To address this, if too much memory is spanned on the freeing scanner's
> freelist when releasing back to the system, return the low pfn rather than
> the high pfn. It's declared that the freeing scanner will become too
> expensive if the high pfn is used, so use the low pfn instead.
>
> The amount of memory declared as too expensive to iterate is subjectively
> chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> pages.
>
> Signed-off-by: David Rientjes <rientjes@...gle.com>
Hmm, I don't know. Seems it only works around one corner case of a
larger issue. The cost for the scanning was already paid, the patch
prevents it from being paid again, but only until the scanners are reset.
Note also that THP's no longer do direct compaction by default in recent
kernels.
To fully solve the freepage scanning issue, we should probably pick and
finish one of the proposed reworks from Joonsoo or myself, or the
approach that replaces free scanner with direct freelist allocations.
> ---
> mm/compaction.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -47,10 +47,16 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
> #define pageblock_start_pfn(pfn) block_start_pfn(pfn, pageblock_order)
> #define pageblock_end_pfn(pfn) block_end_pfn(pfn, pageblock_order)
>
> +/*
> + * Releases isolated free pages back to the buddy allocator. Returns the pfn
> + * that should be cached for the next compaction of this zone, depending on how
> + * much memory the free pages span.
> + */
> static unsigned long release_freepages(struct list_head *freelist)
> {
> struct page *page, *next;
> unsigned long high_pfn = 0;
> + unsigned long low_pfn = -1UL;
>
> list_for_each_entry_safe(page, next, freelist, lru) {
> unsigned long pfn = page_to_pfn(page);
> @@ -58,8 +64,18 @@ static unsigned long release_freepages(struct list_head *freelist)
> __free_page(page);
> if (pfn > high_pfn)
> high_pfn = pfn;
> + if (pfn < low_pfn)
> + low_pfn = pfn;
> }
>
> + /*
> + * If the list of freepages spans too much memory, the cached position
> + * should be updated to the lowest pfn to prevent the freeing scanner
> + * from becoming too expensive.
> + */
> + if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT))
> + return low_pfn;
> +
> return high_pfn;
> }
>
>
Powered by blists - more mailing lists