linux-kernel - Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails

Message-ID: <672ebefc-483d-2932-37b5-4ffe58156f0f@suse.cz>
Date:   Thu, 1 Mar 2018 13:23:34 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     David Rientjes <rientjes@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails

On 03/01/2018 12:42 PM, David Rientjes wrote:
> It's possible for buddy pages to become stranded on pcps that, if drained,
> could be merged with other buddy pages on the zone's free area to form
> large order pages, including up to MAX_ORDER.
> 
> Consider a verbose example using the tools/vm/page-types tool at the
> beginning of a ZONE_NORMAL, where 'B' indicates a buddy page and 'S'
> indicates a slab page, which the migration scanner is attempting to
> defragment (and doing it well, absent coalescing up to cc.order):

How can the migration scanner defragment a slab page?

> 109954  1       _______S________________________________________________________
> 109955  2       __________B_____________________________________________________
> 109957  1       ________________________________________________________________
> 109958  1       __________B_____________________________________________________
> 109959  7       ________________________________________________________________
> 109960  1       __________B_____________________________________________________
> 109961  9       ________________________________________________________________
> 10996a  1       __________B_____________________________________________________
> 10996b  3       ________________________________________________________________
> 10996e  1       __________B_____________________________________________________
> 10996f  1       ________________________________________________________________
> 109970  1       __________B_____________________________________________________
> 109971  f       ________________________________________________________________
> ...
> 109f88  1       __________B_____________________________________________________
> 109f89  3       ________________________________________________________________
> 109f8c  1       __________B_____________________________________________________
> 109f8d  2       ________________________________________________________________
> 109f8f  2       __________B_____________________________________________________
> 109f91  f       ________________________________________________________________
> 109fa0  1       __________B_____________________________________________________
> 109fa1  7       ________________________________________________________________
> 109fa8  1       __________B_____________________________________________________
> 109fa9  1       ________________________________________________________________
> 109faa  1       __________B_____________________________________________________
> 109fab  1       _______S________________________________________________________
> 
> These buddy pages, spanning 1,621 pages, could be coalesced and allow for
> three transparent hugepages to be dynamically allocated.  Totaling all
> hugepage length spans that could be coalesced, this could yield over 400
> hugepages on the zone's free area when at the time this /proc/kpageflags

I don't understand the numbers here. With order-9 hugepages it's 512
pages per hugepage. If the buddy pages span 1621 pages, how can they
yield 400 hugepages?
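
To make the arithmetic behind this question concrete, here is a small
userspace sketch. The helper names are made up for illustration, and the
result is only an upper bound because it ignores the alignment that a real
order-9 block needs:

```c
#include <assert.h>

/* Order of a transparent hugepage as discussed above: 2^9 = 512 base pages. */
#define HPAGE_ORDER 9U

/*
 * Upper bound on how many 2^order blocks fit in a span of nr_pages base
 * pages. Hypothetical helper; ignores block alignment, so the real number
 * of blocks that can be formed may be lower.
 */
static inline unsigned long max_blocks_in_span(unsigned long nr_pages,
					       unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return nr_pages >> order;
}

/* Number of base pages needed to form nr_blocks blocks of 2^order pages. */
static inline unsigned long pages_for_blocks(unsigned long nr_blocks,
					     unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return nr_blocks << order;
}
```

With these, max_blocks_in_span(1621, HPAGE_ORDER) gives 3, matching the
"three transparent hugepages" in the changelog, while
pages_for_blocks(400, HPAGE_ORDER) gives 204800 pages. So the 400 figure
would have to come from many such spans across the zone, not from this one.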

> was collected, there were _no_ order-9 or order-10 pages available for
> allocation even after triggering compaction through procfs.
> 
> When kcompactd fails to defragment memory such that a cc.order page can
> be allocated, drain all pcps for the zone back to the buddy allocator so
> this stranding cannot occur.  Compaction for that order will subsequently
> be deferred, which acts as a ratelimit on this drain.

I don't mind the change given the ratelimit, but what difference was
observed in practice?
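
For readers unfamiliar with the deferral that acts as the ratelimit, the
following is a simplified userspace model of it. It is written from memory
of how defer_compaction() and compaction_deferred() behave, leaves out the
per-order tracking, and all names and the cap of 6 are assumptions of this
sketch, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed cap: the backoff window grows to at most 1 << 6 attempts. */
#define MODEL_MAX_DEFER_SHIFT 6U

/* Hypothetical stand-in for the per-zone deferral state. */
struct zone_model {
	unsigned int considered;	/* attempts seen since the last failure */
	unsigned int defer_shift;	/* log2 of the current backoff window */
};

/* Record a failure: restart the counter and double the window up to the cap. */
static inline void model_defer(struct zone_model *z)
{
	z->considered = 0;
	if (z->defer_shift < MODEL_MAX_DEFER_SHIFT)
		z->defer_shift++;
}

/* Return true if this attempt should be skipped because of the backoff. */
static inline bool model_deferred(struct zone_model *z)
{
	unsigned int limit = 1U << z->defer_shift;

	if (++z->considered >= limit) {
		z->considered = limit;
		return false;
	}
	return true;
}

/*
 * Count how many drains would happen over nr_attempts wakeups if every
 * compaction that is actually attempted fails.
 */
static inline unsigned int model_count_drains(unsigned int nr_attempts)
{
	struct zone_model z = { 0, 0 };
	unsigned int drains = 0;

	for (unsigned int i = 0; i < nr_attempts; i++) {
		if (model_deferred(&z))
			continue;
		drains++;	/* the patch would call drain_all_pages(zone) here */
		model_defer(&z);
	}
	return drains;
}
```

Under this model, a zone where compaction keeps failing sees drains at
exponentially growing intervals until the cap is reached, which is why the
extra drain_all_pages() call looks cheap enough.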

BTW I wonder if we could be smarter and quicker about the drains. Let a
pcp struct page be easily recognized as such, and store the cpu number
in there. Migration scanner could then maintain a cpumask, and recognize
if the only missing pages for coalescing a cc->order block are on the
pcplists, and then do a targeted drain.
But that only makes sense to implement if it can make a noticeable
difference to offset the additional overhead, of course.
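
A rough userspace sketch of that idea, only to show the shape of the check.
Nothing here exists in the kernel: the page states, the per-page cpu field
and the 64-bit mask standing in for a cpumask are all assumptions of this
example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page state; the kernel has no such enum. */
enum page_state { PAGE_IN_USE, PAGE_FREE_BUDDY, PAGE_ON_PCP };

struct page_model {
	enum page_state state;
	unsigned int cpu;	/* meaningful only when state == PAGE_ON_PCP */
};

/*
 * Scan the 2^order block starting at index 'start'.
 *
 * Returns true when every page in the block is either free in the buddy
 * allocator or sitting on a pcplist, and at least one is on a pcplist.
 * In that case *cpumask has one bit set per CPU that would need draining.
 * Returns false if any page is in use (*cpumask is left untouched) or if
 * no page is on a pcplist (*cpumask is set to 0).
 */
static inline bool block_needs_only_pcp_drain(const struct page_model *pages,
					      size_t start, unsigned int order,
					      uint64_t *cpumask)
{
	uint64_t mask = 0;
	size_t nr = (size_t)1 << order;

	for (size_t i = start; i < start + nr; i++) {
		switch (pages[i].state) {
		case PAGE_FREE_BUDDY:
			break;
		case PAGE_ON_PCP:
			assert(pages[i].cpu < 64);
			mask |= (uint64_t)1 << pages[i].cpu;
			break;
		default:
			/* An in-use page: draining alone cannot free the block. */
			return false;
		}
	}

	*cpumask = mask;
	return mask != 0;
}
```

The migration scanner would then drain only the CPUs in the returned mask
instead of all of them. Whether the bookkeeping needed to recognize pcp
pages pays for itself is exactly the open question.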

> Signed-off-by: David Rientjes <rientjes@...gle.com>
> ---
>  mm/compaction.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1987,6 +1987,14 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
>  		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
> +			/*
> +			 * Buddy pages may become stranded on pcps that could
> +			 * otherwise coalesce on the zone's free area for
> +			 * order >= cc.order.  This is ratelimited by the
> +			 * upcoming deferral.
> +			 */
> +			drain_all_pages(zone);
> +
>  			/*
>  			 * We use sync migration mode here, so we defer like
>  			 * sync direct compaction does.
>