linux-kernel - Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails

Message-ID: <alpine.DEB.2.20.1803061549590.258123@chino.kir.corp.google.com>
Date:   Tue, 6 Mar 2018 15:57:11 -0800 (PST)
From:   David Rientjes <rientjes@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
cc:     Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, compaction: drain pcps for zone when kcompactd
 fails

On Thu, 1 Mar 2018, David Rientjes wrote:

> On Thu, 1 Mar 2018, Andrew Morton wrote:
> 
> > On Thu, 1 Mar 2018 03:42:04 -0800 (PST) David Rientjes <rientjes@...gle.com> wrote:
> > 
> > > It's possible for buddy pages to become stranded on pcps that, if drained,
> > > could be merged with other buddy pages on the zone's free area to form
> > > large order pages, including up to MAX_ORDER.
> > 
> > I grabbed this as-is.  Perhaps you could send along a new changelog so
> > that others won't be asking the same questions as Vlastimil?
> > 
> > The patch has no reviews or acks at this time...
> > 
> 
> Thanks.
> 
> As mentioned in my response to Vlastimil, I think the case could also be 
> made that we should do drain_all_pages(zone) in try_to_compact_pages() 
> when we defer for direct compactors.  It would be great to have feedback 
> from those on the cc on that point, the patch in general, and then I can 
> send an update.
> 

Andrew, here's a new changelog that should clarify the questions asked 
about the patch.


It's possible for free pages to become stranded on per-cpu pagesets (pcps) 
that, if drained, could be merged with buddy pages on the zone's free area 
to form large order pages, up to order MAX_ORDER - 1.
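
To make the stranding concrete, here is a toy userspace model, not kernel code. Each page of a small hypothetical zone is either in use, free on the zone's free area, or free but held on a pcp. The model reports the largest naturally aligned block that buddy coalescing could produce, with and without treating pcp pages as drained back to the buddy allocator. The names `page_state`, `page_is_mergeable()` and `largest_free_order()` are invented for illustration, and `max_order` here is an inclusive bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page states for a toy model of one zone. */
enum page_state {
	PAGE_IN_USE,	/* allocated, cannot be coalesced */
	PAGE_BUDDY,	/* free and already on the zone's free area */
	PAGE_PCP,	/* free, but held on a per-cpu pageset */
};

/* A page can take part in coalescing only if the buddy allocator owns it. */
static bool page_is_mergeable(enum page_state s, bool pcps_drained)
{
	return s == PAGE_BUDDY || (pcps_drained && s == PAGE_PCP);
}

/*
 * Return the largest order for which some naturally aligned block of
 * (1 << order) pages is entirely mergeable, or -1 if none is.  Repeatedly
 * merging a free block at pfn with its buddy at pfn ^ (1 << order) ends up
 * producing exactly these maximal aligned blocks.
 */
static int largest_free_order(const enum page_state *pages, size_t npages,
			      int max_order, bool pcps_drained)
{
	assert(max_order >= 0 && max_order < 32);
	int best = -1;

	for (int order = 0; order <= max_order; order++) {
		size_t block = (size_t)1 << order;

		for (size_t start = 0; start + block <= npages; start += block) {
			bool all_free = true;

			for (size_t i = start; i < start + block; i++) {
				if (!page_is_mergeable(pages[i], pcps_drained)) {
					all_free = false;
					break;
				}
			}
			if (all_free) {
				best = order;
				break;	/* one block of this order is enough */
			}
		}
	}
	return best;
}
```

With 16 pages that are all free except for a single pcp page at index 5, the model reports order 3 (the block at pages 8..15) before the drain and order 4 after it: one stranded page is enough to cap the achievable order.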

Consider the following verbose output from the tools/vm/page-types tool for
the beginning of a ZONE_NORMAL ('B' indicates a buddy page and 'S' indicates
a slab page).  Each row gives a starting pfn, a page count (both in hex), and
the page flags.  Pages on pcps do not have any page flags set.

109954  1       _______S________________________________________________________
109955  2       __________B_____________________________________________________
109957  1       ________________________________________________________________
109958  1       __________B_____________________________________________________
109959  7       ________________________________________________________________
109960  1       __________B_____________________________________________________
109961  9       ________________________________________________________________
10996a  1       __________B_____________________________________________________
10996b  3       ________________________________________________________________
10996e  1       __________B_____________________________________________________
10996f  1       ________________________________________________________________
...
109f8c  1       __________B_____________________________________________________
109f8d  2       ________________________________________________________________
109f8f  2       __________B_____________________________________________________
109f91  f       ________________________________________________________________
109fa0  1       __________B_____________________________________________________
109fa1  7       ________________________________________________________________
109fa8  1       __________B_____________________________________________________
109fa9  1       ________________________________________________________________
109faa  1       __________B_____________________________________________________
109fab  1       _______S________________________________________________________

The compaction migration scanner is attempting to defragment this memory 
since it is at the beginning of the zone.  It has done so quite well: all 
movable pages have been migrated.  In the pfn range [0x109955, 0x109fab),
there are only buddy pages and pages without flags set.

These flagless pages may be stranded on pcps; if they were freed back to 
the zone's free area, this memory could be coalesced.  It is possible that 
some of these pages are not on pcps and that something has called 
alloc_pages() and used the memory directly, but we rely on the absence of
__GFP_MOVABLE in these cases to allocate from MIGRATE_UNMOVABLE pageblocks,
keeping these MIGRATE_MOVABLE pageblocks as free as possible.
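
The reliance on __GFP_MOVABLE can be illustrated with a small sketch of how allocation flags select a pageblock migratetype. It mirrors, in simplified form, what gfpflags_to_migratetype() does in kernels of this era, but the `TOY_` flag values and enum are redefined locally as assumptions and may not match any particular kernel version; the mobility-grouping-disabled case is omitted.

```c
#include <assert.h>

/*
 * Assumed flag values for illustration only; they are modeled on
 * include/linux/gfp.h of this era but redefined locally here.
 */
#define TOY_GFP_MOVABLE		0x08u
#define TOY_GFP_RECLAIMABLE	0x10u
#define TOY_GFP_MOVABLE_MASK	(TOY_GFP_MOVABLE | TOY_GFP_RECLAIMABLE)
#define TOY_GFP_MOVABLE_SHIFT	3

enum toy_migratetype {
	TOY_MIGRATE_UNMOVABLE = 0,
	TOY_MIGRATE_MOVABLE = 1,
	TOY_MIGRATE_RECLAIMABLE = 2,
};

/* Map allocation flags to the pageblock type the allocator prefers. */
static enum toy_migratetype toy_gfp_to_migratetype(unsigned int gfp)
{
	/* Movable and reclaimable together is not a valid combination. */
	assert((gfp & TOY_GFP_MOVABLE_MASK) != TOY_GFP_MOVABLE_MASK);

	/* No mobility bit set means the allocation is treated as unmovable. */
	return (enum toy_migratetype)((gfp & TOY_GFP_MOVABLE_MASK) >>
				      TOY_GFP_MOVABLE_SHIFT);
}
```

An allocation that omits the movable bit maps to the unmovable type, which is why such pages are expected to come from MIGRATE_UNMOVABLE pageblocks rather than from the movable region shown above.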

These buddy and pcp pages, spanning 1,621 pages, could be coalesced and 
allow for three transparent hugepages to be dynamically allocated.
Running the numbers for all such spans on the system showed over 400 spans 
consisting only of buddy pages and pages without flags set at the time this 
/proc/kpageflags sample was collected.  Without this support, there were 
_no_ order-9 or order-10 pages free.
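
A span count like this can be approximated with a small parser over page-types rows. This is a sketch that assumes each row is `<pfn> <count> <flags>` with the first two fields in hex, as in the listing above, and that the flag string places 'B' (buddy) at index 10 and 'S' (slab) at index 7, matching both the sample and the KPF_BUDDY and KPF_SLAB bit numbers of /proc/kpageflags. The names `scan_rows()`, `row_is_candidate()` and `struct span_stats` are hypothetical. A span is a maximal run of contiguous rows that are either buddy or flagless; a gap in pfns, an unparsable row, or any other flag ends it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FLAG_POS_BUDDY 10	/* index of 'B' in the flag string (KPF_BUDDY) */

struct span_stats {
	size_t nspans;			/* maximal runs of buddy/flagless rows */
	unsigned long longest;		/* length in pages of the longest run */
	unsigned long longest_start;	/* starting pfn of the longest run */
};

/* True if the flag string has no flags set, or only the buddy flag. */
static bool row_is_candidate(const char *flags)
{
	for (size_t i = 0; flags[i] != '\0'; i++) {
		if (flags[i] == '_')
			continue;
		if (i != FLAG_POS_BUDDY || flags[i] != 'B')
			return false;
	}
	return true;
}

static void scan_rows(const char *const *rows, size_t nrows,
		      struct span_stats *st)
{
	unsigned long run_start = 0, run_end = 0;
	bool in_run = false;

	assert(st != NULL);
	memset(st, 0, sizeof(*st));

	/* One extra iteration (r == nrows) closes a run left open at EOF. */
	for (size_t r = 0; r <= nrows; r++) {
		unsigned long pfn = 0, len = 0;
		char flags[128] = "";
		bool cand = false;

		if (r < nrows &&
		    sscanf(rows[r], "%lx %lx %127s", &pfn, &len, flags) == 3)
			cand = row_is_candidate(flags);

		/* Extend the current run only if this row is contiguous. */
		if (in_run && cand && pfn == run_end) {
			run_end = pfn + len;
			continue;
		}
		/* Otherwise close the current run and record its length. */
		if (in_run) {
			st->nspans++;
			if (run_end - run_start > st->longest) {
				st->longest = run_end - run_start;
				st->longest_start = run_start;
			}
			in_run = false;
		}
		/* A candidate row that did not extend a run starts a new one. */
		if (cand) {
			in_run = true;
			run_start = pfn;
			run_end = pfn + len;
		}
	}
}
```

Fed the last ten rows of the listing above (pfn 0x109f8c through the slab page at 0x109fab), it finds one span of 31 pages starting at pfn 0x109f8c.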

When kcompactd fails to defragment memory such that a cc.order page can
be allocated, drain all pcps for the zone back to the buddy allocator so
this stranding cannot occur.  Compaction for that order will subsequently
be deferred, which acts as a ratelimit on this drain.
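
The ratelimiting argument can be illustrated with a userspace model of compaction deferral. It loosely follows the exponential backoff of defer_compaction() and compaction_deferred() in mm/compaction.c of this era, but the struct, the `toy_` function names, and the cap of 6 are local assumptions, and details such as compact_order_failed and the reset after a successful compaction are omitted. In the model, every failed attempt drains once and then doubles the number of subsequent wakeups that are skipped, up to the cap.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-zone deferral fields (hypothetical). */
struct toy_defer {
	unsigned int considered;	/* attempts seen since the last failure */
	unsigned int shift;		/* skip window is 1 << shift attempts */
};

#define TOY_MAX_DEFER_SHIFT 6	/* assumed cap on the backoff exponent */

/* Called after a failed compaction: widen the backoff window. */
static void toy_defer_compaction(struct toy_defer *d)
{
	d->considered = 0;
	if (d->shift < TOY_MAX_DEFER_SHIFT)
		d->shift++;
}

/* Returns true if this attempt should be skipped. */
static bool toy_compaction_deferred(struct toy_defer *d)
{
	unsigned int limit;

	assert(d->shift <= TOY_MAX_DEFER_SHIFT);
	limit = 1u << d->shift;
	if (++d->considered > limit)
		d->considered = limit;
	return d->considered < limit;
}

/*
 * Simulate n wakeups in which compaction always fails, draining pcps on
 * each failure before deferring.  Returns how many drains happened.
 */
static unsigned int toy_count_drains(unsigned int n_wakeups)
{
	struct toy_defer d = { 0, 0 };
	unsigned int drains = 0;

	for (unsigned int i = 0; i < n_wakeups; i++) {
		if (toy_compaction_deferred(&d))
			continue;
		/* Compaction ran and failed for this order. */
		drains++;	/* stands in for drain_all_pages(zone) */
		toy_defer_compaction(&d);
	}
	return drains;
}
```

Under these assumptions, 100 consecutive failing wakeups drain 6 times and 1,000 drain 20 times: once the backoff reaches its cap, at most one drain happens per 64 wakeups, so the cost stays bounded even if compaction keeps failing.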

Signed-off-by: David Rientjes <rientjes@...gle.com>