linux-kernel - Regression in mobility grouping?

Message-ID: <20160928014148.GA21007@cmpxchg.org>
Date:   Tue, 27 Sep 2016 21:41:48 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
        Joonsoo Kim <js1304@...il.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        kernel-team@...com
Subject: Regression in mobility grouping?

Hi guys,

we noticed what looks like a regression in page mobility grouping
during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
uptime, but /proc/pagetypeinfo on 3.10 looks like this:

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
Node 1, zone   Normal          815          433        31518            2            0 

and on 4.0 like this:

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
Node 1, zone   Normal         3880         3530        25356            2            0            0 
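
To put the two excerpts side by side, here is a minimal sketch that
parses the "Number of blocks type" header and the Node 1 Normal row
quoted above and prints each migratetype's share of the zone's
pageblocks. The helper names (parse_block_counts, shares) are made up
for illustration and are not part of any kernel tooling; the input
strings are copied from the two excerpts.

```python
# Sketch: compare pageblock mobility shares from two pagetypeinfo excerpts.
# Helper names are hypothetical; the input lines are the ones quoted above.
HEADER_PREFIX = "Number of blocks type"


def parse_block_counts(header_line, data_line):
    """Map each migratetype name to its pageblock count."""
    names = header_line[len(HEADER_PREFIX):].split()
    # The data line starts with "Node N, zone <name>"; the counts are the
    # trailing columns, one per migratetype in the header.
    counts = [int(tok) for tok in data_line.split()[-len(names):]]
    return dict(zip(names, counts))


def shares(counts):
    """Return each migratetype's fraction of all pageblocks in the zone."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


OLD = parse_block_counts(
    "Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate",
    "Node 1, zone   Normal          815          433        31518            2            0",
)
NEW = parse_block_counts(
    "Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate",
    "Node 1, zone   Normal         3880         3530        25356            2            0            0",
)

for label, counts in (("3.10", OLD), ("4.0", NEW)):
    s = shares(counts)
    print(f"{label}: total={sum(counts.values())} "
          f"movable={s['Movable']:.1%} "
          f"unmovable={s['Unmovable']:.1%} "
          f"reclaimable={s['Reclaimable']:.1%}")
```

Both rows add up to 32768 pageblocks, so the zones are the same size;
the movable share drops from roughly 96% on 3.10 to roughly 77% on 4.0.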

4.0 is either polluting pageblocks more aggressively at allocation, or
is not able to make pageblocks movable again when the reclaimable and
unmovable allocations are released. Invoking compaction manually
(/proc/sys/vm/compact_memory) is not bringing them back, either.

The problem we are debugging is that these machines have a very high
rate of order-3 allocations (fdtable during fork, network rx), and
after the upgrade allocstalls have increased dramatically. I'm not
entirely sure this is the same issue, since even order-0 allocations
are struggling, but the mobility grouping in itself looks problematic.

I'm still going through the changes relevant to mobility grouping in
that timeframe, but if this rings a bell for anyone, it would help. I
hate blaming random patches, but these caught my eye:

9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
3a1086f mm: always steal split buddies in fallback allocations
99592d5 mm: when stealing freepages, also take pages created by splitting buddy page

The changelog states that by aggressively stealing split buddy pages
during a fallback allocation we avoid subsequent stealing. But there
are generally more movable/reclaimable pages available, so movable
allocations fall back and steal freepages less often. Won't this mean
that we should expect exactly that result: growing numbers of
unmovable blocks that are rarely stolen back in movable alloc
fallbacks? And wouldn't the expansion of !MOVABLE blocks over time make
compaction less and less effective too, seeing as it doesn't consider
anything !MOVABLE a suitable migration target?

Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
kernels on machines with similar uptimes and directly after invoking
compaction. As you can see, the buddy lists are much more fragmented
on 4.0, with unmovable/reclaimable allocations polluting more blocks.

Any thoughts on this would be greatly appreciated. I can test patches.

Thanks!

View attachment "buddyinfo-3.10.txt" of type "text/plain" (400 bytes)

View attachment "buddyinfo-4.0.txt" of type "text/plain" (400 bytes)

View attachment "pagetypeinfo-3.10.txt" of type "text/plain" (3302 bytes)

View attachment "pagetypeinfo-4.0.txt" of type "text/plain" (3868 bytes)

View attachment "extfrag-3.10.txt" of type "text/plain" (388 bytes)

View attachment "extfrag-4.0.txt" of type "text/plain" (384 bytes)