linux-kernel - Re: [RFC PATCH 00/10] redesign compaction algorithm

Message-ID: <CAAmzW4OuArqzavsPY3_3u5OnnO=ZY1HSnUT4Rgoq2ytd+n89xQ@mail.gmail.com>
Date:	Fri, 26 Jun 2015 11:07:47 +0900
From:	Joonsoo Kim <js1304@...il.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Minchan Kim <minchan@...nel.org>
Subject: Re: [RFC PATCH 00/10] redesign compaction algorithm

2015-06-26 3:41 GMT+09:00 Mel Gorman <mgorman@...e.de>:
> On Fri, Jun 26, 2015 at 03:14:39AM +0900, Joonsoo Kim wrote:
>> > It could though. Reclaim/compaction is entered for orders higher than
>> > PAGE_ALLOC_COSTLY_ORDER and when scan priority is sufficiently high.
>> > That could be adjusted if you have a viable case where orders <
>> > PAGE_ALLOC_COSTLY_ORDER must succeed and currently requires excessive
>> > reclaim instead of relying on compaction.
>>
>> Yes. I saw this problem in a real situation. On ARM, an order-2 allocation
>> is requested in fork(), so it must succeed. But there are not enough
>> order-2 freepages, so reclaim/compaction begins. Compaction fails
>> repeatedly, although I didn't check the exact reason.
>
> That should be identified and repaired prior to reimplementing
> compaction because it's important.

Unfortunately, I got the report a long time ago and I don't have a real
environment to reproduce it. What I remember is that there were too many
unmovable allocations from the graphics driver and zram, and they badly
fragmented memory. At that time, the problem was worked around with an
ad-hoc approach such as killing many apps. But that is sub-optimal and
loses a lot of performance, so I imitate this effect in my benchmark and
try to solve it with this patchset.

>> >> >> 3) Compaction capability highly depends on the migratetype of memory,
>> >> >> because the freepage scanner doesn't scan unmovable pageblocks.
>> >> >>
>> >> >
>> >> > For a very good reason. Unmovable allocation requests that fall back to
>> >> > other pageblocks are the worst in terms of fragmentation avoidance. The
>> >> > more of these events there are, the more the system will decay. If there
>> >> > are many of these events then a compaction benchmark may start with high
>> >> > success rates but decay over time.
>> >> >
>> >> > Very broadly speaking, the more the mm_page_alloc_extfrag tracepoint
>> >> > triggers with alloc_migratetype == MIGRATE_UNMOVABLE, the faster the
>> >> > system is decaying. Having the freepage scanner select unmovable
>> >> > pageblocks will trigger this event more frequently.
>> >> >
>> >> > The unfortunate impact is that selecting unmovable blocks from the free
>> >> > scanner will improve compaction success rates for high-order kernel
>> >> > allocations early in the lifetime of the system but later fail high-order
>> >> > allocation requests as more pageblocks get converted to unmovable. It
>> >> > might be ok for kernel allocations but THP will eventually have a 100%
>> >> > failure rate.
>> >>
>> >> I wrote the rationale in the patch itself. We already use non-movable
>> >> pageblocks for the migration scanner. It empties non-movable pageblocks,
>> >> so the number of freepages on non-movable pageblocks will increase. Using
>> >> non-movable pageblocks for the freepage scanner negates this effect, so
>> >> the number of freepages on non-movable pageblocks will be balanced. Could
>> >> you tell me in detail how having the freepage scanner select unmovable
>> >> pageblocks will cause more fragmentation? Possibly I don't understand the
>> >> effect of this patch correctly and need some investigation. :)
>> >>
>> >
>> > The long-term success rate of fragmentation avoidance depends on
>> > minimising the number of UNMOVABLE allocation requests that use a
>> > pageblock belonging to another migratetype. Once such a fallback occurs,
>> > that pageblock potentially can never be used for a THP allocation again.
>> >
>> > Let's say there is an unmovable pageblock with 500 free pages in it. If
>> > the freepage scanner uses that pageblock and allocates all 500 free
>> > pages then the next unmovable allocation request needs a new pageblock.
>> > If one is not completely free then it will fall back to using a
>> > RECLAIMABLE or MOVABLE pageblock, forever contaminating it.
>>
>> Yes, I can imagine that situation. But, as I said above, we already use
>> non-movable pageblocks for the migration scanner. While an unmovable
>> pageblock with 500 free pages fills, some other unmovable pageblock
>> with some movable pages will be emptied. The number of freepages
>> on non-movable pageblocks would be maintained, so fallback doesn't happen.
>>
>> Anyway, it is better to investigate this effect. I will do so and attach
>> the results to the next submission.
>>
>
> Let's say we have X unmovable pageblocks and Y pageblocks overall. If the
> migration scanner takes movable pages from X then there is more space for
> unmovable allocations without having to increase X -- this is good. If
> the free scanner uses the X pageblocks as targets then they can fill. The
> next unmovable allocation then falls back to another pageblock and we
> either have X+1 unmovable pageblocks (full steal) or a mixed pageblock
> (partial steal) that cannot be used for THP. Do this enough times and
> X == Y and all THP allocations fail.

My understanding is similar, but my conclusion is different.

As the number of unmovable pageblocks, X, that get filled with movable
pages by this compaction change increases, the number of pages reclaimed
or migrated out of them also increases. Further unmovable allocation
requests will then use this free space, and eventually these pageblocks
will be completely filled by unmovable allocations. Therefore, I guess
that in the long term the growth of X saturates and X == Y will not happen.
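
The two positions in this exchange can be made concrete with a toy
bookkeeping model. This is only a sketch, not kernel code: the names
Pageblock, free_scanner_fill, migration_scanner_drain and alloc_unmovable
are hypothetical, the 512-page pageblock size is an assumption, and pages
migrated out of a block are simply assumed to land somewhere else. It shows
only that whether the fallback happens depends on whether the migration
scanner empties the block before the next unmovable request arrives.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 512  # assumed pageblock size in pages, for illustration only


@dataclass
class Pageblock:
    migratetype: str   # "unmovable" or "movable"
    unmovable: int = 0  # pages held by unmovable allocations
    movable: int = 0    # pages that compaction could migrate

    @property
    def free(self) -> int:
        return PAGES_PER_BLOCK - self.unmovable - self.movable


def free_scanner_fill(block: Pageblock, pages: int) -> int:
    """Use `block` as a migration target; movable pages take its free space."""
    moved = min(pages, block.free)
    block.movable += moved
    return moved


def migration_scanner_drain(block: Pageblock) -> int:
    """Migrate all movable pages out of `block` (destination not modelled)."""
    moved = block.movable
    block.movable = 0
    return moved


def alloc_unmovable(blocks: list, pages: int) -> bool:
    """Place an unmovable allocation. Returns True if it had to fall back."""
    for b in blocks:
        if b.migratetype == "unmovable" and b.free >= pages:
            b.unmovable += pages
            return False
    # Fallback, modelled as a full steal of the emptiest movable block.
    candidates = [b for b in blocks
                  if b.migratetype == "movable" and b.free >= pages]
    if not candidates:
        raise MemoryError("no pageblock can satisfy the request")
    victim = max(candidates, key=lambda b: b.free)
    victim.migratetype = "unmovable"
    victim.unmovable += pages
    return True


def count_unmovable(blocks: list) -> int:
    """X in the discussion: the number of unmovable pageblocks."""
    return sum(b.migratetype == "unmovable" for b in blocks)


def scenario(drain_first: bool):
    # One unmovable block with 500 free pages, one partly used movable block.
    blocks = [Pageblock("unmovable", unmovable=12),
              Pageblock("movable", movable=100)]
    free_scanner_fill(blocks[0], 500)       # free scanner consumes the 500 pages
    if drain_first:
        migration_scanner_drain(blocks[0])  # migration scanner empties it again
    fell_back = alloc_unmovable(blocks, 8)
    return fell_back, count_unmovable(blocks)


if __name__ == "__main__":
    print("no drain  :", scenario(drain_first=False))
    print("drain first:", scenario(drain_first=True))
```

Without the drain, the request falls back and X grows from 1 to 2. With the
drain, the request fits in the original block and X stays at 1. Which case
dominates on a real system is exactly the open question, so this needs
measurement rather than a model.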

Thanks.