linux-kernel - Re: [PATCH v5 14/14] mm, compaction: try to capture the just-created high-order freepage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <53D8C16B.7070206@suse.cz>
Date:	Wed, 30 Jul 2014 11:56:59 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	linux-kernel@...r.kernel.org, linux-mm@...r.kernel.org,
	Minchan Kim <minchan@...nel.org>,
	Michal Nazarewicz <mina86@...a86.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Christoph Lameter <cl@...ux.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>
Subject: Re: [PATCH v5 14/14] mm, compaction: try to capture the just-created
 high-order freepage

On 07/30/2014 10:39 AM, Joonsoo Kim wrote:
> On Tue, Jul 29, 2014 at 05:34:37PM +0200, Vlastimil Babka wrote:
>> Could do it in isolate_migratepages() for whole pageblocks only (as
>> David's patch did), but that restricts the usefulness. Or maybe do
>> it fine grained by calling isolate_migratepages_block() multiple
>> times. But the overhead of multiple calls would probably suck even
>> more for lower-order compactions. For CMA the added overhead is
>> basically only checks for next_capture_pfn that will be always
>> false, so predictable. And mostly just in branches where isolation
>> is failing, which is not the CMA's "fast path" I guess?
>
> You can do it find grained with compact_control's migratepages list
> or new private list. If some pages are isolated and added to this list,
> you can check pfn of page on this list and determine appropriate capture
> candidate page. This approach can give us more flexibility for
> choosing capture candidate without adding more complexity to
> common function. For example, you can choose capture candidate if
> there are XX isolated pages in certain range.

Hm I see. But the logic added by page capture was also a prerequisity 
for the "[RFC PATCH V4 15/15] mm, compaction: do not migrate pages when 
that cannot satisfy page fault allocation"
http://marc.info/?l=linux-mm&m=140551859423716&w=2

And that could be hardly done by a post-isolation inspection of the 
migratepages list. And I haven't given up on that idea yet :)

>>> In __isolate_free_page(), we check zone_watermark_ok() with order 0.
>>> But normal allocation logic would check zone_watermark_ok() with requested
>>> order. Your capture logic uses __isolate_free_page() and it would
>>> affect compaction success rate significantly. And it means that
>>> capture logic allocates high order page on page allocator
>>> too aggressively compared to other component such as normal high order
>>
>> It's either that, or the extra lru drain that makes the different.
>> But the "aggressiveness" would in fact mean better accuracy.
>> Watermark checking may be inaccurate. Especially when memory is
>> close to the watermark and there is only a single high-order page
>> that would satisfy the allocation.
>
> If this "aggressiveness" means better accuracy, fixing general
> function, watermark_ok() is better than adding capture logic.

That's if fixing the function wouldn't add significant overhead to all 
the callers. And making it non-racy and not prone to per-cpu counter 
drifts would certainly do that :(

> But, I guess that there is a reason that watermark_ok() is so
> conservative. If page allocator aggressively provides high order page,
> future atomic high order page request cannot succeed easily. For
> preventing this situation, watermark_ok() should be conservative.

I don't think it's intentionally conservative, just unreliable. It tests 
two things together:

1) are there enough free pages for the allocation wrt watermarks?
2) does it look like that there is a free page of the requested order?

The 1) works fine and my patch won't change that by passing a order=0. 
The problem is with 2) which is unreliable, especially when close to the 
watermarks. Note that it's not trying to keep some reserves for atomic 
requests. That's what MIGRATE_RESERVE is for. It's just unreliable to 
decide if there is the high-order page available. Even though its 
allocation would preserve the watermarks, so there is no good reason to 
prevent the allocation. So it will often pass when deciding to stop 
compaction, and then fail when allocating.

> Thanks.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/