linux-kernel - Re: [PATCH v5 14/14] mm, compaction: try to capture the just-created high-order freepage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAAmzW4N28-6W1-K5xHHGOzYEFtFkXOzeP35iC-VdQCP-EOtANA@mail.gmail.com>
Date:	Wed, 30 Jul 2014 23:19:10 +0900
From:	Joonsoo Kim <js1304@...il.com>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-mm@...r.kernel.org,
	Minchan Kim <minchan@...nel.org>,
	Michal Nazarewicz <mina86@...a86.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Christoph Lameter <cl@...ux.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>
Subject: Re: [PATCH v5 14/14] mm, compaction: try to capture the just-created
 high-order freepage

Oops... resend because of omitting everyone on CC.

2014-07-30 18:56 GMT+09:00 Vlastimil Babka <vbabka@...e.cz>:
> On 07/30/2014 10:39 AM, Joonsoo Kim wrote:
>>
>> On Tue, Jul 29, 2014 at 05:34:37PM +0200, Vlastimil Babka wrote:
>>>
>>> Could do it in isolate_migratepages() for whole pageblocks only (as
>>> David's patch did), but that restricts the usefulness. Or maybe do
>>> it fine grained by calling isolate_migratepages_block() multiple
>>> times. But the overhead of multiple calls would probably suck even
>>> more for lower-order compactions. For CMA the added overhead is
>>> basically only checks for next_capture_pfn that will be always
>>> false, so predictable. And mostly just in branches where isolation
>>> is failing, which is not the CMA's "fast path" I guess?
>>
>>
>> You can do it find grained with compact_control's migratepages list
>> or new private list. If some pages are isolated and added to this list,
>> you can check pfn of page on this list and determine appropriate capture
>> candidate page. This approach can give us more flexibility for
>> choosing capture candidate without adding more complexity to
>> common function. For example, you can choose capture candidate if
>> there are XX isolated pages in certain range.
>
>
> Hm I see. But the logic added by page capture was also a prerequisity for
> the "[RFC PATCH V4 15/15] mm, compaction: do not migrate pages when that
> cannot satisfy page fault allocation"
> http://marc.info/?l=linux-mm&m=140551859423716&w=2
>
> And that could be hardly done by a post-isolation inspection of the
> migratepages list. And I haven't given up on that idea yet :)

Okay. I didn't look at that patch yet. I will try later :)

>
>>>> In __isolate_free_page(), we check zone_watermark_ok() with order 0.
>>>> But normal allocation logic would check zone_watermark_ok() with
>>>> requested
>>>> order. Your capture logic uses __isolate_free_page() and it would
>>>> affect compaction success rate significantly. And it means that
>>>> capture logic allocates high order page on page allocator
>>>> too aggressively compared to other component such as normal high order
>>>
>>>
>>> It's either that, or the extra lru drain that makes the different.
>>> But the "aggressiveness" would in fact mean better accuracy.
>>> Watermark checking may be inaccurate. Especially when memory is
>>> close to the watermark and there is only a single high-order page
>>> that would satisfy the allocation.
>>
>>
>> If this "aggressiveness" means better accuracy, fixing general
>> function, watermark_ok() is better than adding capture logic.
>
>
> That's if fixing the function wouldn't add significant overhead to all the
> callers. And making it non-racy and not prone to per-cpu counter drifts
> would certainly do that :(
>
>
>> But, I guess that there is a reason that watermark_ok() is so
>> conservative. If page allocator aggressively provides high order page,
>> future atomic high order page request cannot succeed easily. For
>> preventing this situation, watermark_ok() should be conservative.
>
>
> I don't think it's intentionally conservative, just unreliable. It tests two
> things together:
>
> 1) are there enough free pages for the allocation wrt watermarks?
> 2) does it look like that there is a free page of the requested order?

I don't think that watermark_ok()'s intention is checking if there is a free
page of the requested order. If we want to know it, we could use more
easy way something like below.

X = number of total freepage - number of freepage lower than requested order
If X is positive, we can conclude that there is at least one freepage
of requested order and this equation is easy to compute.

But, watermark_ok() doesn't do that. Instead, it uses mark value to determine
if we can go further. I guess that this means that allocation/reclaim logic want
to preserve certain level of high order freepages according to system memory
size, although I don't know what the reason is exactly. So
the "aggressiveness" on capture logic here could break what
allocation/reclaim want.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/