lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fe83f15e-2d9f-e36c-3a89-ce1a2b39e3ca@suse.cz>
Date:   Thu, 5 Jan 2017 14:58:47 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        David Rientjes <rientjes@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, thp: add new background defrag option

On 01/05/2017 11:13 AM, Mel Gorman wrote:
> On Wed, Jan 04, 2017 at 03:41:59PM -0800, David Rientjes wrote:
>> There is no thp defrag option that currently allows MADV_HUGEPAGE regions 
>> to do direct compaction and reclaim while all other thp allocations simply 
>> trigger kswapd and kcompactd in the background and fail immediately.
>>
>> The "defer" setting simply triggers background reclaim and compaction for 
>> all regions, regardless of MADV_HUGEPAGE, which makes it unusable for our 
>> userspace where MADV_HUGEPAGE is being used to indicate the application is 
>> willing to wait for work for thp memory to be available.
>>
>> The "madvise" setting will do direct compaction and reclaim for these
>> MADV_HUGEPAGE regions, but does not trigger kswapd and kcompactd in the 
>> background for anybody else.
>>
>> For reasonable usage, there needs to be a mesh between the two options.  
>> This patch introduces a fifth mode, "background", that will do direct 
>> reclaim and compaction for MADV_HUGEPAGE regions and trigger background 
>> reclaim and compaction for everybody else so that hugepages may be 
>> available in the near future.
>>
>> A proposal to allow direct reclaim and compaction for MADV_HUGEPAGE 
>> regions as part of the "defer" mode, making it a very powerful setting and 
>> avoids breaking userspace, was offered: 
>> http://marc.info/?t=148236612700003.  This additional mode is a 
>> compromise.
>>
>> This patch also cleans up the helper function for storing to "enabled" 
>> and "defrag" since the former supports three modes while the latter 
>> supports five and triple_flag_store() was getting unnecessarily messy.
>>
>> Signed-off-by: David Rientjes <rientjes@...gle.com>
>> ---
>>  I don't understand Mel's suggestion of "defer-fault" as option naming.
>>
> 
> defer-fault was intended to reflect "defer faults but not anything else"
> with the only sensible alternative being madvise requests. While not a

Hmm that's probably why it's hard to understand, because "madvise
request" is just setting a vma flag, and the THP allocation (and defrag)
still happens at fault.

I'm not a fan of either name, so I've tried to implement my own
suggestion. Turns out it was easier than expected, as there's no kernel
boot option for "defer", just for "enabled", so that particular worry
was unfounded.

And personally I think that it's less confusing when one can enable defer
and madvise together (and not any other combination), than having to dig
up the difference between "defer" and "background".

I have only tested the sysfs manipulation, not actual THP, but seems to me
that alloc_hugepage_direct_gfpmask() already happens to process the flags
in a way that it works as expected.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 10eedbf14421..cc5ae86169a8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -150,7 +150,16 @@ static ssize_t triple_flag_store(struct kobject *kobj,
 				 enum transparent_hugepage_flag deferred,
 				 enum transparent_hugepage_flag req_madv)
 {
-	if (!memcmp("defer", buf,
+	if (!memcmp("defer madvise", buf,
+			min(sizeof("defer madvise")-1, count))
+	    || !memcmp("madvise defer", buf,
+			min(sizeof("madvise defer")-1, count))) {
+		if (enabled == deferred)
+			return -EINVAL;
+		clear_bit(enabled, &transparent_hugepage_flags);
+		set_bit(req_madv, &transparent_hugepage_flags);
+		set_bit(deferred, &transparent_hugepage_flags);
+	} else if (!memcmp("defer", buf,
 		    min(sizeof("defer")-1, count))) {
 		if (enabled == deferred)
 			return -EINVAL;
@@ -251,9 +260,12 @@ static ssize_t defrag_show(struct kobject *kobj,
 {
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
 		return sprintf(buf, "[always] defer madvise never\n");
-	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
-		return sprintf(buf, "always [defer] madvise never\n");
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
+	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags)) {
+		if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
+			return sprintf(buf, "always [defer] [madvise] never\n");
+		else
+			return sprintf(buf, "always [defer] madvise never\n");
+	} else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
 		return sprintf(buf, "always defer [madvise] never\n");
 	else
 		return sprintf(buf, "always defer madvise [never]\n");

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ