Message-ID: <20161226090211.GA11455@dhcp22.suse.cz>
Date:   Mon, 26 Dec 2016 10:02:12 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even
 when deferred

On Fri 23-12-16 14:46:43, David Rientjes wrote:
[...]
> You want defrag=madvise to start doing background compaction for 
> everybody, which was never done before for existing users of 
> defrag=madvise?  That might be possible, I don't really care, I just think 
> it's riskier because there are existing users of defrag=madvise who are 
> opting in to new behavior because of the kernel change.  This patch 
> changes defrag=defer because it's the new option and people setting the 
> mode know what they are getting.

But my primary argument is that if you tweak the behavior of the "defer"
value then you lose the only "stall-free yet allows background compaction"
option. That option is really important. You seem to think that it
is the application which is in control. And I am not all that
surprised, because you are in control of the whole userspace in your
deployments. But there are others where the administrator is not in
control of what the applications ask for, yet he is responsible for the
overall "experience", if you will. Long stalls during page faults are
often seen as bugs, and users might not really care whether the
application writer really wanted THP or not...
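To illustrate the administrator-side knob discussed here: the defrag mode is a system-wide setting, exposed in kernels of this era as /sys/kernel/mm/transparent_hugepage/defrag, where the active choice is shown in brackets (for example "always defer [madvise] never"; the exact set of choices varies by kernel version). A minimal C sketch of reading it follows. The helper names active_sysfs_choice() and read_thp_defrag() are hypothetical and not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) entry of a sysfs choice string such as
 * "always defer [madvise] never" into out. Returns 0 on success, or -1
 * if there is no bracketed entry or it does not fit in out.
 */
int active_sysfs_choice(const char *s, char *out, size_t outlen)
{
	const char *open = strchr(s, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	assert(out != NULL && outlen > 0);
	if (!open || !close)
		return -1;
	n = (size_t)(close - open - 1);
	if (n >= outlen)
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the current THP defrag mode (hypothetical helper); 0 on success. */
int read_thp_defrag(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/defrag", "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof line, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return active_sysfs_choice(line, out, outlen);
}
```

Writing one of the listed values back to the same file (as root) changes the mode for the whole system, which is why the semantics attached to each value matter to the administrator rather than to individual applications.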

[...]

> This is obviously fine for Kirill, and I have users who remap their .text 
> segment and do madvise(MADV_DONTNEED) because they really want hugepages 
> when they are exec'd, so I'd kindly ask you to consider the real-world use 
> cases that require background compaction to make hugepages available for 
> everybody but allow apps to opt-in to take the expense of compaction on 
> themselves rather than your own theory of what users want.

I definitely _agree_ that this is a very important use case! I am just
trying to think long term, and a more sophisticated background compaction
is something that we definitely lack and _want_ long term. There are more
high-order users than THP. I believe we really want to teach kcompactd
to maintain a configurable amount of high-order pages.

If there is really a need for an immediate solution^Wworkaround then I
think that tweaking the madvise option should be reasonably safe. Admins
are prepared for stalls because they are explicitly opting in to the
madvise behavior, and they will get background compaction on top. This
is a new behavior, but I do not see how it would be harmful. If
excessive compaction is a problem then THP can be restricted to
madvised VMAs only (enabled=madvise).
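For reference, the per-VMA opt-in that vma_madvised tests in the patch is set from userspace with madvise(MADV_HUGEPAGE), which marks the VMA with VM_HUGEPAGE. The sketch below shows an application opting in; map_thp_region() is a hypothetical helper, and it assumes a Linux system whose headers define MADV_HUGEPAGE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map an anonymous region and mark it with MADV_HUGEPAGE so that THP
 * faults in it count as "madvised" (VM_HUGEPAGE set on the VMA).
 * Returns the mapping, or NULL if mmap() fails. A madvise() failure
 * (e.g. on a kernel built without THP) is reported but not fatal.
 */
void *map_thp_region(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	if (madvise(p, len, MADV_HUGEPAGE) != 0)
		fprintf(stderr, "madvise(MADV_HUGEPAGE): %s\n", strerror(errno));
	return p;
}
```

mmap() does not guarantee huge-page alignment, so only the aligned huge-page-sized subranges of such a region (2 MiB on x86-64) can actually be backed by THP; callers often over-allocate and align. Whether a fault in the region may stall in direct compaction is then decided by the defrag setting, which is exactly what this thread is arguing about.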

But I really _do_ care about having a stall-free option which does not
completely disable background compaction for THP.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f3c2040edbb1..3679c47faef4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -622,8 +622,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 	bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
-				&transparent_hugepage_flags) && vma_madvised)
-		return GFP_TRANSHUGE;
+				&transparent_hugepage_flags))
+		return (vma_madvised) ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
 	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
 						&transparent_hugepage_flags))
 		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
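The effect of the hunk can be illustrated with a small userspace model. It is a sketch, not kernel code: the struct, enum and function names are hypothetical, and it assumes that GFP_TRANSHUGE means GFP_TRANSHUGE_LIGHT plus direct reclaim, as in kernels of that period. Only the madvise and defer branches visible in the hunk are modeled; before the patch, non-madvised VMAs under defrag=madvise skipped the first branch and fell through to code outside this hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a THP fault may do under a given gfp mask. */
struct thp_fault_policy {
	bool direct_reclaim; /* the faulting task may stall in reclaim/compaction */
	bool wake_kswapd;    /* kswapd is woken, which may in turn wake kcompactd */
};

/* Only the two defrag modes whose branches appear in the hunk. */
enum thp_defrag { THP_DEFRAG_MADVISE, THP_DEFRAG_DEFER };

/* Assumed meaning of GFP_TRANSHUGE: GFP_TRANSHUGE_LIGHT plus direct reclaim. */
static const struct thp_fault_policy transhuge = { true, false };
/* GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM: background work only. */
static const struct thp_fault_policy light_kswapd = { false, true };

/* Model of alloc_hugepage_direct_gfpmask() with the patch applied. */
struct thp_fault_policy patched_policy(enum thp_defrag mode, bool vma_madvised)
{
	if (mode == THP_DEFRAG_MADVISE)
		return vma_madvised ? transhuge : light_kswapd;
	return light_kswapd; /* defer: unchanged, never stalls the fault */
}

/* The property argued for in the thread: "defer" stays stall-free. */
void check_defer_is_stall_free(void)
{
	assert(!patched_policy(THP_DEFRAG_DEFER, false).direct_reclaim);
	assert(!patched_policy(THP_DEFRAG_DEFER, true).direct_reclaim);
}
```

In this model, the patch only changes the non-madvised case of defrag=madvise, from whatever the fall-through code returned to background-only work, while defrag=defer keeps its stall-free behavior for every VMA.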

-- 
Michal Hocko
SUSE Labs
