linux-kernel - Re: [PATCH] mm: set khugepaged_max_ptes_none by 1/8 of HPAGE_PMD


Message-ID: <alpine.DEB.2.10.1502271300120.2122@chino.kir.corp.google.com>
Date:	Fri, 27 Feb 2015 13:12:54 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Rik van Riel <riel@...hat.com>
cc:	Ebru Akagunduz <ebru.akagunduz@...il.com>, linux-mm@...ck.org,
	akpm@...ux-foundation.org, kirill@...temov.name, mhocko@...e.cz,
	mgorman@...e.de, sasha.levin@...cle.com, hughd@...gle.com,
	hannes@...xchg.org, vbabka@...e.cz, linux-kernel@...r.kernel.org,
	aarcange@...hat.com
Subject: Re: [PATCH] mm: set khugepaged_max_ptes_none by 1/8 of
 HPAGE_PMD_NR

On Fri, 27 Feb 2015, Rik van Riel wrote:

> >> Using THP, programs can access memory faster, by having the
> >> kernel collapse small pages into large pages. The parameter
> >> max_ptes_none specifies how many extra small pages (that are
> >> not already mapped) can be allocated when collapsing a group
> >> of small pages into one large page.
> >>
> > 
> > Not exactly, khugepaged isn't "allocating" small pages to collapse into a 
> > hugepage, rather it is allocating a hugepage and then remapping the 
> > pageblock's mapped pages.
> 
> How would you describe the amount of extra memory
> allocated, as a result of converting a partially
> mapped 2MB area into a THP?
> 
> It is not physically allocating 4kB pages, but
> I would like to keep the text understandable to
> people who do not know the THP internals.
> 

I would say it specifies how much unmapped memory can become mapped by a 
hugepage.
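
To put rough numbers on that, here is a small sketch. It assumes x86-64 
with 4kB base pages and 2MB hugepages, so HPAGE_PMD_NR is taken to be 512; 
the helper name extra_kb is made up for illustration and is not part of 
any kernel interface.

```shell
#!/bin/sh
# Assumed values for x86-64: 4 kB base pages, 512 PTEs per 2 MB hugepage.
PAGE_KB=4
HPAGE_PMD_NR=512

# extra_kb N: upper bound, in kB, on the unmapped memory that a single
# collapse may turn into mapped memory when max_ptes_none is N.
# (Hypothetical helper, for illustration only.)
extra_kb() {
    echo $(( $1 * PAGE_KB ))
}

# Compare the three values discussed in this thread:
# 0, HPAGE_PMD_NR/8 and HPAGE_PMD_NR-1.
for n in 0 $(( HPAGE_PMD_NR / 8 )) $(( HPAGE_PMD_NR - 1 )); do
    echo "max_ptes_none=$n -> up to $(extra_kb "$n") kB newly mapped per hugepage"
done
```

Under those assumptions, HPAGE_PMD_NR/8 allows up to 256 kB of previously 
unmapped memory per hugepage, while HPAGE_PMD_NR-1 allows up to 2044 kB.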

> I think we do need to change the default.
> 
> Why? See this bug:
> 
> >> The problem was reported here:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=93111
> 
> Now, there may be a better value than HPAGE_PMD_NR/8, but
> I am not sure what it would be, or why.
> 
> I do know that HPAGE_PMD_NR-1 results in undesired behaviour,
> as seen in the bug above...
> 

I know that the value of 64 would also be undesirable for Google: since we 
tightly constrain memory usage, we have used max_ptes_none == 0 since it 
was introduced.  We can get away with that because our malloc() is 
modified to try to give large contiguous ranges of memory back to the 
system periodically, also using madvise(MADV_DONTNEED), and it tries to 
avoid splitting thp memory.

The value is determined by how the system will be used: do you tightly 
constrain memory usage and not allow any unmapped memory to be collapsed 
into a hugepage, or do you have an abundance of memory and really want an 
aggressive value like HPAGE_PMD_NR-1?  Depending on the properties of the 
system, you can tune this to anything you want, just like we do in 
initscripts.
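
As a rough illustration of that kind of initscript tuning, something like 
the following could work. The policy names and both helper functions are 
hypothetical, and HPAGE_PMD_NR is again assumed to be 512 (x86-64, 4kB 
pages); only the sysfs path is the real tunable.

```shell
#!/bin/sh
# The khugepaged tunable discussed in this thread.
KNOB=/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
HPAGE_PMD_NR=512   # assumption: x86-64 with 4 kB pages and 2 MB hugepages

# pick_max_ptes_none POLICY: print a value for the given policy.
# The policy names are invented for this sketch.
pick_max_ptes_none() {
    case "$1" in
        strict)     echo 0 ;;                          # never map unmapped memory
        moderate)   echo $(( HPAGE_PMD_NR / 8 )) ;;    # value proposed by the patch
        aggressive) echo $(( HPAGE_PMD_NR - 1 )) ;;    # long-standing default
        *)          return 1 ;;                        # unknown policy
    esac
}

# apply_max_ptes_none POLICY: write the chosen value, but only if the
# tunable exists and is writable (normally this requires root).
apply_max_ptes_none() {
    value=$(pick_max_ptes_none "$1") || return 1
    [ -w "$KNOB" ] || return 1
    echo "$value" > "$KNOB"
}
```

An initscript on a memory-constrained system might then call 
"apply_max_ptes_none strict", while a system with memory to spare would 
leave the default alone.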

I'm only concerned here about changing a default that has been around for 
four years and the possibly negative implications that change will have on 
users who never touch this value.  With a lower default, they undoubtedly 
get less memory backed by thp, and that can lead to a performance 
regression.  So if this patch is merged and we get a bug report for the 
4.1 kernel, do we tell that user that we changed behavior out from under 
them and to adjust the tunable back to HPAGE_PMD_NR-1?

Meanwhile, the bug report you cite has a workaround that has always been 
available for thp kernels:
# echo 64 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none