linux-kernel - Re: [RFC PATCH 0/3] Change how we determine when to hand out THPs

Message-Id: <20131213214437.6fdbf7f2.akpm@linux-foundation.org>
Date:	Fri, 13 Dec 2013 21:44:37 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Alex Thorlton <athorlton@....com>
Cc:	linux-mm@...ck.org,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Rik van Riel <riel@...hat.com>,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Mel Gorman <mgorman@...e.de>,
	Michel Lespinasse <walken@...gle.com>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andy Lutomirski <luto@...capital.net>,
	Al Viro <viro@...iv.linux.org.uk>,
	David Rientjes <rientjes@...gle.com>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>,
	Jiang Liu <jiang.liu@...wei.com>,
	Cody P Schafer <cody@...ux.vnet.ibm.com>,
	Glauber Costa <glommer@...allels.com>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	linux-kernel@...r.kernel.org,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [RFC PATCH 0/3] Change how we determine when to hand out THPs

On Thu, 12 Dec 2013 12:00:37 -0600 Alex Thorlton <athorlton@....com> wrote:

> This patch changes the way we decide whether or not to give out THPs to
> processes when they fault in pages.

Please cc Andrea on this.

>  The way things are right now,
> touching one byte in a 2M chunk where no pages have been faulted in
> results in a process being handed a 2M hugepage, which, in some cases,
> is undesirable.  The most common issue seems to arise when a process
> uses many cores to work on small portions of an allocated chunk of
> memory.
> 
> Here are some results from a test that I wrote, which allocates memory
> in a way that doesn't benefit from the use of THPs:
> 
> # echo always > /sys/kernel/mm/transparent_hugepage/enabled
> # perf stat -a -r 5 ./thp_pthread -C 0 -m 0 -c 64 -b 128g
> 
>  Performance counter stats for './thp_pthread -C 0 -m 0 -c 64 -b 128g' (5 runs):
> 
>       93.534078104 seconds time elapsed
> ...
>
> 
> # echo never > /sys/kernel/mm/transparent_hugepage/enabled
> # perf stat -a -r 5 ./thp_pthread -C 0 -m 0 -c 64 -b 128g
> 
>  Performance counter stats for './thp_pthread -C 0 -m 0 -c 64 -b 128g' (5 runs):
>
> ...
>       76.467835263 seconds time elapsed
> ...
> 
> As you can see there's a significant performance increase when running
> this test with THP off.

yup.
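
For scale, the gap between the two quoted runs can be worked out directly from the elapsed times above. The snippet below is only an arithmetic sketch: the two timings are copied from the quoted perf output, and the variable names are illustrative, not part of the patch or the test program.

```python
# Elapsed times copied from the quoted `perf stat` output (mean of 5 runs each).
thp_always_s = 93.534078104   # transparent_hugepage/enabled = always
thp_never_s = 76.467835263    # transparent_hugepage/enabled = never

# Absolute and relative difference between the two configurations.
saved_s = thp_always_s - thp_never_s
reduction_pct = 100.0 * saved_s / thp_always_s
speedup = thp_always_s / thp_never_s

print(f"saved {saved_s:.2f} s, {reduction_pct:.1f}% less elapsed time, "
      f"{speedup:.2f}x speedup with THP off")
```

This covers only the single workload quoted here, which was written specifically not to benefit from THPs.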

> My proposed solution to the problem is to allow users to set a
> threshold at which THPs will be handed out.  The idea here is that, when
> a user faults in a page in an area where they would usually be handed a
> THP, we pull 512 pages off the free list, as we would with a regular
> THP, but we only fault in single pages from that chunk, until the user
> has faulted in enough pages to pass the threshold we've set.  Once they
> pass the threshold, we do the necessary work to turn our 512 page chunk
> into a proper THP.  As it stands now, if the user tries to fault in
> pages from different nodes, we completely give up on ever turning a
> particular chunk into a THP, and just fault in the 4K pages as they're
> requested.  We may want to make this tunable in the future (i.e. allow
> them to fault in from only 2 different nodes).

OK.  But all 512 pages reside on the same node, yes?  Whereas with THP
disabled those 512 pages would have resided closer to the CPUs which
instantiated them.  So the expected result will be somewhere in between
the 93 seconds and the 76 seconds?

That being said, I don't see a downside to the idea, apart from some
additional setup cost in kernel code.
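
The threshold scheme described in the quoted proposal can be modelled as a small state machine. What follows is a user-space toy in Python, not the kernel code from the patch set: the class name `Chunk`, the state names, and the choice to promote once the faulted-page count reaches the threshold (rather than strictly exceeds it) are all assumptions made for illustration.

```python
PAGES_PER_CHUNK = 512  # 2M chunk / 4K base pages, as in the quoted description


class Chunk:
    """Toy model of one reserved 512-page chunk (hypothetical, not kernel code)."""

    def __init__(self, threshold):
        self.threshold = threshold   # pages that must be faulted before promotion
        self.faulted = set()         # indices of 4K pages faulted in so far
        self.first_node = None       # node of the first fault in this chunk
        self.state = "reserved"      # "reserved" -> "huge" or "abandoned"

    def fault(self, page_index, node):
        """Record a fault on one 4K page from the given node; return the state."""
        if not 0 <= page_index < PAGES_PER_CHUNK:
            raise IndexError("page index outside the chunk")
        if self.state == "huge":
            return self.state        # already a THP, nothing more to do
        self.faulted.add(page_index)
        if self.state == "abandoned":
            return self.state        # keep handing out single 4K pages
        if self.first_node is None:
            self.first_node = node
        elif node != self.first_node:
            # Faults from a different node: give up on ever making this a THP.
            self.state = "abandoned"
            return self.state
        if len(self.faulted) >= self.threshold:
            # Assumption: promotion happens once the count reaches the threshold.
            self.state = "huge"
        return self.state


if __name__ == "__main__":
    c = Chunk(threshold=3)
    print([c.fault(i, node=0) for i in range(3)])
```

A second chunk that sees `fault(0, node=0)` followed by `fault(1, node=1)` moves to "abandoned" and stays there, which mirrors the "completely give up" behaviour in the proposal. The model says nothing about where the 512 pages physically reside, which is the point the same-node question above is probing.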
