linux-kernel - Re: [RFC PATCH 0/3] Change how we determine when to hand out THPs

Message-ID: <20131225190747.GB195633@sgi.com>
Date:	Wed, 25 Dec 2013 13:07:47 -0600
From:	Alex Thorlton <athorlton@....com>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Rik van Riel <riel@...hat.com>,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Mel Gorman <mgorman@...e.de>,
	Michel Lespinasse <walken@...gle.com>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andy Lutomirski <luto@...capital.net>,
	Al Viro <viro@...iv.linux.org.uk>,
	David Rientjes <rientjes@...gle.com>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>,
	Jiang Liu <jiang.liu@...wei.com>,
	Cody P Schafer <cody@...ux.vnet.ibm.com>,
	Glauber Costa <glommer@...allels.com>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/3] Change how we determine when to hand out THPs

On Tue, Dec 17, 2013 at 06:55:00PM +0100, Andrea Arcangeli wrote:
> On Tue, Dec 17, 2013 at 10:20:07AM -0600, Alex Thorlton wrote:
> > This message in particular:
> > 
> > https://lkml.org/lkml/2013/8/2/697
> 
> I think adding a prctl (or similar) inherited by child to turn off THP
> would be a fine addition to the current madvise. So you can then run
> any static app under a wrapper like "THP_disable ./whatever"
> 
> The idea is, if the software is maintained, madvise allows for
> fine-grained optimization; if the software is legacy proprietary
> statically linked (or if it already uses LD_PRELOAD for other things),
> prctl takes care of that in a more coarse way (but still per-app).

That sounds fine.  I'll dig up the old patches that I wrote a while back
to enable this, and get them cleaned up and rebased to the latest kernel
version for people to review.
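
As a rough sketch of what such a wrapper could do, assuming a prctl
option along the lines of PR_SET_THP_DISABLE (the name and value used
by later mainline kernels; nothing in this thread fixes them). The
helper names thp_disable_self() and exec_without_thp() are made up
for illustration:

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Assumption: option number as defined by later mainline kernels. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif

/* Set the per-process "THP disable" flag on the calling process.
 * Returns 0 on success, -1 with errno set (e.g. EINVAL on a kernel
 * that does not know the option). */
int thp_disable_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL);
}

/* Disable THP for this process, then replace it with the target
 * program. Relies on the flag being kept across execve() and
 * inherited by children. Returns only on failure. */
int exec_without_thp(char *const argv[])
{
    if (thp_disable_self() != 0) {
        perror("prctl(PR_SET_THP_DISABLE)");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A "THP_disable ./whatever" style wrapper would call
exec_without_thp(argv + 1) from its main() after checking that a
command was given.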

> > The thread I mention above originally proposed a per-process switch to
> > disable THP without the use of madvise, but it was not very well 
> > received.  I'm more than willing to revisit that idea, and possibly
> 
> I think you provided enough explanation of why it is needed (static
> binaries, proprietary apps, annoyance of LD_PRELOAD that may collide
> with other LD_PRELOADs in proprietary apps, and so on), so I think a
> prctl is a reasonable addition to the madvise.
> 
> We also have an madvise to turn on THP selectively on embedded that
> may boot with enabled=madvise to be sure not to waste any memory
> because of THP. But a prctl to selectively enable doesn't make too
> much sense, as one has to selectively enable it in a fine-grained way
> to be sure not to cause any memory waste. So I think a NOHUGEPAGE prctl
> would be enough.
> 
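
For comparison, the fine-grained madvise route mentioned above looks
roughly like this for a maintained application. This is a minimal
sketch: map_with_thp_hint() is a hypothetical helper, and
MADV_HUGEPAGE / MADV_NOHUGEPAGE are the existing madvise hints.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_*HUGEPAGE in glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory and apply a per-range THP hint
 * (MADV_HUGEPAGE or MADV_NOHUGEPAGE).
 * Returns NULL if mmap() fails. A failing madvise() only loses the
 * hint, so it is reported through *hint_applied (1 = applied, 0 = not)
 * rather than treated as fatal. */
void *map_with_thp_hint(size_t len, int advice, int *hint_applied)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    int rc = madvise(p, len, advice);
    if (hint_applied)
        *hint_applied = (rc == 0);
    return p;
}
```

With enabled=madvise, only ranges marked MADV_HUGEPAGE this way are
eligible for THP; with enabled=always, MADV_NOHUGEPAGE opts a range
out. Either way the application source has to be changed, which is
exactly what a static or proprietary binary does not allow.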
> > meld the two (a per-process threshold, instead of a big-hammer on-off
> > switch).  Let me know if that seems preferable to this idea and we can
> > discuss.
> 
> The per-process threshold would be a much bigger patch; I think starting
> with the big-hammer on-off is preferable as it is much simpler and it
> should be more than enough to take care of the rare corner cases,
> while leaving the other workloads unaffected (modulo the cacheline to
> check the task or mm flags) running at max speed.

Agreed.  While I still would like to explore the threshold idea further,
I'm all for putting in a simpler fix to our current problem that will
leave default behavior unaffected.
 
> To evaluate the threshold solution, a variety of benchmarks of a
> multitude of apps would be necessary first, to see the effect it has
> on the non-corner cases. Adding the big-hammer on-off prctl instead is
> a black and white design solution that won't require black magic
> settings.
> 
> Ideally if we add a threshold later it won't require any more
> cacheline accesses, as the threshold would also need to be per-task or
> per-mm so the runtime cost of the prctl would be zero then and it
> could then become a benchmarking tweak even if we add the per-app
> threshold later.
>
> About creating heuristics to automatically detect the ideal value of
> the big-hammer per-app on/off switch (or even harder the ideal value
> of the per-app threshold), I think it's not going to happen because
> there are too few corner cases and it wouldn't be worth the cost of it
> (the cost would be significant no matter how implemented).

I see where you're coming from here.  If we do decide to go ahead and
implement a threshold solution in the future, I think the best idea is
to have it default to 1, which would maintain current behavior and
leave the non-corner cases unaffected.
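
To illustrate why a default of 1 keeps today's behavior, here is a toy
model of the decision, not kernel code. It assumes the threshold
counts the base pages touched so far in a huge-page-sized range,
including the page being faulted. The struct, its fields, and
thp_allowed() are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-process THP policy (illustrative names only). */
struct thp_policy {
    bool disabled;          /* the big-hammer on/off switch */
    unsigned int threshold; /* pages touched in the range before a
                             * THP is handed out; 1 = first touch */
};

/* Decide whether a fault should be backed by a huge page.
 * pages_touched includes the page currently being faulted in. */
bool thp_allowed(const struct thp_policy *p, unsigned int pages_touched)
{
    if (p->disabled)
        return false;
    return pages_touched >= p->threshold;
}
```

With threshold == 1 the very first touch already satisfies the check,
so every fault is eligible for a THP, as it is now. A larger value
only delays the hand-out for processes that opt in.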

Thanks for your suggestions!

- Alex