linux-kernel - Re: [PATCH] mm: Do not stall in synchronous compaction for THP allocations


Date:	Tue, 15 Nov 2011 13:25:13 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	David Rientjes <rientjes@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>, Jan Kara <jack@...e.cz>,
	Andy Isaacson <adi@...apodia.org>,
	Johannes Weiner <jweiner@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] mm: Do not stall in synchronous compaction for THP
 allocations

On Mon, Nov 14, 2011 at 03:44:08PM -0800, Andrew Morton wrote:
> On Fri, 11 Nov 2011 10:14:14 +0000
> Mel Gorman <mgorman@...e.de> wrote:
> 
> > On Thu, Nov 10, 2011 at 03:37:32PM -0800, David Rientjes wrote:
> > > On Thu, 10 Nov 2011, Andrew Morton wrote:
> > > 
> > > > > This patch once again prevents sync migration for transparent
> > > > > hugepage allocations as it is preferable to fail a THP allocation
> > > > > than stall.
> > > > 
> > > > Who said?  ;) Presumably some people would prefer to get lots of
> > > > huge pages for their 1000-hour compute job, and waiting a bit to get
> > > > those pages is acceptable.
> > > > 
> > > 
> > > Indeed.  It seems like the behavior would better be controlled with 
> > > /sys/kernel/mm/transparent_hugepage/defrag which is set aside specifically 
> > > to control defragmentation for transparent hugepages and for that 
> > > synchronous compaction should certainly apply.
> > 
> > With khugepaged in place, it's adding a tunable that is unnecessary and
> > will not be used. Even if such a tunable were created, the default
> > behaviour should be "do not stall".
> 
> (who said?)
> 
> Let me repeat my cruelly unanswered question: do we have sufficient
> instrumentation in place so that operators can determine that this
> change is causing them to get fewer huge pages than they'd like?
> 

Unless we add a mel_did_it counter to vmstat, they won't be able to
identify that it was this patch in particular.

> Because some people really really want those huge pages.  If we go and
> silently deprive them of those huge pages via changes like this, how do
> they *know* it's happening?
> 

The counters in vmstat will give them a hint, but they will not tell them
*why* they are not getting the huge pages they want. That would require
further analysis using a combination of ftrace, /proc/buddyinfo,
/proc/pagetypeinfo and maybe /proc/kpageflags, depending on how
important the issue is.
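As a rough sketch of the first step of such an analysis, the Python below reads the THP and compaction counters from /proc/vmstat and estimates how many free blocks large enough for a huge page /proc/buddyinfo reports. It assumes x86-64 with 4 KiB base pages (so a 2 MiB huge page is order 9) and that the listed counter names exist on the running kernel; names have varied between versions. The helper names are hypothetical, not an existing tool.

```python
# Counters believed to exist in /proc/vmstat on THP-enabled kernels;
# exact names have changed between kernel versions.
THP_COUNTERS = ("thp_fault_alloc", "thp_fault_fallback",
                "compact_stall", "compact_fail", "compact_success")

# Assumption: x86-64 with 4 KiB base pages, so a 2 MiB huge page is order 9.
HUGEPAGE_ORDER = 9


def parse_vmstat(text):
    """Return the THP/compaction counters present in /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in THP_COUNTERS:
            counters[parts[0]] = int(parts[1])
    return counters


def free_hugepage_blocks(text, order=HUGEPAGE_ORDER):
    """Count free blocks usable for an `order` allocation in /proc/buddyinfo.

    A free block of order k >= order can be split into 2**(k - order) such blocks.
    """
    total = 0
    for line in text.splitlines():
        # Lines look like: "Node 0, zone   Normal  n0 n1 ... n10"
        if "zone" not in line:
            continue
        counts = [int(n) for n in line.split("zone", 1)[1].split()[1:]]
        for k, n in enumerate(counts):
            if k >= order:
                total += n << (k - order)
    return total


if __name__ == "__main__":
    try:
        with open("/proc/vmstat") as f:
            stats = parse_vmstat(f.read())
        with open("/proc/buddyinfo") as f:
            blocks = free_hugepage_blocks(f.read())
    except OSError as err:
        print("cannot read /proc:", err)
    else:
        alloc = stats.get("thp_fault_alloc", 0)
        fallback = stats.get("thp_fault_fallback", 0)
        if alloc + fallback:
            print(f"THP fault success: {alloc / (alloc + fallback):.1%}")
        print(f"free order-{HUGEPAGE_ORDER} capable blocks: {blocks}")
        print(stats)
```

A falling success rate together with few free order-9 blocks points at fragmentation, but as noted it still does not explain why; that is where ftrace and /proc/pagetypeinfo come in.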

> And what are their options for making the kernel try harder to get
> those pages?
> 

Fine control is limited. If it is really needed, I would not oppose
a patch that allows the use of sync compaction via a new setting in
/sys/kernel/mm/transparent_hugepage/defrag. However, I think it is
a slippery slope to expose implementation details like this and I'm
not currently planning to implement such a patch.
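For operators who first want to confirm which defrag policy is in effect, a minimal sketch that reads the sysfs file and returns the value marked in brackets. The set of accepted values differs between kernel versions, so the parser does not assume any particular list; the function name is hypothetical.

```python
DEFRAG_PATH = "/sys/kernel/mm/transparent_hugepage/defrag"


def active_mode(text):
    """Return the bracketed (active) choice from a sysfs selector file.

    The file lists every accepted value and marks the current one in
    brackets, e.g. "[always] madvise never".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


if __name__ == "__main__":
    try:
        with open(DEFRAG_PATH) as f:
            print("THP defrag mode:", active_mode(f.read()))
    except OSError as err:
        print("could not read", DEFRAG_PATH, "-", err)
```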

If they have root access, they have the option of writing to
/proc/sys/vm/compact_memory to manually trigger compaction. If
that does not free enough huge pages, they could write to
/proc/sys/vm/drop_caches followed by /proc/sys/vm/compact_memory and
then start the target application. If that is too heavy, they could
write a balloon application which forces some percentage of memory
to be reclaimed by allocating anonymous memory, calling mlock on it,
unmapping the memory and then writing to /proc/sys/vm/compact_memory.
It would be very heavy-handed, but it could be a preparation step for
running a job that absolutely must get huge pages without khugepaged
running.
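A hypothetical sketch of the two preparation steps described above, in Python using ctypes to call mmap, mlock and munmap. It assumes Linux, root privileges (so RLIMIT_MEMLOCK does not block the mlock) and a caller-chosen percentage of RAM for the balloon; the function names are illustrative and none of this is an existing tool. Writing 3 to drop_caches frees the page cache plus dentries and inodes, and writing 1 to compact_memory compacts all zones.

```python
import ctypes
import mmap
import os

# Assumption: Linux, with libc symbols reachable via dlopen(NULL).
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long)
libc.mlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
libc.munmap.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
MAP_FAILED = ctypes.c_void_p(-1).value


def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def meminfo_kb(text, field="MemTotal"):
    """Return a /proc/meminfo field in kB."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    raise KeyError(field)


def balloon_bytes(mem_total_kb, percent, page_size=mmap.PAGESIZE):
    """Size of a balloon covering `percent` of RAM, rounded down to a page."""
    raw = mem_total_kb * 1024 * percent // 100
    return raw - raw % page_size


def inflate_and_release(nbytes):
    """Map anonymous memory, mlock it, then unmap it again."""
    addr = libc.mmap(None, nbytes, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr == MAP_FAILED:
        _raise_errno()
    try:
        # mlock() faults in and pins every page, forcing the kernel to
        # reclaim page cache and other memory to back the balloon.
        if libc.mlock(addr, nbytes) != 0:
            _raise_errno()
    finally:
        # munmap() drops the locks and returns the pages to the free lists.
        libc.munmap(addr, nbytes)


def write_proc(path, value):
    with open(path, "w") as f:
        f.write(value)


def compact_all():
    write_proc("/proc/sys/vm/compact_memory", "1")  # compact all zones


def drop_caches_and_compact():
    """drop_caches followed by compact_memory (requires root)."""
    os.sync()  # write back dirty pages so more cache can be dropped
    write_proc("/proc/sys/vm/drop_caches", "3")  # page cache + dentries/inodes
    compact_all()


def balloon_and_compact(percent):
    """Reclaim only `percent` of RAM via an mlock'd balloon, then compact."""
    with open("/proc/meminfo") as f:
        total_kb = meminfo_kb(f.read())
    inflate_and_release(balloon_bytes(total_kb, percent))
    compact_all()
```

Calling balloon_and_compact(30), for example, would pin and release roughly 30% of RAM before compacting. Choosing the percentage is left to the operator, which is what makes this lighter than dropping every cache but still heavy-handed.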

> And how do we communicate all of this to those operators?

The documentation patch will help to some extent, but neither the more
creative ways of manipulating the system to increase the success rate
of huge page allocations nor how to analyse that success rate are
documented. This is largely because the analysis is conducted on a
case-by-case basis. Mailing "help help" to linux-mm and hoping someone
on the internet can hear you scream may be the only option.

-- 
Mel Gorman
SUSE Labs