linux-kernel - Re: [RFC][PATCH 03/26] mm, mpol: add MPOL_MF

Message-ID: <4FF74A3B.80701@redhat.com>
Date:	Fri, 06 Jul 2012 16:27:39 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Lee Schermerhorn <Lee.Schermerhorn@...com>
CC:	Mel Gorman <mgorman@...e.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Dan Smith <danms@...ibm.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 03/26] mm, mpol: add MPOL_MF_LAZY ...

On 07/06/2012 04:04 PM, Lee Schermerhorn wrote:
> On Fri, 2012-07-06 at 12:38 -0400, Rik van Riel wrote:

>> 4. Putting a lot of pages in the swap cache ends up allocating
>>      swap space. This means this NUMA migration scheme will only
>>      work on systems that have a substantial amount of memory
>>      represented by swap space. This is highly unlikely on systems
>>      with memory in the TB range. On smaller systems, it could drive
>>      the system out of memory (to the OOM killer), by "filling up"
>>      the overflow swap with migration pages instead.
>> 5. In the long run, we want the ability to migrate transparent
>>      huge pages as one unit.  The reason is simple: the performance
>>      penalty for running on the wrong NUMA node (10-20%) is on the
>>      same order of magnitude as the performance penalty for running
>>      with 4kB pages instead of 2MB pages (5-15%).
>>
>>      Breaking up large pages into small ones, and having khugepaged
>>      reconstitute them on a random NUMA node later on, will negate
>>      the performance benefits of both NUMA placement and THP.

> When I originally posted the "migrate on fault" series, I posted a
> separate series with a "migration cache" to avoid the use of swap space
> for lazy migration: http://markmail.org/message/xgvvrnn2nk4nsn2e.
>
> The migration cache was originally implemented by Marcelo Tosatti for
> the old memory hotplug project:
> http://marc.info/?l=linux-mm&m=109779128211239&w=4.
>
> The idea is that you don't need swap space for lazy migration, just an
> "address_space" where you can park an anon VMA's pte's while they're
> "unmapped" to cause migration faults.  Based on a suggestion from
> Christoph Lameter, I had tried to hide the migration cache behind the
> swap cache interface to minimize changes mainly in do_swap_page and
> vmscan/reclaim.  It seemed to work, but the difference in reference
> count semantics for the mig cache -- entry removed when last pte
> migrated/mapped -- makes coordination with exit teardown, uh, tricky.
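
[Illustration, not part of Lee's series: a minimal toy model of the
reference-count rule described above, where a migration-cache entry
goes away as soon as the last parked pte has been migrated or mapped.
The names mig_entry, mig_entry_init and mig_entry_put are hypothetical
and do not correspond to any kernel interface.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one migration-cache entry; not kernel code. */
struct mig_entry {
    int  parked_ptes;  /* ptes still parked on this entry */
    bool live;         /* false once the entry has been removed */
};

/* Park nr_ptes ptes on a fresh entry. */
static void mig_entry_init(struct mig_entry *e, int nr_ptes)
{
    assert(nr_ptes > 0);
    e->parked_ptes = nr_ptes;
    e->live = true;
}

/*
 * One parked pte is resolved (migrated/mapped, or torn down at exit).
 * Returns true if this was the last pte and the entry was removed.
 */
static bool mig_entry_put(struct mig_entry *e)
{
    assert(e->live && e->parked_ptes > 0);
    if (--e->parked_ptes == 0) {
        e->live = false;   /* entry disappears with its last user */
        return true;
    }
    return false;
}
```

[In this model the entry's lifetime depends only on the ptes parked on
it. Any path that drops a pte, including exit teardown, has to go
through the same put, which hints at why coordinating the two is
awkward.]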

That fixes one of the two problems (the swap space use), but
using _PTE_NUMA or _PAGE_PROTNONE looks like it would be
both easier and would solve both.
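
[Illustration only: a rough userspace sketch of the idea, assuming the
pte is simply flagged so that the next access traps, and the fault
handler then migrates the page and clears the flag. TOY_PROT_NONE,
struct toy_pte, toy_mark_lazy and toy_access are hypothetical names,
not the kernel's definitions. The only point shown is that marking a
page this way needs no swap entry or cache slot.]

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PROT_NONE 0x1u  /* hypothetical flag: next access must fault */

/* Hypothetical stand-in for a pte: a flag word plus the page's node. */
struct toy_pte {
    unsigned flags;
    int      node;  /* NUMA node currently holding the page */
};

/* Mark the pte for lazy migration; nothing else is allocated. */
static void toy_mark_lazy(struct toy_pte *pte)
{
    assert(pte);
    pte->flags |= TOY_PROT_NONE;
}

/*
 * Simulate an access from a CPU on cpu_node.
 * Returns true if a (simulated) fault was taken.
 */
static bool toy_access(struct toy_pte *pte, int cpu_node)
{
    assert(pte);
    if (!(pte->flags & TOY_PROT_NONE))
        return false;             /* normal access, no fault */
    if (pte->node != cpu_node)
        pte->node = cpu_node;     /* "migrate" to the faulting node */
    pte->flags &= ~TOY_PROT_NONE; /* access is allowed again */
    return true;
}
```

[A page marked with toy_mark_lazy() faults once on its next access,
moves to the accessing node if needed, and then behaves normally.]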

-- 
All rights reversed