Message-ID: <4FF74A3B.80701@redhat.com>
Date: Fri, 06 Jul 2012 16:27:39 -0400
From: Rik van Riel <riel@...hat.com>
To: Lee Schermerhorn <Lee.Schermerhorn@...com>
CC: Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Dan Smith <danms@...ibm.com>,
Bharata B Rao <bharata.rao@...il.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 03/26] mm, mpol: add MPOL_MF_LAZY ...
On 07/06/2012 04:04 PM, Lee Schermerhorn wrote:
> On Fri, 2012-07-06 at 12:38 -0400, Rik van Riel wrote:
>> 4. Putting a lot of pages in the swap cache ends up allocating
>> swap space. This means this NUMA migration scheme will only
>> work on systems that have a substantial amount of memory
>> represented by swap space. This is highly unlikely on systems
>> with memory in the TB range. On smaller systems, it could drive
>> the system out of memory (to the OOM killer), by "filling up"
>> the overflow swap with migration pages instead.
>> 5. In the long run, we want the ability to migrate transparent
>> huge pages as one unit. The reason is simple, the performance
>> penalty for running on the wrong NUMA node (10-20%) is on the
>> same order of magnitude as the performance penalty for running
>> with 4kB pages instead of 2MB pages (5-15%).
>>
>> Breaking up large pages into small ones, and having khugepaged
>> reconstitute them on a random NUMA node later on, will negate
>> the performance benefits of both NUMA placement and THP.
> When I originally posted the "migrate on fault" series, I posted a
> separate series with a "migration cache" to avoid the use of swap space
> for lazy migration: http://markmail.org/message/xgvvrnn2nk4nsn2e.
>
> The migration cache was originally implemented by Marcelo Tosatti for
> the old memory hotplug project:
> http://marc.info/?l=linux-mm&m=109779128211239&w=4.
>
> The idea is that you don't need swap space for lazy migration, just an
> "address_space" where you can park an anon VMA's PTEs while they're
> "unmapped" to cause migration faults. Based on a suggestion from
> Christoph Lameter, I had tried to hide the migration cache behind the
> swap cache interface to minimize changes mainly in do_swap_page and
> vmscan/reclaim. It seemed to work, but the difference in reference
> count semantics for the mig cache -- entry removed when last pte
> migrated/mapped -- makes coordination with exit teardown, uh, tricky.
That fixes one of the two problems (the swap space usage),
but using _PTE_NUMA or _PAGE_PROTNONE looks like it would
be both easier and a fix for both.
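A minimal sketch of why a PTE-marking approach avoids both problems, assuming a toy model: a periodic scan marks a mapping so the next access traps (in the spirit of a _PAGE_PROTNONE or NUMA-marked PTE), and the fault migrates the page to the faulting CPU's node if it is remote. Nothing is placed in the swap cache, and a 2MB mapping (512 x 4kB pages) is moved as one unit instead of being split. `struct toy_pte`, `toy_numa_mark` and `toy_access` are hypothetical names, not the kernel's PTE helpers or fault path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of NUMA hinting faults; illustrative only. */
struct toy_pte {
    int  node;      /* NUMA node currently holding the page(s) */
    int  nr_pages;  /* 1 for a 4kB page, 512 for a 2MB huge page */
    bool hint;      /* next access traps, like a PROT_NONE/NUMA-marked PTE */
};

/* Periodic scan: mark the mapping so the next access faults.
 * The page stays where it is; no swap cache or swap slot is involved. */
static void toy_numa_mark(struct toy_pte *pte)
{
    pte->hint = true;
}

/* Access from a CPU on cpu_node. On a hinting fault, move the whole
 * mapping (all nr_pages together, so a huge page is never split) if it
 * lives on another node, then clear the hint. Returns the number of
 * 4kB pages migrated. */
static int toy_access(struct toy_pte *pte, int cpu_node)
{
    int migrated = 0;

    if (!pte->hint)
        return 0;               /* ordinary access, no fault */
    if (pte->node != cpu_node) {
        pte->node = cpu_node;
        migrated = pte->nr_pages;
    }
    pte->hint = false;          /* restore normal access */
    return migrated;
}
```

With a huge page on node 0 accessed from node 1, an unmarked access does nothing; after marking, the first access migrates all 512 small pages at once, and a later marked access from node 1 finds the page already local.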
--
All rights reversed