Message-ID: <alpine.DEB.2.00.1211201833080.2278@chino.kir.corp.google.com>
Date: Tue, 20 Nov 2012 18:41:14 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Ingo Molnar <mingo@...nel.org>
cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Paul Turner <pjt@...gle.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Christoph Lameter <cl@...ux.com>,
Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Johannes Weiner <hannes@...xchg.org>,
Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH, v2] mm, numa: Turn 4K pte NUMA faults into effective hugepage ones
On Tue, 20 Nov 2012, Ingo Molnar wrote:
> Reduce the 4K page fault count by looking around and processing
> nearby pages if possible.
>
> To keep the logic and cache overhead simple and straightforward
> we do a couple of simplifications:
>
> - we only scan in the HPAGE_SIZE range of the faulting address
> - we only go as far as the vma allows us
>
> Also simplify the do_numa_page() flow while at it and fix the
> previous double faulting we incurred due to not properly fixing
> up freshly migrated ptes.
>
> Suggested-by: Mel Gorman <mgorman@...e.de>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Signed-off-by: Ingo Molnar <mingo@...nel.org>
Acked-by: David Rientjes <rientjes@...gle.com>
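For readers following along, the two simplifications in the changelog quoted above (stay within the HPAGE_SIZE range of the faulting address, and go only as far as the vma allows) can be pictured roughly as below. This is only a sketch of the idea, not the code from the patch: scan_window() is a hypothetical helper, and the 4K/2M sizes are assumed x86-64 values.

```python
# Assumed x86-64 values: 4K base pages, 2M huge pages.
PAGE_SIZE = 4096
HPAGE_SIZE = 2 * 1024 * 1024
HPAGE_MASK = ~(HPAGE_SIZE - 1)


def scan_window(fault_addr, vma_start, vma_end):
    """Return the [start, end) address range to process around fault_addr.

    Hypothetical helper: the window is the HPAGE_SIZE-aligned region that
    contains the faulting address, clipped to the vma boundaries.
    """
    if not (vma_start <= fault_addr < vma_end):
        raise ValueError("faulting address lies outside the vma")
    hpage_start = fault_addr & HPAGE_MASK          # round down to 2M boundary
    start = max(hpage_start, vma_start)            # do not scan before the vma
    end = min(hpage_start + HPAGE_SIZE, vma_end)   # do not scan past the vma
    return start, end


def pages_in_window(start, end):
    """Number of 4K ptes covered by the window."""
    return (end - start) // PAGE_SIZE
```

With a large vma a single 4K fault covers up to 512 ptes in one pass; with a small vma the window shrinks to whatever the vma spans.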
Ok, this is significantly better; it cut the regression almost in half on
my system. With THP enabled:
numa/core at ec05a2311c35:          136918.34 SPECjbb2005 bops
numa/core at 01aa90068b12:          128315.19 SPECjbb2005 bops (-6.3%)
numa/core at 01aa90068b12 + patch:  132523.06 SPECjbb2005 bops (-3.2%)
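The percentages are relative to the ec05a2311c35 result. A quick way to check them, and the "almost in half" figure, is sketched below; the three throughput numbers are copied from the lines above, while pct_change() and recovered_fraction() are just illustrative helper names.

```python
# Throughput figures (SPECjbb2005 bops) copied from the results above.
baseline = 136918.34   # numa/core at ec05a2311c35
regressed = 128315.19  # numa/core at 01aa90068b12
patched = 132523.06    # numa/core at 01aa90068b12 + patch


def pct_change(value, reference):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


def recovered_fraction(base, before, after):
    """Share of the lost throughput that the patch wins back."""
    return (after - before) / (base - before)


print(f"without patch: {pct_change(regressed, baseline):+.1f}%")
print(f"with patch:    {pct_change(patched, baseline):+.1f}%")
print(f"recovered:     {recovered_fraction(baseline, regressed, patched):.1%}")
```

This prints -6.3%, -3.2% and 48.9%, i.e. the patch wins back a little under half of the lost throughput.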
Here's the newest perf top output. It is radically different from before
(far fewer of the newly added numa/core functions among the biggest
consumers), but it still shows significant overhead from page faults.
92.18% perf-6697.map [.] 0x00007fe2c5afd079
1.20% libjvm.so [.] instanceKlass::oop_push_contents(PSPromotionManag
1.05% libjvm.so [.] PSPromotionManager::drain_stacks_depth(bool)
0.78% libjvm.so [.] PSPromotionManager::copy_to_survivor_space(oopDes
0.59% libjvm.so [.] PSPromotionManager::claim_or_forward_internal_dep
0.49% [kernel] [k] page_fault
0.27% libjvm.so [.] Copy::pd_disjoint_words(HeapWord*, HeapWord*, unsigned lo
0.27% libc-2.3.6.so [.] __gettimeofday
0.19% libjvm.so [.] CardTableExtension::scavenge_contents_parallel(ObjectStar
0.16% [kernel] [k] getnstimeofday
0.14% [kernel] [k] _raw_spin_lock
0.13% [kernel] [k] generic_smp_call_function_interrupt
0.11% [kernel] [k] ktime_get
0.11% [kernel] [k] rcu_check_callbacks
0.10% [kernel] [k] read_tsc
0.09% libjvm.so [.] os::javaTimeMillis()
0.09% [kernel] [k] clear_page_c
0.08% [kernel] [k] flush_tlb_func
0.08% [kernel] [k] ktime_get_update_offsets
0.07% [kernel] [k] task_tick_fair
0.06% [kernel] [k] emulate_vsyscall
0.06% libjvm.so [.] oopDesc::size_given_klass(Klass*)
0.06% [kernel] [k] __do_page_fault
0.04% [kernel] [k] __bad_area_nosemaphore
0.04% perf [.] 0x000000000003310b
0.04% libjvm.so [.] objArrayKlass::oop_push_contents(PSPromotionManager*, oop
0.04% [kernel] [k] run_timer_softirq
0.04% [kernel] [k] copy_user_generic_string
0.03% [kernel] [k] task_numa_fault
0.03% [kernel] [k] smp_call_function_many
0.03% [kernel] [k] retint_swapgs
0.03% [kernel] [k] update_cfs_shares
0.03% [kernel] [k] error_sti
0.03% [kernel] [k] _raw_spin_lock_irq
0.03% [kernel] [k] update_curr
0.02% [kernel] [k] write_ok_or_segv
0.02% [kernel] [k] call_function_interrupt
0.02% [kernel] [k] __do_softirq
0.02% [kernel] [k] acct_update_integrals
0.02% [kernel] [k] x86_pmu_disable_all
0.02% [kernel] [k] apic_timer_interrupt
0.02% [kernel] [k] tick_sched_timer