Date: Tue, 10 Oct 2017 10:56:01 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: linux-mm <linux-mm@...ck.org>, lkml <linux-kernel@...r.kernel.org>
Cc: Dave Hansen <dave.hansen@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Huang Ying <ying.huang@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Kemi Wang <kemi.wang@...el.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Subject: [PATCH v2] mm/page_alloc.c: inline __rmqueue()
__rmqueue() is called by rmqueue_bulk() and rmqueue() under zone->lock
and the two __rmqueue() call sites are in very hot page allocator paths.
Since __rmqueue() is a small function, inlining it can save us some time.
With the will-it-scale/page_fault1/process benchmark, when using nr_cpu
processes to stress the buddy allocator, this patch improved the
benchmark by 6.3% on a 2-socket Intel Skylake system and 4.6% on a
4-socket Intel Skylake system. The benefit is smaller on the 4-socket
machine because the lock contention there
(perf-profile/native_queued_spin_lock_slowpath=81%) is less severe than
on the 2-socket machine (84%).
The benchmark forks nr_cpu processes, and each process then does the
following in a loop:
1. mmap() 128M of anonymous space;
2. write to each page there to trigger actual page allocation;
3. munmap() it.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
This patch adds inline to __rmqueue(). According to size(1), the size of
vmlinux does not change with this patch.
without this patch:
   text    data      bss      dec     hex filename
9968576 5793372 17715200 33477148 1fed21c vmlinux
with this patch:
   text    data      bss      dec     hex filename
9968576 5793372 17715200 33477148 1fed21c vmlinux
Reviewed-by: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Tested-by: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Signed-off-by: Aaron Lu <aaron.lu@...el.com>
---
v2: change commit message according to Dave Hansen's suggestion.
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e309ce4a44a..c9605c7ebaf6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2291,7 +2291,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
-static struct page *__rmqueue(struct zone *zone, unsigned int order,
+static inline struct page *__rmqueue(struct zone *zone, unsigned int order,
int migratetype)
{
struct page *page;
--
2.13.6