Message-ID: <20171009075338.GC1798@intel.com>
Date: Mon, 9 Oct 2017 15:53:38 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc: linux-mm <linux-mm@...ck.org>, lkml <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Huang Ying <ying.huang@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Kemi Wang <kemi.wang@...el.com>
Subject: Re: [PATCH] page_alloc.c: inline __rmqueue()
On Mon, Oct 09, 2017 at 01:07:36PM +0530, Anshuman Khandual wrote:
> On 10/09/2017 11:14 AM, Aaron Lu wrote:
> > __rmqueue() is called by rmqueue_bulk() and rmqueue() under zone->lock,
> > and that lock can be heavily contended by memory-intensive applications.
> >
> > Since __rmqueue() is a small function, inlining it can shave some time
> > off the section that runs while zone->lock is held.
> > With the will-it-scale/page_fault1/process benchmark, using nr_cpu
> > processes to stress the buddy allocator:
> >
> > On a 2-socket Intel Skylake machine:
> >          base  %change     head
> >         77342    +6.3%    82203  will-it-scale.per_process_ops
> >
> > On a 4-socket Intel Skylake machine:
> >          base  %change     head
> >         75746    +4.6%    79248  will-it-scale.per_process_ops
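The %change column is the relative difference between head and base throughput, (head - base) / base * 100. A small C helper that reproduces it is sketched below; the name pct_change() is hypothetical and is not part of the patch or of will-it-scale.

```c
#include <assert.h>

/* Relative change from `base` to `head`, in percent:
 * (head - base) / base * 100. `base` must be non-zero. */
static double pct_change(double base, double head)
{
    assert(base != 0.0);
    return (head - base) / base * 100.0;
}
```

For the 2-socket run, (82203 - 77342) / 77342 * 100 is about 6.29, which rounds to the +6.3% shown; the 4-socket run gives about 4.62, i.e. +4.6%.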
> >
> > This patch marks __rmqueue() as inline.
> >
> > Signed-off-by: Aaron Lu <aaron.lu@...el.com>
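A minimal user-space sketch of the pattern the patch targets, assuming a pthread mutex as a stand-in for zone->lock. The names toy_zone, toy_rmqueue() and toy_rmqueue_bulk() are hypothetical; this is not the kernel's buddy allocator, only an illustration of a small helper marked `static inline` that is called repeatedly inside a locked region, the way __rmqueue() is called from rmqueue_bulk().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a zone's free list.
 * NOT the kernel's struct zone: it only models "a small helper called
 * repeatedly while a shared lock is held". */
struct toy_zone {
    pthread_mutex_t lock;   /* plays the role of zone->lock */
    int free_pages[64];     /* toy free list: a stack of page ids */
    size_t nr_free;
};

/* Analogue of __rmqueue(): pop one page; the caller must hold z->lock.
 * `static inline` hints that the compiler may fold this body into each
 * caller, removing call/return overhead inside the locked region. */
static inline int toy_rmqueue(struct toy_zone *z)
{
    if (z->nr_free == 0)
        return -1;          /* no page available */
    return z->free_pages[--z->nr_free];
}

/* Analogue of rmqueue_bulk(): take the lock once and grab up to `count`
 * pages. Every toy_rmqueue() call runs inside the critical section, so
 * cycles saved there shorten the time other threads wait on the lock. */
static size_t toy_rmqueue_bulk(struct toy_zone *z, int *out, size_t count)
{
    size_t got = 0;

    pthread_mutex_lock(&z->lock);
    while (got < count) {
        int page = toy_rmqueue(z);
        if (page < 0)
            break;          /* free list exhausted */
        out[got++] = page;
    }
    pthread_mutex_unlock(&z->lock);

    assert(got <= count);
    return got;
}
```

Note that `inline` is only a hint in C: a compiler may inline a small static function without it, or decline to inline one that has it, depending on optimization level and its own heuristics.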
>
> Ran it through kernel bench and ebizzy micro benchmarks. Results
> were comparable with and without the patch. Maybe these are not
> the appropriate tests for this inlining improvement.

I think so.
The benefit only shows up when lock contention is heavy enough; e.g.
with the workload I used,
perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath is as high
as 80%.

> Anyway, it does not have any performance degradation either.
>
> Reviewed-by: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
> Tested-by: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Thanks!