Message-Id: <1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk>
Date: Fri, 8 Jan 2016 09:55:33 +0000
From: Chris Wilson <chris@...is-wilson.co.uk>
To: x86@...nel.org
Cc: Chris Wilson <chris@...is-wilson.co.uk>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Toshi Kani <toshi.kani@....com>,
Borislav Petkov <bp@...e.de>,
"Luis R. Rodriguez" <mcgrof@...e.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Sai Praneeth <sai.praneeth.prakhya@...el.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH] x86: Micro-optimise clflush_cache_range()
Whilst inspecting the asm for clflush_cache_range() and some perf profiles
of workloads that required extensive flushing of single cachelines (part of
the intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.
Signed-off-by: Chris Wilson <chris@...is-wilson.co.uk>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: x86@...nel.org
Cc: Toshi Kani <toshi.kani@....com>
Cc: Borislav Petkov <bp@...e.de>
Cc: "Luis R. Rodriguez" <mcgrof@...e.com>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>
Cc: Ross Zwisler <ross.zwisler@...ux.intel.com>
Cc: Sai Praneeth <sai.praneeth.prakhya@...el.com>
Cc: linux-kernel@...r.kernel.org
Acked-by: "H. Peter Anvin" <hpa@...or.com>
---
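For readers who want to experiment with the loop shape outside the kernel,
here is a minimal userspace sketch. It is an illustration only, not the
kernel code: fake_clflush_size, fake_flush() and count_flushed_lines() are
hypothetical stand-ins, the 64-byte line size is an assumption, and the
"flush" merely counts the cachelines it would have touched. It shows the
single hoisted read of the line size and the round-down of the start
address to a cacheline boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data.x86_clflush_size (64 assumed). */
static unsigned int fake_clflush_size = 64;

/* Hypothetical stand-in for clflushopt(): counts lines instead of flushing. */
static size_t lines_visited;

static void fake_flush(uintptr_t line)
{
	(void)line;
	lines_visited++;
}

/*
 * Same loop shape as the patched function: read the line size once into a
 * const local, align the start address down, then step one line at a time.
 */
static size_t count_flushed_lines(uintptr_t vaddr, unsigned int size)
{
	const uintptr_t line_size = fake_clflush_size; /* single hoisted read */
	uintptr_t p = vaddr & ~(line_size - 1);
	const uintptr_t vend = vaddr + size;

	/* The mask trick only works for a non-zero power-of-two line size. */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	lines_visited = 0;
	for (; p < vend; p += line_size)
		fake_flush(p);

	return lines_visited;
}
```

With the assumed 64-byte line, a 64-byte range that starts 8 bytes past a
boundary straddles two lines, so count_flushed_lines(0x1008, 64) evaluates
to 2, while the aligned call count_flushed_lines(0x1000, 64) evaluates to 1.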
arch/x86/mm/pageattr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a3137a4feed1..6000ad7f560c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
 	void *vend = vaddr + size;
-	void *p;
+
+	if (p >= vend)
+		return;
 
 	mb();
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
-	     p < vend; p += boot_cpu_data.x86_clflush_size)
+	for (; p < vend; p += clflush_size)
 		clflushopt(p);
 
 	mb();
--
2.7.0.rc3