Message-ID: <20240919121719.2148361-1-liaochang1@huawei.com>
Date: Thu, 19 Sep 2024 12:17:19 +0000
From: Liao Chang <liaochang1@...wei.com>
To: <mhiramat@...nel.org>, <oleg@...hat.com>, <peterz@...radead.org>,
<catalin.marinas@....com>, <will@...nel.org>, <mark.rutland@....com>
CC: <linux-kernel@...r.kernel.org>, <linux-trace-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>
Subject: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

Profiling the single-threaded selftests bench reveals a bottleneck in
caches_clean_inval_pou() on ARM64. On my local testing machine, this
function takes approximately 34% of CPU cycles for trig-uprobe-nop and
trig-uprobe-push.

This patch adds a check to avoid an unnecessary cache flush when writing
an instruction to the xol slot. If the instruction is the same as the
one already in the slot, there is no need to synchronize the D/I caches.

Xol slot allocation and updates occur on the hot path of uprobe
handling. On an upstream kernel running on Kunpeng916 (Hi1616), 4 NUMA
nodes, 64 cores @ 2.4GHz, this optimization gives a clear gain for the
nop and push testcases (roughly 1.8x to 2.2x in the results below).
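
For readers who want to try the idea outside the kernel, the check
described above can be sketched in userspace C. This is only an
illustration under stated assumptions: copy_to_slot(), fake_sync_icache(),
sync_calls and SLOT_SIZE are hypothetical names that do not exist in the
kernel, and the counter merely stands in for the cost of the real D/I-cache
synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 4	/* one AArch64 instruction; illustrative only */

/* Hypothetical stand-in for the expensive D/I-cache sync: counts calls. */
static unsigned int sync_calls;

static void fake_sync_icache(void *start, size_t len)
{
	(void)start;
	(void)len;
	sync_calls++;
}

/*
 * Copy an instruction into a slot, skipping both the write and the
 * cache sync when the slot already holds the same bytes.
 * Returns 1 if the slot was rewritten, 0 if the update was skipped.
 */
static int copy_to_slot(unsigned char *slot, const unsigned char *insn,
			size_t len)
{
	assert(len <= SLOT_SIZE);

	/* Same bytes already in place: nothing to write, nothing to sync. */
	if (!memcmp(slot, insn, len))
		return 0;

	memcpy(slot, insn, len);
	fake_sync_icache(slot, len);
	return 1;
}
```

Calling copy_to_slot() twice with the same instruction bytes triggers the
stand-in sync only on the first call, which mirrors why repeated hits on
the same probe benefit from the check.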

Before (next-20240918)
----------------------
uprobe-nop      ( 1 cpus):    0.418 ± 0.001M/s  (  0.418M/s/cpu)
uprobe-push     ( 1 cpus):    0.411 ± 0.005M/s  (  0.411M/s/cpu)
uprobe-ret      ( 1 cpus):    2.052 ± 0.002M/s  (  2.052M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.350 ± 0.000M/s  (  0.350M/s/cpu)
uretprobe-push  ( 1 cpus):    0.353 ± 0.000M/s  (  0.353M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.074 ± 0.001M/s  (  1.074M/s/cpu)

After
-----
uprobe-nop      ( 1 cpus):    0.926 ± 0.000M/s  (  0.926M/s/cpu)
uprobe-push     ( 1 cpus):    0.910 ± 0.001M/s  (  0.910M/s/cpu)
uprobe-ret      ( 1 cpus):    2.056 ± 0.001M/s  (  2.056M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.653 ± 0.001M/s  (  0.653M/s/cpu)
uretprobe-push  ( 1 cpus):    0.645 ± 0.000M/s  (  0.645M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.093 ± 0.001M/s  (  1.093M/s/cpu)

Signed-off-by: Liao Chang <liaochang1@...wei.com>
---
arch/arm64/kernel/probes/uprobes.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c
index d49aef2657cd..5ee27509d6f6 100644
--- a/arch/arm64/kernel/probes/uprobes.c
+++ b/arch/arm64/kernel/probes/uprobes.c
@@ -17,12 +17,16 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 	void *xol_page_kaddr = kmap_atomic(page);
 	void *dst = xol_page_kaddr + (vaddr & ~PAGE_MASK);
 
+	if (!memcmp(dst, src, len))
+		goto done;
+
 	/* Initialize the slot */
 	memcpy(dst, src, len);
 
 	/* flush caches (dcache/icache) */
 	sync_icache_aliases((unsigned long)dst, (unsigned long)dst + len);
 
+done:
 	kunmap_atomic(xol_page_kaddr);
 }
 
--
2.34.1