Message-Id: <173108456812.1559945.17269799494713828811.b4-ty@arm.com>
Date: Fri, 8 Nov 2024 16:49:53 +0000
From: Catalin Marinas <catalin.marinas@....com>
To: mhiramat@...nel.org,
oleg@...hat.com,
peterz@...radead.org,
will@...nel.org,
mark.rutland@....com,
Liao Chang <liaochang1@...wei.com>
Cc: linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot
On Thu, 19 Sep 2024 12:17:19 +0000, Liao Chang wrote:
> Profiling the single-thread selftests bench reveals a bottleneck in
> caches_clean_inval_pou() on ARM64. On my local testing machine, this
> function takes approximately 34% of CPU cycles for trig-uprobe-nop and
> trig-uprobe-push.
>
> This patch adds a check to avoid an unnecessary cache flush when writing
> an instruction to the xol slot. If the instruction is the same as the
> existing instruction in the slot, there is no need to synchronize the
> D/I caches. Since xol slot allocation and updates occur on the hot path
> of uprobe handling, the upstream kernel running on Kunpeng916 (Hi1616),
> 4 NUMA nodes, 64 cores @ 2.4GHz, shows an obvious gain from this
> optimization for the nop and push testcases.
>
> [...]
Applied to arm64 (for-next/misc), thanks!
[1/1] arm64: uprobes: Optimize cache flushes for xol slot
https://git.kernel.org/arm64/c/bdf94836c22a
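
For readers skimming the archive, the idea in the quoted description (skip
the D/I cache synchronization when the slot already holds the same
instruction) can be sketched in plain userspace C. This is only an
illustration, not the applied patch: xol_slot_write(), fake_sync_icache()
and sync_calls are hypothetical names, and the counter merely stands in for
the expensive cache maintenance done by the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter: how many times the "cache sync" was performed. */
static unsigned long sync_calls;

/* Stand-in for the costly D/I cache synchronization (assumption: in the
 * kernel this would be real cache maintenance, not a counter bump). */
static void fake_sync_icache(void *addr, size_t len)
{
	(void)addr;
	(void)len;
	sync_calls++;
}

/*
 * Write an instruction into an xol-like slot.
 * Returns true if the slot was modified (and therefore synced),
 * false if the slot already held identical bytes and the sync was skipped.
 */
static bool xol_slot_write(uint32_t *slot, const uint32_t *insn, size_t len)
{
	assert(slot != NULL && insn != NULL);

	/* Same bytes already in the slot: nothing to write, nothing to sync. */
	if (memcmp(slot, insn, len) == 0)
		return false;

	memcpy(slot, insn, len);
	fake_sync_icache(slot, len);
	return true;
}
```

Writing the same instruction to a slot twice should then trigger the
stand-in sync only once, which is the case that repeatedly probing the same
instruction (as in the nop and push testcases) would exercise.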
--
Catalin