Message-ID: <20240709005142.4044530-1-liaochang1@huawei.com>
Date: Tue, 9 Jul 2024 00:51:40 +0000
From: Liao Chang <liaochang1@...wei.com>
To: <peterz@...radead.org>, <mingo@...hat.com>, <acme@...nel.org>,
<namhyung@...nel.org>, <mark.rutland@....com>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
<irogers@...gle.com>, <adrian.hunter@...el.com>, <kan.liang@...ux.intel.com>,
<ast@...nel.org>, <daniel@...earbox.net>, <andrii@...nel.org>,
<martin.lau@...ux.dev>, <eddyz87@...il.com>, <song@...nel.org>,
<yonghong.song@...ux.dev>, <john.fastabend@...il.com>, <kpsingh@...nel.org>,
<sdf@...ichev.me>, <haoluo@...gle.com>, <mykolal@...com>, <shuah@...nel.org>,
<liaochang1@...wei.com>
CC: <linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
<bpf@...r.kernel.org>, <linux-kselftest@...r.kernel.org>
Subject: [PATCH 0/2] Optimize the return_instance management of uretprobe

While exploring the uretprobe syscall and trampoline for ARM64, we
observed a slight performance gain in the Redis benchmark when using the
uretprobe syscall. This patchset aims to further improve uretprobe
performance by optimizing the management of struct return_instance data.

In detail, uretprobe uses dynamically allocated memory for struct
return_instance data, which tracks the call chain of instrumented
functions. This approach is inefficient, especially given the inherent
locality of function invocation.

This patchset proposes a rework of the return_instance management. It
replaces dynamic memory allocation with a statically allocated array.
This approach leverages the stack-style usage of return_instance and
removes the need for kmalloc/kfree operations.
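
The stack-style idea can be sketched in user-space C as follows. This is
only an illustration under stated assumptions, not the code from the
patch: struct ri_sketch, struct ri_stack, ri_push(), ri_pop() and
RI_MAX_DEPTH are hypothetical names, and the record is a simplified
stand-in for the kernel's struct return_instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical depth limit for this sketch; not taken from the patch. */
#define RI_MAX_DEPTH 64

/* Simplified stand-in for the kernel's struct return_instance. */
struct ri_sketch {
	unsigned long func;          /* address of the probed function */
	unsigned long orig_ret_addr; /* return address replaced on entry */
};

/* Per-task storage: a fixed array that is used strictly as a stack. */
struct ri_stack {
	struct ri_sketch slots[RI_MAX_DEPTH];
	size_t depth; /* number of live entries */
};

/* Function entry: claim the next slot instead of calling an allocator. */
static struct ri_sketch *ri_push(struct ri_stack *s, unsigned long func,
				 unsigned long ret)
{
	struct ri_sketch *ri;

	if (s->depth == RI_MAX_DEPTH)
		return NULL; /* nesting too deep: no slot available */

	ri = &s->slots[s->depth++];
	ri->func = func;
	ri->orig_ret_addr = ret;
	return ri;
}

/* Function return: copy out and release the top slot; nothing to free. */
static bool ri_pop(struct ri_stack *s, struct ri_sketch *out)
{
	if (s->depth == 0)
		return false;

	*out = s->slots[--s->depth];
	assert(s->depth < RI_MAX_DEPTH);
	return true;
}
```

Because instrumented calls return in the reverse order of entry, push and
pop only ever touch the top of the array, so both are O(1) and never
enter the allocator.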

This patchset has been tested on a Kunpeng916 (Hi1616) with 4 NUMA nodes
and 64 cores @ 2.4GHz. Redis benchmarks show a throughput gain of about
2% for the Redis GET and SET commands:
------------------------------------------------------------------
Test case       | No uretprobes | uretprobes     | uretprobes
                |               | (current)      | (optimized)
==================================================================
Redis SET (RPS) | 47025         | 40619 (-13.6%) | 41529 (-11.6%)
------------------------------------------------------------------
Redis GET (RPS) | 46715         | 41426 (-11.3%) | 42306 (-9.4%)
------------------------------------------------------------------
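
For reference, the roughly 2% figure compares the optimized column with
the current column rather than with the no-uretprobes baseline. A minimal
sketch of that arithmetic follows; pct_change() is a hypothetical helper
written for this illustration.

```c
#include <assert.h>

/* Relative change of `value` versus `base`, in percent. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}
```

Calling pct_change(40619, 41529) gives the SET gain and
pct_change(41426, 42306) gives the GET gain, both using the RPS values
from the table.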

Liao Chang (2):
  uprobes: Optimize the return_instance related routines
  selftests/bpf: Add uretprobe test for return_instance management

 include/linux/uprobes.h                       |  10 +-
 kernel/events/uprobes.c                       | 162 +++++++++++-------
 .../bpf/prog_tests/uretprobe_depth.c          | 150 ++++++++++++++++
 .../selftests/bpf/progs/uretprobe_depth.c     |  19 ++
 4 files changed, 274 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_depth.c
 create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_depth.c
--
2.34.1