Message-ID: <20251109163559.4102849-1-chen.dylane@linux.dev>
Date: Mon, 10 Nov 2025 00:35:56 +0800
From: Tao Chen <chen.dylane@...ux.dev>
To: peterz@...radead.org,
mingo@...hat.com,
acme@...nel.org,
namhyung@...nel.org,
mark.rutland@....com,
alexander.shishkin@...ux.intel.com,
jolsa@...nel.org,
irogers@...gle.com,
adrian.hunter@...el.com,
kan.liang@...ux.intel.com
Cc: linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org,
bpf@...r.kernel.org,
Tao Chen <chen.dylane@...ux.dev>
Subject: [PATCH bpf-next v5 0/3] Pass external callchain entry to get_perf_callchain
Background
==========
Alexei noted that get_perf_callchain should be protected with
preempt_disable in the BPF stackmap code.
https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com
A previous patch attempted to fix this issue, and Andrii suggested
teaching get_perf_callchain to accept that buffer directly so the
unnecessary copy can be avoided.
https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev
Proposed Solution
=================
Add an external perf_callchain_entry parameter to get_perf_callchain so
that the BPF side can pass in its own buffer. The main advantage is that
this avoids an unnecessary copy.
Todo
====
I am not sure whether this modification is appropriate: the
get_callchain_entry machinery in the perf subsystem is considerably more
complex than simply using an external buffer.
Comments and suggestions are always welcome.
Change list:
- v1 -> v2:
From Jiri
- rebase code, fix conflict
- v1: https://lore.kernel.org/bpf/20251013174721.2681091-1-chen.dylane@linux.dev
- v2 -> v3:
From Andrii
- entries per CPU used in a stack-like fashion
- v2: https://lore.kernel.org/bpf/20251014100128.2721104-1-chen.dylane@linux.dev
- v3 -> v4:
From Peter
  - refactor get_perf_callchain and add three new APIs to make the perf
    callchain easier to use.
From Andrii
- reuse the perf callchain management.
- rename patch1 and patch2.
- v3: https://lore.kernel.org/bpf/20251019170118.2955346-1-chen.dylane@linux.dev
- v4 -> v5:
From Yonghong
  - keep add_mark false in stackmap when refactoring get_perf_callchain
    in patch1.
- add atomic operation in get_recursion_context in patch2.
  - rename bpf_put_callchain_entry to bpf_put_perf_callchain in patch3.
- rebase bpf-next master.
- v4: https://lore.kernel.org/bpf/20251028162502.3418817-1-chen.dylane@linux.dev
Tao Chen (3):
perf: Refactor get_perf_callchain
perf: Add atomic operation in get_recursion_context
bpf: Hold the perf callchain entry until used completely
include/linux/perf_event.h | 9 +++++
kernel/bpf/stackmap.c | 62 +++++++++++++++++++++++++-------
kernel/events/callchain.c | 73 ++++++++++++++++++++++++--------------
kernel/events/internal.h | 5 +--
4 files changed, 107 insertions(+), 42 deletions(-)
--
2.48.1