Message-ID: <310091f8-ee17-4dfc-bbb4-1bb262cbfd98@linux.dev>
Date: Fri, 26 Sep 2025 01:45:30 +0800
From: Tao Chen <chen.dylane@...ux.dev>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Song Liu <song@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard <eddyz87@...il.com>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next] bpf: Add preempt_disable to protect
get_perf_callchain
On 2025/9/23 10:53, Alexei Starovoitov wrote:
> On Mon, Sep 22, 2025 at 12:54 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>>
>> As Alexei suggested, the return value from get_perf_callchain() may be
>> reused if another task preempts and requests the stack after the BPF
>> program has switched to migrate-disable.
>>
>> Reported-by: Alexei Starovoitov <ast@...nel.org>
>> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
>> ---
>> kernel/bpf/stackmap.c | 14 +++++---------
>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 2e182a3ac4c..07892320906 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -314,8 +314,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
>> if (max_depth > sysctl_perf_event_max_stack)
>> max_depth = sysctl_perf_event_max_stack;
>>
>> + preempt_disable();
>> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
>> false, false);
>> + preempt_enable();
>
> This is obviously wrong.
> As soon as preemption is enabled, the trace can be overwritten.
> guard(preempt)();
> can fix it, but the length of the preempt-disabled section
> will be quite big.
> The way the get_perf_callchain() API is written, I don't see
> another option, though. Unless we refactor it similarly to
> bpf_try_get_buffers().
>
> pw-bot: cr
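
For reference, the guard(preempt)() variant mentioned above would look
roughly like this in bpf_get_stackid(); this is only a sketch of the idea,
not a posted patch, and the map-update path is elided:

	/* guard(preempt)() from <linux/preempt.h> disables preemption
	 * here and re-enables it automatically when the enclosing scope
	 * ends, so the per-CPU entry returned by get_perf_callchain()
	 * cannot be overwritten by a preempting task while we are still
	 * reading it. The cost: preemption stays off across the whole
	 * hash-and-update path that follows.
	 */
	guard(preempt)();
	trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
				   false, false);
	if (unlikely(!trace))
		return -EFAULT;
	/* ... hash the trace and update the stack map ... */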
Hi Alexei,

I tried to understand what you meant and looked at the implementation of
get_perf_callchain(). There is only one perf_callchain_entry per CPU right now:
callchain_cpus_entries (RCU-protected global variable)
        ↓
struct callchain_cpus_entries {
        struct perf_callchain_entry *cpu_entries[];
};              |
                |-> perf_callchain_entry0    cpu0
                    perf_callchain_entry1    cpu1
                    ...
                    perf_callchain_entryn    cpun
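
In code, the layout above is roughly the following (based on
kernel/events/callchain.c, abbreviated; the per-recursion-context
dimension inside each per-CPU block is omitted):

	struct callchain_cpus_entries {
		struct rcu_head rcu_head;
		/* one entry block per possible CPU */
		struct perf_callchain_entry *cpu_entries[];
	};

Since get_perf_callchain() hands out this CPU's single entry, a second
request on the same CPU reuses and overwrites the same memory, which is
exactly the race described above.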
If we want to implement it like bpf_try_get_buffers(), we should allocate
an array of perf_callchain_entry structures on every CPU, right? For example:
callchain_cpus_entries (RCU-protected global variable)
        ↓
struct callchain_cpus_entries {
        struct perf_callchain_entry *cpu_entries[];
};              |
                |-> perf_callchain_entry0[N]    cpu0
                    perf_callchain_entry1[N]    cpu1
                    ...
                    perf_callchain_entryn[N]    cpun
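
A rough sketch of what such a refactor could look like, following the
nest-level pattern of bpf_try_get_buffers() in kernel/bpf/helpers.c. All
names and the nest depth below are hypothetical, and it assumes callers
run with migration disabled, as BPF programs do:

	/* struct perf_callchain_entry ends in a flexible array, so give
	 * the per-CPU slots fixed backing storage via a layout-compatible
	 * type (sizing is a placeholder; the real refactor would pick its
	 * own). */
	struct bpf_callchain_entry {
		u64 nr;
		u64 ip[PERF_MAX_STACK_DEPTH];
	};

	/* hypothetical depth, mirrors MAX_BPRINTF_NEST_LEVEL */
	#define MAX_CALLCHAIN_NEST_LEVEL 3

	static DEFINE_PER_CPU(struct bpf_callchain_entry[MAX_CALLCHAIN_NEST_LEVEL],
			      bpf_callchain_entries);
	static DEFINE_PER_CPU(int, bpf_callchain_nest_level);

	/* Claim one of this CPU's slots. A task that preempts us on the
	 * same CPU increments the nest level and gets the next slot, so
	 * the entry we are still reading is never handed out again before
	 * the matching put. */
	static int bpf_try_get_callchain_entry(struct perf_callchain_entry **entry)
	{
		int nest_level = this_cpu_inc_return(bpf_callchain_nest_level);

		if (WARN_ON_ONCE(nest_level > MAX_CALLCHAIN_NEST_LEVEL)) {
			this_cpu_dec(bpf_callchain_nest_level);
			return -EBUSY;
		}
		*entry = (struct perf_callchain_entry *)
			 this_cpu_ptr(&bpf_callchain_entries[nest_level - 1]);
		return 0;
	}

	static void bpf_put_callchain_entry(void)
	{
		if (WARN_ON_ONCE(this_cpu_read(bpf_callchain_nest_level) == 0))
			return;
		this_cpu_dec(bpf_callchain_nest_level);
	}

get_perf_callchain() would then fill a caller-supplied entry instead of
returning its own per-CPU one, and the entry would stay valid until the
matching bpf_put_callchain_entry(), with no long preempt-disabled section.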
--
Best Regards
Tao Chen