Message-ID: <310091f8-ee17-4dfc-bbb4-1bb262cbfd98@linux.dev>
Date: Fri, 26 Sep 2025 01:45:30 +0800
From: Tao Chen <chen.dylane@...ux.dev>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Song Liu <song@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard <eddyz87@...il.com>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next] bpf: Add preempt_disable to protect
get_perf_callchain
On 2025/9/23 10:53, Alexei Starovoitov wrote:
> On Mon, Sep 22, 2025 at 12:54 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>>
>> As Alexei suggested, the return value from get_perf_callchain() may be
>> reused if another task preempts and requests the stack after the BPF
>> program has switched to migrate-disable.
>>
>> Reported-by: Alexei Starovoitov <ast@...nel.org>
>> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
>> ---
>> kernel/bpf/stackmap.c | 14 +++++---------
>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 2e182a3ac4c..07892320906 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -314,8 +314,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
>> if (max_depth > sysctl_perf_event_max_stack)
>> max_depth = sysctl_perf_event_max_stack;
>>
>> + preempt_disable();
>> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
>> false, false);
>> + preempt_enable();
>
> This is obviously wrong.
> As soon as preemption is enabled, the trace can be overwritten.
> guard(preempt)();
> can fix it, but the length of the preempt-disabled section
> will be quite big.
> The way the get_perf_callchain() API is written, I don't see
> another option, though. Unless we refactor it similarly to
> bpf_try_get_buffers().
>
> pw-bot: cr
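
For reference, the guard(preempt)() variant mentioned above would look
roughly like this in bpf_get_stackid(); this is only a sketch of the idea,
not a posted patch, and the map-update path is elided:

	/* guard(preempt)() from <linux/preempt.h> disables preemption
	 * here and re-enables it automatically when the enclosing scope
	 * ends, so the per-CPU entry returned by get_perf_callchain()
	 * cannot be overwritten by a preempting task while we are still
	 * reading it. The cost: preemption stays off across the whole
	 * hash-and-update path that follows.
	 */
	guard(preempt)();
	trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
				   false, false);
	if (unlikely(!trace))
		return -EFAULT;
	/* ... hash the trace and update the stack map ... */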
Hi Alexei,

I tried to understand what you meant and looked at the implementation of
get_perf_callchain(). There is only one perf_callchain_entry per CPU right now:
callchain_cpus_entries (RCU-protected global variable)
        ↓
struct callchain_cpus_entries {
        struct perf_callchain_entry *cpu_entries[];
};              |
                |-> perf_callchain_entry0    cpu0
                    perf_callchain_entry1    cpu1
                    ...
                    perf_callchain_entryn    cpun
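
In code, the layout above is roughly the following (based on
kernel/events/callchain.c, abbreviated; the per-recursion-context
dimension inside each per-CPU block is omitted):

	struct callchain_cpus_entries {
		struct rcu_head rcu_head;
		/* one entry block per possible CPU */
		struct perf_callchain_entry *cpu_entries[];
	};

Since get_perf_callchain() hands out this CPU's single entry, a second
request on the same CPU reuses and overwrites the same memory, which is
exactly the race described above.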
If we want to implement it like bpf_try_get_buffers(), we should allocate
an array of perf_callchain_entry structures on every CPU, right? For example:
callchain_cpus_entries (RCU-protected global variable)
        ↓
struct callchain_cpus_entries {
        struct perf_callchain_entry *cpu_entries[];
};              |
                |-> perf_callchain_entry0[N]    cpu0
                    perf_callchain_entry1[N]    cpu1
                    ...
                    perf_callchain_entryn[N]    cpun
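
A rough sketch of what such a refactor could look like, following the
nest-level pattern of bpf_try_get_buffers() in kernel/bpf/helpers.c. All
names and the nest depth below are hypothetical, and it assumes callers
run with migration disabled, as BPF programs do:

	/* struct perf_callchain_entry ends in a flexible array, so give
	 * the per-CPU slots fixed backing storage via a layout-compatible
	 * type (sizing is a placeholder; the real refactor would pick its
	 * own). */
	struct bpf_callchain_entry {
		u64 nr;
		u64 ip[PERF_MAX_STACK_DEPTH];
	};

	/* hypothetical depth, mirrors MAX_BPRINTF_NEST_LEVEL */
	#define MAX_CALLCHAIN_NEST_LEVEL 3

	static DEFINE_PER_CPU(struct bpf_callchain_entry[MAX_CALLCHAIN_NEST_LEVEL],
			      bpf_callchain_entries);
	static DEFINE_PER_CPU(int, bpf_callchain_nest_level);

	/* Claim one of this CPU's slots. A task that preempts us on the
	 * same CPU increments the nest level and gets the next slot, so
	 * the entry we are still reading is never handed out again before
	 * the matching put. */
	static int bpf_try_get_callchain_entry(struct perf_callchain_entry **entry)
	{
		int nest_level = this_cpu_inc_return(bpf_callchain_nest_level);

		if (WARN_ON_ONCE(nest_level > MAX_CALLCHAIN_NEST_LEVEL)) {
			this_cpu_dec(bpf_callchain_nest_level);
			return -EBUSY;
		}
		*entry = (struct perf_callchain_entry *)
			 this_cpu_ptr(&bpf_callchain_entries[nest_level - 1]);
		return 0;
	}

	static void bpf_put_callchain_entry(void)
	{
		if (WARN_ON_ONCE(this_cpu_read(bpf_callchain_nest_level) == 0))
			return;
		this_cpu_dec(bpf_callchain_nest_level);
	}

get_perf_callchain() would then fill a caller-supplied entry instead of
returning its own per-CPU one, and the entry would stay valid until the
matching bpf_put_callchain_entry(), with no long preempt-disabled section.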
--
Best Regards
Tao Chen