Message-ID: <CAEf4BzbBR7GBnWCA4E-RuEkbYJ7bUEfmJ5nd0g8G_bV0MG5tAA@mail.gmail.com>
Date: Mon, 22 Sep 2025 15:38:51 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: "Lecomte, Arnaud" <contact@...aud-lcm.com>
Cc: alexei.starovoitov@...il.com, yonghong.song@...ux.dev, song@...nel.org, 
	andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org, daniel@...earbox.net, 
	eddyz87@...il.com, haoluo@...gle.com, john.fastabend@...il.com, 
	jolsa@...nel.org, kpsingh@...nel.org, linux-kernel@...r.kernel.org, 
	martin.lau@...ux.dev, sdf@...ichev.me, 
	syzbot+c9b724fbb41cf2538b7b@...kaller.appspotmail.com, 
	syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH bpf-next v9 1/3] bpf: refactor max_depth computation in bpf_get_stack()

On Sat, Sep 20, 2025 at 12:32 PM Lecomte, Arnaud <contact@...aud-lcm.com> wrote:
>
>
> On 19/09/2025 23:50, Andrii Nakryiko wrote:
>
> On Fri, Sep 12, 2025 at 4:34 PM Arnaud Lecomte <contact@...aud-lcm.com> wrote:
>
> Add a new helper function, stack_map_calculate_max_depth(), that
> computes the maximum allowed stack trace depth for a stackmap.
>
> Acked-by: Yonghong Song <yonghong.song@...ux.dev>
> Acked-by: Song Liu <song@...nel.org>
> Signed-off-by: Arnaud Lecomte <contact@...aud-lcm.com>
> ---
> Changes in v2:
>  - Removed the checking 'map_size % map_elem_size' from
>    stack_map_calculate_max_depth
>  - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
>  - Changed map size param to size in max depth helper
>
> Changes in v4:
>  - Fixed indentation in max depth helper for args
>
> Changes in v5:
>  - Bound back trace_nr to num_elem in __bpf_get_stack
>  - Make a copy of sysctl_perf_event_max_stack
>    in stack_map_calculate_max_depth
>
> Changes in v6:
>  - Restrained max_depth computation only when required
>  - Additional cleanup from Song in __bpf_get_stack
>
> Changes in v7:
>  - Removed additional cleanup from v6
>
> Changes in v9:
>  - Fixed incorrect removal of num_elem in get stack
>
> Link to v8: https://lore.kernel.org/all/20250905134625.26531-1-contact@arnaud-lcm.com/
> ---
>  kernel/bpf/stackmap.c | 39 +++++++++++++++++++++++++++------------
>  1 file changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..a794e04f5ae9 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,28 @@ static inline int stack_map_data_size(struct bpf_map *map)
>                 sizeof(struct bpf_stack_build_id) : sizeof(u64);
>  }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @size:  Size of the buffer/map value in bytes
> + * @elem_size:  Size of each stack trace element
> + * @flags:  BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
> +{
> +       u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +       u32 max_depth;
> +       u32 curr_sysctl_max_stack = READ_ONCE(sysctl_perf_event_max_stack);
> +
> +       max_depth = size / elem_size;
> +       max_depth += skip;
> +       if (max_depth > curr_sysctl_max_stack)
> +               return curr_sysctl_max_stack;
> +
> +       return max_depth;
> +}
> +
>  static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
>  {
>         u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -300,20 +322,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
>  BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
>            u64, flags)
>  {
> -       u32 max_depth = map->value_size / stack_map_data_size(map);
> -       u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +       u32 elem_size = stack_map_data_size(map);
>         bool user = flags & BPF_F_USER_STACK;
>         struct perf_callchain_entry *trace;
>         bool kernel = !user;
> +       u32 max_depth;
>
>         if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
>                                BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
>                 return -EINVAL;
>
> -       max_depth += skip;
> -       if (max_depth > sysctl_perf_event_max_stack)
> -               max_depth = sysctl_perf_event_max_stack;
> -
> +       max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
>         trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
>                                    false, false);
>
> @@ -406,7 +425,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>                             struct perf_callchain_entry *trace_in,
>                             void *buf, u32 size, u64 flags, bool may_fault)
>  {
> -       u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> +       u32 trace_nr, copy_len, elem_size, max_depth;
>         bool user_build_id = flags & BPF_F_USER_BUILD_ID;
>         bool crosstask = task && task != current;
>         u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -438,10 +457,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>                 goto clear;
>         }
>
> -       num_elem = size / elem_size;
> -       max_depth = num_elem + skip;
> -       if (sysctl_perf_event_max_stack < max_depth)
> -               max_depth = sysctl_perf_event_max_stack;
> +       max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
>         if (may_fault)
>                 rcu_read_lock(); /* need RCU for perf's callchain below */
> @@ -461,7 +477,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>         }
>
>         trace_nr = trace->nr - skip;
> -       trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
>
> Is this also part of the refactoring? If so, it deserves a mention of
> why it's OK to just drop this.
>
> pw-bot: cr
>
> Yes, it is also part of the refactoring: stack_map_calculate_max_depth()
> already caps max_depth at the maximum possible number of elements, so
> there is no need to clamp trace_nr a second time. This is valid assuming
> that get_perf_callchain() and get_callchain_entry_for_task() correctly
> honor that limit.
>
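
To make the claimed bound explicit, here is the arithmetic in sketch form
(assuming, as Arnaud notes, that the callchain helpers cap trace->nr at
max_depth; this is a derivation, not kernel code verbatim):

	/*
	 * max_depth  = min(size / elem_size + skip, sysctl_perf_event_max_stack)
	 * trace->nr <= max_depth                      (assumed cap)
	 * trace_nr   = trace->nr - skip
	 *           <= max_depth - skip
	 *           <= size / elem_size               (the old num_elem)
	 *
	 * so the removed "trace_nr = min(trace_nr, num_elem)" clamp can
	 * never fire for traces produced under this max_depth.
	 */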

What about that third case:

if (trace_in)
    trace = trace_in;

Did you analyze whether that path gets its trace->nr set properly as
well (as of this patch, without taking into account changes in the
follow-up patches)? Because it looks like this removal belongs in
patch #2, no?
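
For reference, the trace selection in __bpf_get_stack() looks roughly
like this (simplified from the patched kernel/bpf/stackmap.c; note that
only the last two callees receive max_depth and can bound trace->nr by
it):

	if (trace_in)
		trace = trace_in;	/* trace->nr is whatever the caller set */
	else if (kernel && task)
		trace = get_callchain_entry_for_task(task, max_depth);
	else
		trace = get_perf_callchain(regs, 0, kernel, user,
					   max_depth, crosstask, false);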

In either case, all the other changes in this patch except the removal
of this line are refactoring and, as far as I can tell, don't change
the logic. This line removal does (potentially) change the logic, so it
would be good to do it separately, explaining why you think it's the
correct thing to do.

>         copy_len = trace_nr * elem_size;
>
>         ips = trace->ip + skip;
> --
> 2.43.0
>
> Thanks,
> Arnaud
