lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2d9320eb-6be9-43e4-a63e-08e7ab1427e3@linux.dev>
Date: Thu, 18 Sep 2025 20:45:23 +0800
From: Tao Chen <chen.dylane@...ux.dev>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
 andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
 yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
 haoluo@...gle.com, jolsa@...nel.org, bpf@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for
 BPF_MAP_STACK_TRACE

在 2025/9/18 06:16, Andrii Nakryiko 写道:
> On Tue, Sep 9, 2025 at 9:32 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>>
>> The stacktrace map can be easily full, which will lead to failure in
>> obtaining the stack. In addition to increasing the size of the map,
>> another solution is to delete the stack_id after looking it up from
> 
> This is not a "solution", really. Just another partially broken
> workaround to an already broken STACKMAP map (and there is no fixing
> it, IMO).
>

Actually it is. But in our use case, we used continuous profiling with 
perf_event, the result looks better when we got the stack_id and deleted 
it to alleviate the data size pressure in the map. And there is no high 
requirement for the accuracy of stack information for us.

> When a user is doing lookup_and_delete for that element, another BPF
> program can reuse that slot, and user space will delete the stack that
> is, effectively, still in use.
> 
> <rant>
> 
> I also just looked at the bpf_stackmap_copy() implementation, and even
> lookup behavior appears broken. We temporarily remove the bucket while
> copying, just to put it back a bit later. Meanwhile another BPF
> program can put something into that bucket and we'll just silently
> discard that new value later. That was a new one for me, but whatever,

Yes, it is a problem.

> as I said STACKMAP cannot be used reliably anyways.
> 
> </rant>
> 
> But let's stay constructive here. Some vague proposals below.
> 
> Really, STACKMAP should have used some form of refcounting and let
> users put those refs, instead of just unconditionally removing the
> element. I wonder if we can retrofit this and treat lookup/delete as
> get/put instead? This would work well for a typical use pattern where
> we send stack_id through ringbuf of some sort and user space fetches
> stack trace by that ID. Each bpf_get_stackid() would be treated as
> refcount bump, and each lookup_and_delete or just delete would be
> treated as refcount put.
> 
> Also, it would be better for STACKMAP to be a proper hashmap and
> handle collisions properly.
> 
> The above two changes would probably make STACKMAP actually usable as
> "a stack trace bank" producing 4-byte IDs that are easily added to
> fixed-sized ringbuf samples as an extra field. This model sometimes is
> way more convenient than getting bpf_get_stack() and copying it into
> ringbuf (which is currently the only way to have reliable stack traces
> with BPF, IMO).
> 
> So, tl;dr. Maybe instead of pretending like we are fixing something
> about STACKMAP with slightly-more-atomic (but not really)
> lookup_and_delete support, maybe let's try to actually make STACKMAP
> usable?.. (it's way harder than this patch, but has more value, IMO)
> 

The idea looks great. I will try to make improvements in this direction, 
though there will be certain challenges for me right now.

> What does everyone think?
> 
> P.S. It seems like a good idea to switch STACKMAP to open addressing
> instead of the current kind-of-bucket-chain-but-not-really
> implementation. It's fixed size and pre-allocated already, so open
> addressing seems like a great approach here, IMO.
> 
>> the user, so extend the existing bpf_map_lookup_and_delete_elem()
>> functionality to stacktrace map types.
>>
>> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
>> ---
>>   include/linux/bpf.h   |  2 +-
>>   kernel/bpf/stackmap.c | 16 ++++++++++++++--
>>   kernel/bpf/syscall.c  |  8 +++++---
>>   3 files changed, 20 insertions(+), 6 deletions(-)
>>
> 
> As for the patch in question, I think the logic is correct :) I find
> bpf_stackmap_copy_and_delete() name bad and misleading, though,
> because it's more of "maybe also delete". Maybe
> bpf_stackmap_extract()? Don't know, it's minor nit anyways.
> 
> [...]Will rename it in v2. The original idea in this patch was just to
make it convenient for users to delete the stackid when they obtain it.

-- 
Best Regards
Tao Chen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ