Message-ID: <CAADnVQ+TzLc=Z_Rp-UC6s9gg5hB1byd_w7oT807z44NuKC_TxA@mail.gmail.com>
Date: Tue, 25 Feb 2025 17:57:47 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Juntong Deng <juntong.deng@...look.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com, bpf <bpf@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 4/6] bpf: Add bpf runtime hooks for tracking
 runtime acquire/release

On Tue, Feb 25, 2025 at 3:34 PM Juntong Deng <juntong.deng@...look.com> wrote:
>
> On 2025/2/25 22:12, Alexei Starovoitov wrote:
> > On Thu, Feb 13, 2025 at 4:36 PM Juntong Deng <juntong.deng@...look.com> wrote:
> >>
> >> +void *bpf_runtime_acquire_hook(void *arg1, void *arg2, void *arg3,
> >> +                              void *arg4, void *arg5, void *arg6 /* kfunc addr */)
> >> +{
> >> +       struct btf_struct_kfunc *struct_kfunc, dummy_key;
> >> +       struct btf_struct_kfunc_tab *tab;
> >> +       struct bpf_run_ctx *bpf_ctx;
> >> +       struct bpf_ref_node *node;
> >> +       bpf_kfunc_t kfunc;
> >> +       struct btf *btf;
> >> +       void *kfunc_ret;
> >> +
> >> +       kfunc = (bpf_kfunc_t)arg6;
> >> +       kfunc_ret = kfunc(arg1, arg2, arg3, arg4, arg5);
> >> +
> >> +       if (!kfunc_ret)
> >> +               return kfunc_ret;
> >> +
> >> +       bpf_ctx = current->bpf_ctx;
> >> +       btf = bpf_get_btf_vmlinux();
> >> +
> >> +       tab = btf->acquire_kfunc_tab;
> >> +       if (!tab)
> >> +               return kfunc_ret;
> >> +
> >> +       dummy_key.kfunc_addr = (unsigned long)arg6;
> >> +       struct_kfunc = bsearch(&dummy_key, tab->set, tab->cnt,
> >> +                              sizeof(struct btf_struct_kfunc),
> >> +                              btf_kfunc_addr_cmp_func);
> >> +
> >> +       node = list_first_entry(&bpf_ctx->free_ref_list, struct bpf_ref_node, lnode);
> >> +       node->obj_addr = (unsigned long)kfunc_ret;
> >> +       node->struct_btf_id = struct_kfunc->struct_btf_id;
> >> +
> >> +       list_del(&node->lnode);
> >> +       hash_add(bpf_ctx->active_ref_list, &node->hnode, node->obj_addr);
> >> +
> >> +       pr_info("bpf prog acquire obj addr = %lx, btf id = %d\n",
> >> +               node->obj_addr, node->struct_btf_id);
> >> +       print_bpf_active_refs();
> >> +
> >> +       return kfunc_ret;
> >> +}
> >> +
> >> +void bpf_runtime_release_hook(void *arg1, void *arg2, void *arg3,
> >> +                             void *arg4, void *arg5, void *arg6 /* kfunc addr */)
> >> +{
> >> +       struct bpf_run_ctx *bpf_ctx;
> >> +       struct bpf_ref_node *node;
> >> +       bpf_kfunc_t kfunc;
> >> +
> >> +       kfunc = (bpf_kfunc_t)arg6;
> >> +       kfunc(arg1, arg2, arg3, arg4, arg5);
> >> +
> >> +       bpf_ctx = current->bpf_ctx;
> >> +
> >> +       hash_for_each_possible(bpf_ctx->active_ref_list, node, hnode, (unsigned long)arg1) {
> >> +               if (node->obj_addr == (unsigned long)arg1) {
> >> +                       hash_del(&node->hnode);
> >> +                       list_add(&node->lnode, &bpf_ctx->free_ref_list);
> >> +
> >> +                       pr_info("bpf prog release obj addr = %lx, btf id = %d\n",
> >> +                               node->obj_addr, node->struct_btf_id);
> >> +               }
> >> +       }
> >> +
> >> +       print_bpf_active_refs();
> >> +}
> >
> > So for every acq/rel the above two functions will be called,
> > and you call this:
> > "
> > perhaps we can use some low overhead runtime solution first as a
> > not too bad alternative
> > "
> >
> > low overhead ?!
> >
> > acq/rel kfuncs can be very hot.
> > To the level that single atomic_inc() is a noticeable overhead.
> > Doing above is an obvious no-go in any production setup.
> >
> >> Before the bpf program actually runs, we can allocate the maximum
> >> possible number of reference nodes to record reference information.
> >
> > This is an incorrect assumption.
> > Look at register_btf_id_dtor_kfuncs()
> > that patch 1 is sort-of trying to reinvent.
> > Acquired objects can be stashed with single xchg instruction and
> > people are not happy with performance either.
> > An acquire kfunc plus inlined bpf_kptr_xchg is too slow in some cases.
> > A bunch of bpf progs operate under constraints where nanoseconds count.
> > That's why we rely on static verification where possible.
> > Every time we introduce run-time safety checks (like bpf_arena) we
> > sacrifice some use cases.
> > So, no, this proposal is not a solution.
>
> OK, I agree; if a single atomic_inc() is a noticeable overhead, then
> any runtime solution is not applicable.
>
> (I had thought about using btf id as another argument to further
> eliminate the O(log n) overhead of binary search, but now it is
> obviously not enough)
>
>
> I am not sure; BPF runtime hooks seem to be a general feature, so
> maybe they could be used in other scenarios?
>
> Do you think there would be value in non-intrusive bpf program behavior
> tracking if it is only enabled in certain modes?
>
> For example, to help us debug/diagnose bpf programs.

Unlikely. In general we don't add debug code to the kernel.
There are exceptions like lockdep and kasan that are there to
debug the kernel itself and they're not enabled in prod.
I don't see a use case for "let's replace all kfuncs with a wrapper".
perf record/report works on bpf progs, so the profiling/perf-analysis
side of bpf progs is solved.
Debugging of bpf prog for correctness is a different story.
It's an interesting challenge on its own, but
wrapping kfuncs won't help.
Debugging a bpf prog is not much different from debugging normal kernel code.
Sprinkling printk is typically the answer.
