Message-ID:
<AM6PR03MB508026B637117BD9E13C2F9299CD2@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Thu, 27 Feb 2025 21:55:00 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 4/6] bpf: Add bpf runtime hooks for tracking
runtime acquire/release
On 2025/2/26 01:57, Alexei Starovoitov wrote:
> On Tue, Feb 25, 2025 at 3:34 PM Juntong Deng <juntong.deng@...look.com> wrote:
>>
>> On 2025/2/25 22:12, Alexei Starovoitov wrote:
>>> On Thu, Feb 13, 2025 at 4:36 PM Juntong Deng <juntong.deng@...look.com> wrote:
>>>>
>>>> +void *bpf_runtime_acquire_hook(void *arg1, void *arg2, void *arg3,
>>>> + void *arg4, void *arg5, void *arg6 /* kfunc addr */)
>>>> +{
>>>> + struct btf_struct_kfunc *struct_kfunc, dummy_key;
>>>> + struct btf_struct_kfunc_tab *tab;
>>>> + struct bpf_run_ctx *bpf_ctx;
>>>> + struct bpf_ref_node *node;
>>>> + bpf_kfunc_t kfunc;
>>>> + struct btf *btf;
>>>> + void *kfunc_ret;
>>>> +
>>>> + kfunc = (bpf_kfunc_t)arg6;
>>>> + kfunc_ret = kfunc(arg1, arg2, arg3, arg4, arg5);
>>>> +
>>>> + if (!kfunc_ret)
>>>> + return kfunc_ret;
>>>> +
>>>> + bpf_ctx = current->bpf_ctx;
>>>> + btf = bpf_get_btf_vmlinux();
>>>> +
>>>> + tab = btf->acquire_kfunc_tab;
>>>> + if (!tab)
>>>> + return kfunc_ret;
>>>> +
>>>> + dummy_key.kfunc_addr = (unsigned long)arg6;
>>>> + struct_kfunc = bsearch(&dummy_key, tab->set, tab->cnt,
>>>> + sizeof(struct btf_struct_kfunc),
>>>> + btf_kfunc_addr_cmp_func);
>>>> +
>>>> + node = list_first_entry(&bpf_ctx->free_ref_list, struct bpf_ref_node, lnode);
>>>> + node->obj_addr = (unsigned long)kfunc_ret;
>>>> + node->struct_btf_id = struct_kfunc->struct_btf_id;
>>>> +
>>>> + list_del(&node->lnode);
>>>> + hash_add(bpf_ctx->active_ref_list, &node->hnode, node->obj_addr);
>>>> +
>>>> + pr_info("bpf prog acquire obj addr = %lx, btf id = %d\n",
>>>> + node->obj_addr, node->struct_btf_id);
>>>> + print_bpf_active_refs();
>>>> +
>>>> + return kfunc_ret;
>>>> +}
>>>> +
>>>> +void bpf_runtime_release_hook(void *arg1, void *arg2, void *arg3,
>>>> + void *arg4, void *arg5, void *arg6 /* kfunc addr */)
>>>> +{
>>>> + struct bpf_run_ctx *bpf_ctx;
>>>> + struct bpf_ref_node *node;
>>>> + bpf_kfunc_t kfunc;
>>>> +
>>>> + kfunc = (bpf_kfunc_t)arg6;
>>>> + kfunc(arg1, arg2, arg3, arg4, arg5);
>>>> +
>>>> + bpf_ctx = current->bpf_ctx;
>>>> +
>>>> + hash_for_each_possible(bpf_ctx->active_ref_list, node, hnode, (unsigned long)arg1) {
>>>> + if (node->obj_addr == (unsigned long)arg1) {
>>>> + hash_del(&node->hnode);
>>>> + list_add(&node->lnode, &bpf_ctx->free_ref_list);
>>>> +
>>>> + pr_info("bpf prog release obj addr = %lx, btf id = %d\n",
>>>> + node->obj_addr, node->struct_btf_id);
>>>> + }
>>>> + }
>>>> +
>>>> + print_bpf_active_refs();
>>>> +}
>>>
>>> So for every acq/rel the above two functions will be called
>>> and you call this:
>>> "
>>> perhaps we can use some low overhead runtime solution first as a
>>> not too bad alternative
>>> "
>>>
>>> low overhead ?!
>>>
>>> acq/rel kfuncs can be very hot.
>>> To the level that single atomic_inc() is a noticeable overhead.
>>> Doing above is an obvious no-go in any production setup.
>>>
>>>> Before the bpf program actually runs, we can allocate the maximum
>>>> possible number of reference nodes to record reference information.
>>>
>>> This is an incorrect assumption.
>>> Look at register_btf_id_dtor_kfuncs()
>>> that patch 1 is sort-of trying to reinvent.
>>> Acquired objects can be stashed with single xchg instruction and
>>> people are not happy with performance either.
>>> An acquire kfunc plus inlined bpf_kptr_xchg is too slow in some cases.
>>> A bunch of bpf progs operate under constraints where nanoseconds count.
>>> That's why we rely on static verification where possible.
>>> Every time we introduce run-time safety checks (like bpf_arena) we
>>> sacrifice some use cases.
>>> So, no, this proposal is not a solution.
>>
>> OK, I agree: if a single atomic_inc() is a noticeable overhead, then
>> no runtime solution is applicable.
>>
>> (I had thought about passing the btf id as another argument to
>> eliminate the O(log n) overhead of the binary search, but that is
>> obviously not enough now.)
>>
>>
>> I am not sure; BPF runtime hooks seem to be a general feature, so
>> maybe they could be used in other scenarios?
>>
>> Do you think there would be value in non-intrusive bpf program behavior
>> tracking if it is only enabled in certain modes?
>>
>> For example, to help us debug/diagnose bpf programs.
>
> Unlikely. In general we don't add debug code to the kernel.
> There are exceptions like lockdep and kasan that are there to
> debug the kernel itself and they're not enabled in prod.
> I don't see a use case for "let's replace all kfuncs with a wrapper".
> perf record/report works on bpf progs, so profiling/perf analysis
> part of bpf prog is solved.
> Debugging of bpf prog for correctness is a different story.
> It's an interesting challenge on its own, but
> wrapping kfuncs won't help.
> Debugging bpf prog is not much different than debugging normal kernel code.
> Sprinkle printk is typically the answer.

I have an idea, though I am not sure whether it is helpful.
(This idea addresses the earlier problem of references being held for
too long.)
My idea is to add a new kfunc flag, e.g. KF_ACQUIRE_EPHEMERAL, which
marks the acquired reference as ephemeral, i.e. one that may only be
held for a short time.

While a bpf program holds such a reference, the verifier would not
allow it to enter any new logic whose runtime is not statically known,
such as bpf_loop or a bpf open-coded iterator.

(If the bpf program is already in a loop, that is no problem, as long
as it does not enter a new nested loop, since the verifier guarantees
that references acquired in a loop body are released in that loop body.)

In addition, such a reference could only be held across a limited
number of instructions between acquire and release, e.g. 300
instructions.

This way the verifier can force bpf programs to hold such references
only for a short time, long enough for simple get-or-set operations.
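
To make this concrete, here is a rough sketch of the kernel side; the
flag name, its bit position, and the kfunc being registered are all
hypothetical, just to illustrate the shape:

/* include/linux/btf.h: new kfunc flag (bit position hypothetical) */
#define KF_ACQUIRE_EPHEMERAL (1 << 12) /* ref may only be held briefly */

/* registration would look like any other acquire kfunc */
BTF_KFUNCS_START(ephemeral_kfunc_ids)
BTF_ID_FLAGS(func, bpf_file_acquire_ephemeral,
	     KF_ACQUIRE | KF_ACQUIRE_EPHEMERAL | KF_RET_NULL)
BTF_KFUNCS_END(ephemeral_kfunc_ids)

While the resulting reference is live, the verifier would reject
bpf_loop() and any iterator-new kfunc, and bound the number of
instructions between the acquire and the matching release.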

This special kind of reference might solve the problem Christian
mentioned for the file iterator, where bpf programs might hold file
references for too long.
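
From the bpf program side, usage would be limited to a short window,
roughly like this (the kfunc names are again hypothetical):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct file *bpf_file_acquire_ephemeral(int fd) __ksym; /* hypothetical */
extern void bpf_file_release(struct file *f) __ksym;           /* hypothetical */

SEC("tp_btf/task_newtask")
int BPF_PROG(probe, struct task_struct *task, u64 clone_flags)
{
	struct file *f;

	f = bpf_file_acquire_ephemeral(0);
	if (!f)
		return 0;
	/* Only short, straight-line get-or-set work is allowed here:
	 * no bpf_loop(), no new open-coded iterator, and the release
	 * must follow within the instruction budget (e.g. 300 insns).
	 */
	bpf_printk("inode %lu", f->f_inode->i_ino);
	bpf_file_release(f);
	return 0;
}
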
Do you think this is a feasible solution?