Message-ID: <CAMB2axPayfZOZnGK83eWxYTg9k0uno_y87_0ePE_FD6V+4tnfA@mail.gmail.com>
Date: Thu, 6 Nov 2025 09:57:31 -0800
From: Amery Hung <ameryhung@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: netdev@...r.kernel.org, alexei.starovoitov@...il.com, andrii@...nel.org, 
	daniel@...earbox.net, tj@...nel.org, martin.lau@...nel.org, 
	kernel-team@...a.com, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v5 3/7] bpf: Pin associated struct_ops when
 registering async callback

On Wed, Nov 5, 2025 at 6:13 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 11/4/25 9:26 AM, Amery Hung wrote:
> > Take a refcount of the associated struct_ops map to prevent the map from
> > being freed when an async callback scheduled from a struct_ops program
> > runs.
> >
> > Since struct_ops programs do not take refcounts on the struct_ops map,
> > it is possible for a struct_ops map to be freed when an async callback
> > scheduled from it runs. To prevent this, take a refcount on prog->aux->
> > st_ops_assoc and save it in a newly created struct bpf_async_res for
> > every async mechanism. The reference needs to be preserved in
> > bpf_async_res since prog->aux->st_ops_assoc can be poisoned at any
> > time and a reference leak could otherwise occur.
> >
> > bpf_async_res will contain an async callback's BPF program and resources
> > related to the BPF program. The resources will be acquired when
> > registering a callback and released when cancelled or when the map
> > associated with the callback is freed.
> >
> > Also rename drop_prog_refcnt to bpf_async_cb_reset to better reflect
> > what it now does.
> >
>
> [ ... ]
>
> > +static int bpf_async_res_get(struct bpf_async_res *res, struct bpf_prog *prog)
> > +{
> > +     struct bpf_map *st_ops_assoc = NULL;
> > +     int err;
> > +
> > +     prog = bpf_prog_inc_not_zero(prog);
> > +     if (IS_ERR(prog))
> > +             return PTR_ERR(prog);
> > +
> > +     st_ops_assoc = READ_ONCE(prog->aux->st_ops_assoc);
> > +     if (prog->type == BPF_PROG_TYPE_STRUCT_OPS &&
> > +         st_ops_assoc && st_ops_assoc != BPF_PTR_POISON) {
> > +             st_ops_assoc = bpf_map_inc_not_zero(st_ops_assoc);
>
> The READ_ONCE and inc_not_zero combination is unusual. Should it be
> rcu_dereference, with prog->aux->st_ops_assoc tagged "__rcu"?
>

Understood. The underlying struct_ops map is protected by RCU, but
prog->aux->st_ops_assoc itself is not RCU-protected and can change at
any time.
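
To make the lifetime pairing concrete, here is a minimal sketch of the
release side that would pair with bpf_async_res_get() above. The field
layout of struct bpf_async_res and the name bpf_async_res_put() are my
assumptions for illustration; the key point is that the put drops the
references saved at registration time instead of re-reading
prog->aux->st_ops_assoc, which may have been poisoned or changed since.

struct bpf_async_res {
	struct bpf_prog *prog;		/* ref from bpf_prog_inc_not_zero() */
	struct bpf_map *st_ops_assoc;	/* ref from bpf_map_inc_not_zero(), may be NULL */
};

static void bpf_async_res_put(struct bpf_async_res *res)
{
	/* Drop the map ref saved at get time; never re-read
	 * prog->aux->st_ops_assoc here, it may have changed.
	 */
	if (res->st_ops_assoc)
		bpf_map_put(res->st_ops_assoc);
	bpf_prog_put(res->prog);
}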

> If prog->aux->st_ops_assoc is protected by rcu, can the user (kfunc?)
> use prog->aux->st_ops_assoc depending on the rcu grace period alone,
> without bpf_map_inc_not_zero? Does it matter if prog->aux->st_ops_assoc
> is changed? This patch does not seem to consider the changing case either.
>

I think bumping the refcount makes bpf_prog_get_assoc_struct_ops()
easier to use: users do not need to worry about the lifetime of the
returned kdata, RCU, or the execution context.

The main problem is that async callbacks do not run in an RCU read-side
critical section, so relying on RCU alone would require callers of
bpf_prog_get_assoc_struct_ops() to take rcu_read_lock{_trace}().
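
For illustration, this is roughly what the two schemes look like from an
async callback's point of view. bpf_prog_get_assoc_struct_ops() is used
by name only; its actual signature and return type in the series may
differ, and consume() stands in for whatever the caller does with the
kdata.

/* (a) If st_ops_assoc were RCU-only, a timer/wq callback would need an
 *     explicit read-side critical section around every use, since the
 *     callback itself does not run under rcu_read_lock{_trace}().
 */
rcu_read_lock_trace();
kdata = bpf_prog_get_assoc_struct_ops(prog);
if (kdata)
	consume(kdata);		/* must not escape the critical section */
rcu_read_unlock_trace();

/* (b) With the refcount held in bpf_async_res, the kdata stays valid
 *     for the whole callback, so no read-side locking is needed around
 *     its use.
 */
kdata = bpf_prog_get_assoc_struct_ops(prog);
if (kdata)
	consume(kdata);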

The case where st_ops_assoc changes is indeed missed here: it can change
from NULL to a valid kdata. I will fix this in the next respin.
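
To spell out that case, a rough timeline (illustration only, reusing the
names from the patch; that res->st_ops_assoc is left NULL here is my
reading of the posted code):

/*
 *   registration path                      association path
 *   -----------------                      ----------------
 *   bpf_async_res_get():
 *     READ_ONCE(prog->aux->st_ops_assoc)
 *       == NULL
 *     res->st_ops_assoc = NULL
 *                                          prog->aux->st_ops_assoc = kdata
 *   ... async callback fires:
 *     res->st_ops_assoc is still NULL,
 *     so the now-associated struct_ops
 *     map is not reachable from the
 *     callback.
 */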
