Message-ID: <CAEf4BzYnB74djXyb08m7tJE9MGxT-iVOOBsNQO3PFGFDW=vRLA@mail.gmail.com>
Date: Tue, 28 Oct 2025 09:53:26 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, alexei.starovoitov@...il.com, 
	andrii@...nel.org, daniel@...earbox.net, tj@...nel.org, martin.lau@...nel.org, 
	kernel-team@...a.com
Subject: Re: [PATCH bpf-next v4 2/6] bpf: Support associating BPF program with struct_ops
On Fri, Oct 24, 2025 at 2:29 PM Amery Hung <ameryhung@...il.com> wrote:
>
> Add a new BPF command BPF_PROG_ASSOC_STRUCT_OPS to allow associating
> a BPF program with a struct_ops map. This command takes a file
> descriptor of a struct_ops map and a BPF program and sets
> prog->aux->st_ops_assoc to the kdata of the struct_ops map.
>
> The command accepts neither a struct_ops program nor a non-struct_ops
> map. Programs of a struct_ops map are automatically associated with
> the map during map update. If a program is shared between two
> struct_ops maps, prog->aux->st_ops_assoc will be poisoned to indicate
> that the associated struct_ops is ambiguous. The pointer, once
> poisoned, cannot be reset since we have lost track of the associated
> struct_ops. For other program types, the associated struct_ops map,
> once set, cannot be changed later. This restriction may be lifted in
> the future if there is a use case.
>
> A kernel helper bpf_prog_get_assoc_struct_ops() can be used to
> retrieve the associated struct_ops pointer. The returned pointer, if
> not NULL, is guaranteed to be valid and point to a fully updated
> struct_ops struct. For a struct_ops program reused in multiple
> struct_ops maps, the returned pointer will be NULL. The call must be
> paired with bpf_struct_ops_put() once the caller is done with the
> struct_ops.
>
> To make sure the returned pointer is valid, the command increases the
> refcount of the map for every associated non-struct_ops program. For
> struct_ops programs, since they do not increase the refcount of the
> struct_ops map, bpf_prog_get_assoc_struct_ops() has to bump the
> refcount of the map to prevent the map from being freed while the
> program runs. This can happen if a struct_ops program schedules a
> timer callback that runs after the struct_ops map is freed.
>
> struct_ops implementers should note that the returned struct_ops may
> or may not be attached. They are responsible for tracking and
> checking the state of the associated struct_ops map if the use case
> requires an attached struct_ops.
>
> Signed-off-by: Amery Hung <ameryhung@...il.com>
> ---
>  include/linux/bpf.h            | 16 ++++++
>  include/uapi/linux/bpf.h       | 17 ++++++
>  kernel/bpf/bpf_struct_ops.c    | 98 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/core.c              |  3 ++
>  kernel/bpf/syscall.c           | 46 ++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 17 ++++++
>  6 files changed, 197 insertions(+)
>
[...]
> @@ -1394,6 +1414,84 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
>         return err;
>  }
>
> +int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map)
> +{
> +       struct bpf_map *st_ops_assoc;
> +
> +       guard(mutex)(&prog->aux->st_ops_assoc_mutex);
> +
> +       st_ops_assoc = rcu_access_pointer(prog->aux->st_ops_assoc);
we don't hold the RCU read lock here; can this trigger lockdep
warnings due to the rcu_access_pointer() use?
> +
> +       if (st_ops_assoc && st_ops_assoc == map)
> +               return 0;
> +
> +       if (st_ops_assoc) {
> +               if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
> +                       return -EBUSY;
> +
put the st_ops_assoc map here (if it's not BPF_PTR_POISON already, of
course); otherwise we are leaking a refcount
pw-bot: cr
> +               rcu_assign_pointer(prog->aux->st_ops_assoc, BPF_PTR_POISON);
> +       } else {
> +               if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
> +                       bpf_map_inc(map);
> +
> +               rcu_assign_pointer(prog->aux->st_ops_assoc, map);
> +       }
> +
> +       return 0;
> +}
> +
> +void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog)
> +{
> +       struct bpf_map *st_ops_assoc;
> +
> +       guard(mutex)(&prog->aux->st_ops_assoc_mutex);
> +
> +       st_ops_assoc = rcu_access_pointer(prog->aux->st_ops_assoc);
> +
> +       if (!st_ops_assoc || st_ops_assoc == BPF_PTR_POISON)
> +               return;
> +
> +       if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
> +               bpf_map_put(st_ops_assoc);
> +
> +       RCU_INIT_POINTER(prog->aux->st_ops_assoc, NULL);
> +}
> +
> +/*
> + * Get a reference to the struct_ops struct (i.e., kdata) associated with a
> + * program. Must be paired with bpf_struct_ops_put().
> + *
> + * If the returned pointer is not NULL, it points to a valid and
> + * initialized struct_ops. The struct_ops may or may not be attached.
> + * Kernel struct_ops implementers are responsible for tracking and checking
> + * the state of the struct_ops if the use case requires an attached struct_ops.
> + */
> +void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux)
> +{
> +       struct bpf_struct_ops_map *st_map;
> +       struct bpf_map *map;
> +
> +       scoped_guard(rcu) {
> +               map = rcu_dereference(aux->st_ops_assoc);
> +               if (!map || map == BPF_PTR_POISON)
> +                       return NULL;
> +
> +               map = bpf_map_inc_not_zero(map);
I think this is buggy. When the timer callback happens, the map can be
long gone and its underlying memory reused for something else, so this
bpf_map_inc_not_zero() can crash or just corrupt some memory. RCU
inside this function doesn't do much for us; it happens way too late.
It's also suboptimal that we now require callers of
bpf_prog_get_assoc_struct_ops() to do a manual ref put.
Have you considered getting the prog->aux->st_ops_assoc ref
incremented when scheduling the async callback instead? Then we won't
need all this hackery and the caller will just be working with a
borrowed map reference.
> +               if (IS_ERR(map))
> +                       return NULL;
> +       }
> +
> +       st_map = (struct bpf_struct_ops_map *)map;
> +
> +       if (smp_load_acquire(&st_map->kvalue.common.state) == BPF_STRUCT_OPS_STATE_INIT) {
> +               bpf_map_put(map);
> +               return NULL;
> +       }
> +
> +       return &st_map->kvalue.data;
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_get_assoc_struct_ops);
> +
>  void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
>  {
[...]