Message-ID: <CAMB2axMQ9iFv4XjH3X3QLozudpga=DPSYEgzt3thtMOjYnrv7g@mail.gmail.com>
Date: Wed, 18 Dec 2024 08:09:23 -0800
From: Amery Hung <ameryhung@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, Amery Hung <amery.hung@...edance.com>, 
	bpf <bpf@...r.kernel.org>, Network Development <netdev@...r.kernel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...nel.org>, Kui-Feng Lee <sinquersw@...il.com>, 
	Toke Høiland-Jørgensen <toke@...hat.com>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Jiri Pirko <jiri@...nulli.us>, stfomichev@...il.com, 
	ekarani.silvestre@....ufcg.edu.br, yangpeihao@...u.edu.cn, 
	Cong Wang <xiyou.wangcong@...il.com>, Peilin Ye <yepeilin.cs@...il.com>
Subject: Re: [PATCH bpf-next v1 01/13] bpf: Support getting referenced kptr
 from struct_ops argument

On Tue, Dec 17, 2024 at 5:24 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Tue, Dec 17, 2024 at 4:58 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >
> > On 12/13/24 3:29 PM, Amery Hung wrote:
> > > Allow struct_ops programs to acquire referenced kptrs from arguments
> > > by directly reading the argument.
> > >
> > > The verifier will acquire a reference for a struct_ops argument tagged
> > > with "__ref" in the stub function at the beginning of the main program.
> > > The user will be able to access the referenced kptr directly by reading
> > > the context as long as it has not been released by the program.
> > >
> > > This new mechanism to acquire a referenced kptr (compared to the existing
> > > "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons.
> > > In the first use case, Qdisc_ops, an skb is passed to .enqueue in the
> > > first argument. This mechanism provides a natural way for users to get a
> > > referenced kptr in the .enqueue struct_ops programs and makes sure that a
> > > qdisc will always enqueue or drop the skb.
> > >
> > > Signed-off-by: Amery Hung <amery.hung@...edance.com>
> > > ---
> > >   include/linux/bpf.h         |  3 +++
> > >   kernel/bpf/bpf_struct_ops.c | 26 ++++++++++++++++++++------
> > >   kernel/bpf/btf.c            |  1 +
> > >   kernel/bpf/verifier.c       | 35 ++++++++++++++++++++++++++++++++---
> > >   4 files changed, 56 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 1b84613b10ac..72bf941d1daf 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -968,6 +968,7 @@ struct bpf_insn_access_aux {
> > >               struct {
> > >                       struct btf *btf;
> > >                       u32 btf_id;
> > > +                     u32 ref_obj_id;
> > >               };
> > >       };
> > >       struct bpf_verifier_log *log; /* for verbose logs */
> > > @@ -1480,6 +1481,8 @@ struct bpf_ctx_arg_aux {
> > >       enum bpf_reg_type reg_type;
> > >       struct btf *btf;
> > >       u32 btf_id;
> > > +     u32 ref_obj_id;
> > > +     bool refcounted;
> > >   };
> > >
> > >   struct btf_mod_pair {
> > > diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> > > index fda3dd2ee984..6e7795744f6a 100644
> > > --- a/kernel/bpf/bpf_struct_ops.c
> > > +++ b/kernel/bpf/bpf_struct_ops.c
> > > @@ -145,6 +145,7 @@ void bpf_struct_ops_image_free(void *image)
> > >   }
> > >
> > >   #define MAYBE_NULL_SUFFIX "__nullable"
> > > +#define REFCOUNTED_SUFFIX "__ref"
> > >   #define MAX_STUB_NAME 128
> > >
> > >   /* Return the type info of a stub function, if it exists.
> > > @@ -206,9 +207,11 @@ static int prepare_arg_info(struct btf *btf,
> > >                           struct bpf_struct_ops_arg_info *arg_info)
> > >   {
> > >       const struct btf_type *stub_func_proto, *pointed_type;
> > > +     bool is_nullable = false, is_refcounted = false;
> > >       const struct btf_param *stub_args, *args;
> > >       struct bpf_ctx_arg_aux *info, *info_buf;
> > >       u32 nargs, arg_no, info_cnt = 0;
> > > +     const char *suffix;
> > >       u32 arg_btf_id;
> > >       int offset;
> > >
> > > @@ -240,12 +243,19 @@ static int prepare_arg_info(struct btf *btf,
> > >       info = info_buf;
> > >       for (arg_no = 0; arg_no < nargs; arg_no++) {
> > >               /* Skip arguments that is not suffixed with
> > > -              * "__nullable".
> > > +              * "__nullable or __ref".
> > >                */
> > > -             if (!btf_param_match_suffix(btf, &stub_args[arg_no],
> > > -                                         MAYBE_NULL_SUFFIX))
> > > +             is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no],
> > > +                                                  MAYBE_NULL_SUFFIX);
> > > +             is_refcounted = btf_param_match_suffix(btf, &stub_args[arg_no],
> > > +                                                    REFCOUNTED_SUFFIX);
> > > +             if (!is_nullable && !is_refcounted)
> > >                       continue;
> > >
> > > +             if (is_nullable)
> > > +                     suffix = MAYBE_NULL_SUFFIX;
> > > +             else if (is_refcounted)
> > > +                     suffix = REFCOUNTED_SUFFIX;
> > >               /* Should be a pointer to struct */
> > >               pointed_type = btf_type_resolve_ptr(btf,
> > >                                                   args[arg_no].type,
> > > @@ -253,7 +263,7 @@ static int prepare_arg_info(struct btf *btf,
> > >               if (!pointed_type ||
> > >                   !btf_type_is_struct(pointed_type)) {
> > >                       pr_warn("stub function %s__%s has %s tagging to an unsupported type\n",
> > > -                             st_ops_name, member_name, MAYBE_NULL_SUFFIX);
> > > +                             st_ops_name, member_name, suffix);
> > >                       goto err_out;
> > >               }
> > >
> > > @@ -271,11 +281,15 @@ static int prepare_arg_info(struct btf *btf,
> > >               }
> > >
> > >               /* Fill the information of the new argument */
> > > -             info->reg_type =
> > > -                     PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
> > >               info->btf_id = arg_btf_id;
> > >               info->btf = btf;
> > >               info->offset = offset;
> > > +             if (is_nullable) {
> > > +                     info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
> > > +             } else if (is_refcounted) {
> > > +                     info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID;
> > > +                     info->refcounted = true;
> > > +             }
> > >
> > >               info++;
> > >               info_cnt++;
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index e7a59e6462a9..a05ccf9ee032 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -6580,6 +6580,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >                       info->reg_type = ctx_arg_info->reg_type;
> > >                       info->btf = ctx_arg_info->btf ? : btf_vmlinux;
> > >                       info->btf_id = ctx_arg_info->btf_id;
> > > +                     info->ref_obj_id = ctx_arg_info->ref_obj_id;
> > >                       return true;
> > >               }
> > >       }
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 9f5de8d4fbd0..69753096075f 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -1402,6 +1402,17 @@ static int release_reference_state(struct bpf_func_state *state, int ptr_id)
> > >       return -EINVAL;
> > >   }
> > >
> > > +static bool find_reference_state(struct bpf_func_state *state, int ptr_id)
> > > +{
> > > +     int i;
> > > +
> > > +     for (i = 0; i < state->acquired_refs; i++)
> > > +             if (state->refs[i].id == ptr_id)
> > > +                     return true;
> > > +
> > > +     return false;
> > > +}
> > > +
> > >   static int release_lock_state(struct bpf_func_state *state, int type, int id, void *ptr)
> > >   {
> > >       int i, last_idx;
> > > @@ -5798,7 +5809,8 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
> > >   /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
> > >   static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
> > >                           enum bpf_access_type t, enum bpf_reg_type *reg_type,
> > > -                         struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx)
> > > +                         struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx,
> > > +                         u32 *ref_obj_id)
> > >   {
> > >       struct bpf_insn_access_aux info = {
> > >               .reg_type = *reg_type,
> > > @@ -5820,8 +5832,16 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
> > >               *is_retval = info.is_retval;
> > >
> > >               if (base_type(*reg_type) == PTR_TO_BTF_ID) {
> > > +                     if (info.ref_obj_id &&
> > > +                         !find_reference_state(cur_func(env), info.ref_obj_id)) {
> > > +                             verbose(env, "invalid bpf_context access off=%d. Reference may already be released\n",
> > > +                                     off);
> > > +                             return -EACCES;
> > > +                     }
> > > +
> > >                       *btf = info.btf;
> > >                       *btf_id = info.btf_id;
> > > +                     *ref_obj_id = info.ref_obj_id;
> > >               } else {
> > >                       env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> > >               }
> > > @@ -7135,7 +7155,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> > >               struct bpf_retval_range range;
> > >               enum bpf_reg_type reg_type = SCALAR_VALUE;
> > >               struct btf *btf = NULL;
> > > -             u32 btf_id = 0;
> > > +             u32 btf_id = 0, ref_obj_id = 0;
> > >
> > >               if (t == BPF_WRITE && value_regno >= 0 &&
> > >                   is_pointer_value(env, value_regno)) {
> > > @@ -7148,7 +7168,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> > >                       return err;
> > >
> > >               err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf,
> > > -                                    &btf_id, &is_retval, is_ldsx);
> > > +                                    &btf_id, &is_retval, is_ldsx, &ref_obj_id);
> > >               if (err)
> > >                       verbose_linfo(env, insn_idx, "; ");
> > >               if (!err && t == BPF_READ && value_regno >= 0) {
> > > @@ -7179,6 +7199,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> > >                               if (base_type(reg_type) == PTR_TO_BTF_ID) {
> > >                                       regs[value_regno].btf = btf;
> > >                                       regs[value_regno].btf_id = btf_id;
> > > +                                     regs[value_regno].ref_obj_id = ref_obj_id;
> > >                               }
> > >                       }
> > >                       regs[value_regno].type = reg_type;
> > > @@ -21662,6 +21683,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
> > >   {
> > >       bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> > >       struct bpf_subprog_info *sub = subprog_info(env, subprog);
> > > +     struct bpf_ctx_arg_aux *ctx_arg_info;
> > >       struct bpf_verifier_state *state;
> > >       struct bpf_reg_state *regs;
> > >       int ret, i;
> > > @@ -21769,6 +21791,13 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
> > >               mark_reg_known_zero(env, regs, BPF_REG_1);
> > >       }
> > >
> > > +     if (!subprog && env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
> > > +             ctx_arg_info = (struct bpf_ctx_arg_aux *)env->prog->aux->ctx_arg_info;
> > > +             for (i = 0; i < env->prog->aux->ctx_arg_info_size; i++)
> > > +                     if (ctx_arg_info[i].refcounted)
> > > +                             ctx_arg_info[i].ref_obj_id = acquire_reference_state(env, 0);
> >
> > There is a conflict with bpf-next/master. acquire_reference_state() has been
> > refactored in commit 769b0f1c8214. From looking at the net/sched/sch_*.c
> > changes, they should not conflict with net-next/main. I would suggest
> > rebasing this set on bpf-next/master.
> >
> > At first glance, the ref_obj_id assignment looks racy because ctx_arg_info
> > is shared by different bpf progs that may be verified in parallel. On second
> > thought, this should be fine because it should always end up having the same
> > ref_obj_id for the same arg-no, right? Not sure if UBSAN can understand this
> > without using READ/WRITE_ONCE, but adding READ/WRITE_ONCE when using
> > ref_obj_id will be quite puzzling when reading the verifier code. Any better idea?
>
> ctx_arg_info is kinda read-only from the verifier pov.
> bpf_ctx_arg_aux->btf_id is populated before the main verifier loop.
> While ref_obj_id is a dynamic property.
> It doesn't really fit in bpf_ctx_arg_aux.
> It probably needs to be another struct type that is allocated
> and populated once with acquire_reference() when the main verifier loop
> is happening.
> Is do_check_common() maybe too early?
> Looks like it's a reference that is ok to leak anyway, per patch 3?
>
> It seems the main goal is to pass ref_obj_id-like argument into bpf prog
> and make sure that prog doesn't call KF_RELEASE kfunc on it twice,
> but leaking is ok?
> Maybe it needs a different type. Other than REF_TYPE_PTR.
>

The main goal of this patch is to get a unique ref_obj_id for the skb
arg in an .enqueue call. Therefore, we acquire that one and only
ref_obj_id for the __ref arg early in do_check_common() and do not
change it afterward. Later in the main loop, the liveness is tracked in
the reference states. This feels kind of read-only? Besides, since we
acquire the ref automatically, it forces the user to do something with
the ref ptr (in qdisc's case, .enqueue needs to either drop or enqueue
it).
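
For illustration, the enqueue-or-drop contract above could look roughly
like this from the program side. This is only a sketch: the kfunc and
helper names (bpf_qdisc_skb_drop, queue_is_full, my_fifo_push) are
placeholders I made up here, not names from this series.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Placeholder declarations; the real release kfunc and queue helpers
 * would come from the later qdisc patches, not from this sketch.
 */
extern void bpf_qdisc_skb_drop(struct sk_buff *skb, struct Qdisc *sch) __ksym;
extern bool queue_is_full(struct Qdisc *sch) __ksym;
extern int my_fifo_push(struct sk_buff *skb) __ksym;

/* Sketch of an .enqueue struct_ops program. The skb arrives as a
 * referenced kptr because the stub tags it with "__ref"; the program is
 * expected to consume it, either by handing it to the queue or by
 * releasing it through a drop kfunc.
 */
SEC("struct_ops")
int BPF_PROG(sketch_enqueue, struct sk_buff *skb, struct Qdisc *sch)
{
	if (queue_is_full(sch)) {
		bpf_qdisc_skb_drop(skb, sch);	/* releases the reference */
		return 1;			/* NET_XMIT_DROP */
	}

	my_fifo_push(skb);			/* takes ownership of the skb */
	return 0;				/* NET_XMIT_SUCCESS */
}

char _license[] SEC("license") = "GPL";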

I tried to break down the requirements from bpf qdisc (1. only a unique
reference to the skb in .enqueue; 2. users must enqueue or drop the
skb in .enqueue; 3. dequeue a single skb) into two orthogonal patches,
1 and 3, so whether this reference can leak or not can be decided
independently. Taking a step back, maybe we can encapsulate them all in
one semantic (a new kind of REF like you suggest), but I am not sure if
that'd be too specific and therefore less useful to others.
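
As a concrete anchor for requirement 1, the kernel-side tagging this
patch enables would look roughly like the following. Again only a
sketch; the stub name and signature are illustrative rather than lifted
from the qdisc patches later in the series.

/* BTF-only stub: suffixing the parameter name with "__ref" makes
 * prepare_arg_info() mark the ctx arg as refcounted, so
 * do_check_common() pre-acquires the single reference that every
 * .enqueue program then sees as an already-referenced kptr.
 */
static int bpf_qdisc_enqueue_stub(struct sk_buff *skb__ref, struct Qdisc *sch)
{
	return 0;
}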

> > Other than the subprog, afaik, the bpf prog triggered by bpf_tail_call can
> > also take the 'u64 *ctx' array. Maybe disallow using tail_call in all ops in
> > the bpf qdisc. env->subprog_info[i].has_tail_call already tracks whether the
> > tail_call is used.
>
> +1. Just disallow tail_call.

Got it. Also, thanks Martin for pointing it out.
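
For concreteness, the check could be shaped roughly like this, using
the has_tail_call flag Martin mentioned. A sketch only; where it would
live and how the qdisc ops are identified are open questions, not
something this series already defines.

/* Reject the prog if any of its subprogs uses bpf_tail_call(); the
 * verifier already records this per subprog in has_tail_call.
 */
static int reject_tail_call(struct bpf_verifier_env *env)
{
	int i;

	for (i = 0; i < env->subprog_cnt; i++) {
		if (env->subprog_info[i].has_tail_call) {
			verbose(env, "tail_call is not allowed in qdisc struct_ops progs\n");
			return -EINVAL;
		}
	}
	return 0;
}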
