[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP01T76VpVsKabq0cMjX07oDW2WTrQ2dW9NJfB0tvoXFEV5ZoQ@mail.gmail.com>
Date: Thu, 13 Feb 2025 07:41:53 +0100
From: Kumar Kartikeya Dwivedi <memxor@...il.com>
To: Eduard Zingerman <eddyz87@...il.com>
Cc: bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>, Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>, Waiman Long <llong@...hat.com>, Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <martin.lau@...nel.org>, "Paul E. McKenney" <paulmck@...nel.org>, Tejun Heo <tj@...nel.org>,
Barret Rhoden <brho@...gle.com>, Josh Don <joshdon@...gle.com>, Dohyun Kim <dohyunkim@...gle.com>,
linux-arm-kernel@...ts.infradead.org, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock
On Wed, 12 Feb 2025 at 01:08, Eduard Zingerman <eddyz87@...il.com> wrote:
>
> On Thu, 2025-02-06 at 02:54 -0800, Kumar Kartikeya Dwivedi wrote:
> > Introduce verifier-side support for rqspinlock kfuncs. The first step is
> > allowing bpf_res_spin_lock type to be defined in map values and
> > allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK
> > field to recognize and validate.
> >
> > Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only
> > one of them (and at most one of them per-object, like before) must be
> > present. The bpf_res_spin_lock can also be used to protect objects that
> > require lock protection for their kfuncs, like BPF rbtree and linked
> > list.
> >
> > The verifier plumbing to simulate success and failure cases when calling
> > the kfuncs is done by pushing a new verifier state to the verifier state
> > stack which will verify the failure case upon calling the kfunc. The
> > path where success is indicated creates all lock reference state and IRQ
> > state (if necessary for irqsave variants). In the case of failure, the
> > state clears the registers r0-r5, sets the return value, and skips kfunc
> > processing, proceeding to the next instruction.
> >
> > When marking the return value for success case, the value is marked as
> > 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program,
> > whenever user checks the return value as 'if (ret)' or 'if (ret < 0)'
> > the verifier never traverses such branches for success cases, and would
> > be aware that the lock is not held in such cases.
> >
> > We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs
> > are invoked. We introduce a kfunc_class state to avoid mixing lock
> > irqrestore kfuncs with IRQ state created by bpf_local_irq_save.
> >
> > With all this infrastructure, these kfuncs become usable in programs
> > while satisfying all safety properties required by the kernel.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@...il.com>
> > ---
>
> Apart from a few nits, I think this patch looks good.
>
> Acked-by: Eduard Zingerman <eddyz87@...il.com>
>
Thanks!
> [...]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 32c23f2a3086..ed444e44f524 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -115,6 +115,15 @@ struct bpf_reg_state {
> > int depth:30;
> > } iter;
> >
> > + /* For irq stack slots */
> > + struct {
> > + enum {
> > + IRQ_KFUNC_IGNORE,
>
> Is this state ever used?
> mark_stack_slot_irq_flag() is always called with either NATIVE or LOCK.
Hm, no, it was just the default / invalid value, I guess it can be dropped.
>
> > + IRQ_NATIVE_KFUNC,
> > + IRQ_LOCK_KFUNC,
> > + } kfunc_class;
> > + } irq;
> > +
> > /* Max size from any of the above. */
> > struct {
> > unsigned long raw1;
>
> [...]
>
> > @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
> > }
> >
> > rec = reg_btf_record(reg);
> > - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) {
> > - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
> > - map ? map->name : "kptr");
> > + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) {
> > + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local",
> > + map ? map->name : "kptr", lock_str);
> > return -EINVAL;
> > }
> > - if (rec->spin_lock_off != val + reg->off) {
> > - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
> > - val + reg->off, rec->spin_lock_off);
> > + spin_lock_off = is_res_lock ? rec->res_spin_lock_off : rec->spin_lock_off;
> > + if (spin_lock_off != val + reg->off) {
> > + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n",
> > + val + reg->off, lock_str, spin_lock_off);
> > return -EINVAL;
> > }
> > if (is_lock) {
> > void *ptr;
> > + int type;
> >
> > if (map)
> > ptr = map;
> > else
> > ptr = btf;
> >
> > - if (cur->active_locks) {
> > - verbose(env,
> > - "Locking two bpf_spin_locks are not allowed\n");
> > - return -EINVAL;
> > + if (!is_res_lock && cur->active_locks) {
>
> Nit: having '&& cur->active_locks' in this branch but not the one for
> 'is_res_lock' is a bit confusing. As far as I understand this is
> just an optimization, and active_locks check could be done (or dropped)
> in both cases.
Yeah, I can make it consistent by adding the check to both.
>
> > + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) {
> > + verbose(env,
> > + "Locking two bpf_spin_locks are not allowed\n");
> > + return -EINVAL;
> > + }
> > + } else if (is_res_lock) {
> > + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) {
> > + verbose(env, "Acquiring the same lock again, AA deadlock detected\n");
> > + return -EINVAL;
> > + }
> > }
>
> Nit: there is no branch for find_lock_state(... REF_TYPE_RES_LOCK_IRQ ...),
> this is not a problem, as other checks catch the imbalance in
> number of unlocks or unlock of the same lock, but verifier won't
> report the above "AA deadlock" message for bpf_res_spin_lock_irqsave().
>
Good point, will fix.
> The above two checks make it legal to take resilient lock while
> holding regular lock and vice versa. This is probably ok, can't figure
> out an example when this causes trouble.
Yeah, that shouldn't cause a problem.
>
> > - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr);
> > +
> > + if (is_res_lock && is_irq)
> > + type = REF_TYPE_RES_LOCK_IRQ;
> > + else if (is_res_lock)
> > + type = REF_TYPE_RES_LOCK;
> > + else
> > + type = REF_TYPE_LOCK;
> > + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr);
> > if (err < 0) {
> > verbose(env, "Failed to acquire lock state\n");
> > return err;
> > }
> > } else {
> > void *ptr;
> > + int type;
> >
> > if (map)
> > ptr = map;
>
> [...]
>
Powered by blists - more mailing lists