linux-kernel - Re: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a10a2865242dc217e71de54f75fa4289aefb2fa9.camel@gmail.com>
Date: Tue, 11 Feb 2025 16:08:09 -0800
From: Eduard Zingerman <eddyz87@...il.com>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>, bpf@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, Peter Zijlstra	
 <peterz@...radead.org>, Will Deacon <will@...nel.org>, Waiman Long	
 <llong@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko	
 <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Martin KaFai
 Lau	 <martin.lau@...nel.org>, "Paul E. McKenney" <paulmck@...nel.org>,
 Tejun Heo	 <tj@...nel.org>, Barret Rhoden <brho@...gle.com>, Josh Don
 <joshdon@...gle.com>,  Dohyun Kim <dohyunkim@...gle.com>,
 linux-arm-kernel@...ts.infradead.org, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for
 rqspinlock

On Thu, 2025-02-06 at 02:54 -0800, Kumar Kartikeya Dwivedi wrote:
> Introduce verifier-side support for rqspinlock kfuncs. The first step is
> allowing bpf_res_spin_lock type to be defined in map values and
> allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK
> field to recognize and validate.
> 
> Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only
> one of them (and at most one of them per-object, like before) must be
> present. The bpf_res_spin_lock can also be used to protect objects that
> require lock protection for their kfuncs, like BPF rbtree and linked
> list.
> 
> The verifier plumbing to simulate success and failure cases when calling
> the kfuncs is done by pushing a new verifier state to the verifier state
> stack which will verify the failure case upon calling the kfunc. The
> path where success is indicated creates all lock reference state and IRQ
> state (if necessary for irqsave variants). In the case of failure, the
> state clears the registers r0-r5, sets the return value, and skips kfunc
> processing, proceeding to the next instruction.
> 
> When marking the return value for success case, the value is marked as
> 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program,
> whenever user checks the return value as 'if (ret)' or 'if (ret < 0)'
> the verifier never traverses such branches for success cases, and would
> be aware that the lock is not held in such cases.
> 
> We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs
> are invoked. We introduce a kfunc_class state to avoid mixing lock
> irqrestore kfuncs with IRQ state created by bpf_local_irq_save.
> 
> With all this infrastructure, these kfuncs become usable in programs
> while satisfying all safety properties required by the kernel.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@...il.com>
> ---

Apart from a few nits, I think this patch looks good.

Acked-by: Eduard Zingerman <eddyz87@...il.com>

[...]

> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 32c23f2a3086..ed444e44f524 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -115,6 +115,15 @@ struct bpf_reg_state {
>  			int depth:30;
>  		} iter;
>  
> +		/* For irq stack slots */
> +		struct {
> +			enum {
> +				IRQ_KFUNC_IGNORE,

Is this state ever used?
mark_stack_slot_irq_flag() is always called with either NATIVE or LOCK.

> +				IRQ_NATIVE_KFUNC,
> +				IRQ_LOCK_KFUNC,
> +			} kfunc_class;
> +		} irq;
> +
>  		/* Max size from any of the above. */
>  		struct {
>  			unsigned long raw1;

[...]

> @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  	}
>  
>  	rec = reg_btf_record(reg);
> -	if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) {
> -		verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
> -			map ? map->name : "kptr");
> +	if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) {
> +		verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local",
> +			map ? map->name : "kptr", lock_str);
>  		return -EINVAL;
>  	}
> -	if (rec->spin_lock_off != val + reg->off) {
> -		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
> -			val + reg->off, rec->spin_lock_off);
> +	spin_lock_off = is_res_lock ? rec->res_spin_lock_off : rec->spin_lock_off;
> +	if (spin_lock_off != val + reg->off) {
> +		verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n",
> +			val + reg->off, lock_str, spin_lock_off);
>  		return -EINVAL;
>  	}
>  	if (is_lock) {
>  		void *ptr;
> +		int type;
>  
>  		if (map)
>  			ptr = map;
>  		else
>  			ptr = btf;
>  
> -		if (cur->active_locks) {
> -			verbose(env,
> -				"Locking two bpf_spin_locks are not allowed\n");
> -			return -EINVAL;
> +		if (!is_res_lock && cur->active_locks) {

Nit: having '&& cur->active_locks' in this branch but not the one for
     'is_res_lock' is a bit confusing. As far as I understand this is
     just an optimization, and active_locks check could be done (or dropped)
     in both cases.

> +			if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) {
> +				verbose(env,
> +					"Locking two bpf_spin_locks are not allowed\n");
> +				return -EINVAL;
> +			}
> +		} else if (is_res_lock) {
> +			if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) {
> +				verbose(env, "Acquiring the same lock again, AA deadlock detected\n");
> +				return -EINVAL;
> +			}
>  		}

Nit: there is no branch for find_lock_state(... REF_TYPE_RES_LOCK_IRQ ...),
     this is not a problem, as other checks catch the imbalance in
     number of unlocks or unlock of the same lock, but verifier won't
     report the above "AA deadlock" message for bpf_res_spin_lock_irqsave().

The above two checks make it legal to take resilient lock while
holding regular lock and vice versa. This is probably ok, can't figure
out an example when this causes trouble.

> -		err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr);
> +
> +		if (is_res_lock && is_irq)
> +			type = REF_TYPE_RES_LOCK_IRQ;
> +		else if (is_res_lock)
> +			type = REF_TYPE_RES_LOCK;
> +		else
> +			type = REF_TYPE_LOCK;
> +		err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr);
>  		if (err < 0) {
>  			verbose(env, "Failed to acquire lock state\n");
>  			return err;
>  		}
>  	} else {
>  		void *ptr;
> +		int type;
>  
>  		if (map)
>  			ptr = map;

[...]