Message-ID: <1f9cae15-979c-c049-78a9-f89d5cd1b53e@bytedance.com>
Date:   Thu, 14 Sep 2023 16:56:31 +0800
From:   Chuyi Zhou <zhouchuyi@...edance.com>
To:     bpf@...r.kernel.org,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        martin.lau@...nel.org, tj@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce
 css_iter and process_iter in RCU CS



On 2023/9/13 21:53, Chuyi Zhou wrote:
> Hello.
> 
> On 2023/9/12 15:01, Chuyi Zhou wrote:
>> css_iter and process_iter should be used inside an RCU critical section.
>> Specifically, sleepable progs need an explicit bpf_rcu_read_lock() before
>> using these iters. Normal bpf progs, which hold an implicit
>> rcu_read_lock(), can use them directly.
>>
>> This patch checks in mark_stack_slots_iter() whether we are in an RCU
>> critical section before bpf_iter_process_new and
>> bpf_iter_css_{pre, post}_new are invoked. If RCU protection is
>> guaranteed, we set st->type = PTR_TO_STACK | MEM_RCU.
>> is_iter_reg_valid_init() will reject the iter if reg->type is UNTRUSTED.
> 
> I used the following BPF program to test this patch:
> 
> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> int iter_task_for_each_sleep(void *ctx)
> {
>      struct task_struct *task;
>      struct task_struct *cur_task = bpf_get_current_task_btf();
> 
>      if (cur_task->pid != target_pid)
>          return 0;
>      bpf_rcu_read_lock();
>      bpf_for_each(process, task) {
>          bpf_rcu_read_unlock();
>          if (task->pid == target_pid)
>              process_cnt += 1;
>          bpf_rcu_read_lock();
>      }
>      bpf_rcu_read_unlock();
>      return 0;
> }
> 
> Unfortunately, this program passes the verifier, even though it calls
> bpf_rcu_read_unlock() in the middle of the iteration.
> 
> I then added some printk messages before setting/clearing the state to
> help debug:
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d151e6b43a5f..35f3fa9471a9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>                  __mark_reg_known_zero(st);
>                  st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
>                  if (is_iter_need_rcu(meta)) {
> +                       printk("mark reg_addr : %px", st);
>                          if (in_rcu_cs(env))
>                                  st->type |= MEM_RCU;
>                          else
> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                          return -EINVAL;
>                  } else if (rcu_unlock) {
>                          bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +                               printk("clear reg_addr : %px MEM_RCU : %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & PTR_UNTRUSTED);
>                                  if (reg->type & MEM_RCU) {
> -                                       printk("clear reg addr : %lld", reg);
>                                          reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>                                          reg->type |= PTR_UNTRUSTED;
>                                  }
> 
> 
> The dmesg log:
> 
> [  393.705324] mark reg_addr : ffff88814e40e200
> 
> [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 PTR_UNTRUSTED : 0
> 
> [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 PTR_UNTRUSTED : 0
> 
> [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 PTR_UNTRUSTED : 0
> ....
> ....
> 
> ffff88814e40e200 was never cleared as expected, because
> bpf_for_each_reg_in_vstate did not find it.
> 
> It seems that when we call bpf_rcu_read_unlock() in the middle of the
> iteration and try to clear the state through bpf_for_each_reg_in_vstate,
> we cannot find the reg that we previously marked MEM_RCU/PTR_UNTRUSTED
> in mark_stack_slots_iter().
> 

bpf_get_spilled_reg skips slots that are not STACK_SPILL, but
mark_stack_slots_iter() marks the iterator's slots as *STACK_ITER*, so
bpf_for_each_reg_in_vstate never visits them.

With the following change, everything seems to work OK.

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..83c5ecccadb4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -387,7 +387,7 @@ struct bpf_verifier_state {

  #define bpf_get_spilled_reg(slot, frame)                               \
         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
-         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
+         (frame->stack[slot].slot_type[0] == STACK_SPILL || frame->stack[slot].slot_type[0] == STACK_ITER)) \
          ? &frame->stack[slot].spilled_ptr : NULL)
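
To illustrate why the unlock walk misses the iterator slot, here is a
minimal user-space sketch. It is not kernel code: every model_* name,
the flag values, and the flattened slot_type0 field are assumptions made
for illustration, and only the two slot-type checks mirror the macro
before and after the change above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel definitions. */
enum model_slot_type { MODEL_STACK_INVALID, MODEL_STACK_SPILL, MODEL_STACK_ITER };

#define MODEL_MEM_RCU		(1u << 0)
#define MODEL_PTR_UNTRUSTED	(1u << 1)

struct model_reg { unsigned int type; };

struct model_slot {
	enum model_slot_type slot_type0;	/* stands in for slot_type[0] */
	struct model_reg spilled_ptr;
};

struct model_frame {
	int nr_slots;
	struct model_slot stack[4];
};

typedef struct model_reg *(*model_lookup_fn)(struct model_frame *, int);

/* Mirrors the current check: only STACK_SPILL slots yield a register. */
static struct model_reg *model_get_spill_only(struct model_frame *f, int slot)
{
	if (slot < f->nr_slots && f->stack[slot].slot_type0 == MODEL_STACK_SPILL)
		return &f->stack[slot].spilled_ptr;
	return NULL;
}

/* Mirrors the changed check: STACK_ITER slots yield a register too. */
static struct model_reg *model_get_spill_or_iter(struct model_frame *f, int slot)
{
	if (slot < f->nr_slots &&
	    (f->stack[slot].slot_type0 == MODEL_STACK_SPILL ||
	     f->stack[slot].slot_type0 == MODEL_STACK_ITER))
		return &f->stack[slot].spilled_ptr;
	return NULL;
}

/* Models the rcu_unlock walk: demote each MEM_RCU reg that the lookup
 * returns to PTR_UNTRUSTED, and report how many were demoted. */
static int model_rcu_unlock(struct model_frame *f, model_lookup_fn lookup)
{
	int demoted = 0;

	for (int i = 0; i < f->nr_slots; i++) {
		struct model_reg *reg = lookup(f, i);

		if (!reg || !(reg->type & MODEL_MEM_RCU))
			continue;
		reg->type &= ~MODEL_MEM_RCU;
		reg->type |= MODEL_PTR_UNTRUSTED;
		demoted++;
	}
	return demoted;
}

/* One plain spilled slot plus one iterator slot marked MEM_RCU. */
static struct model_frame model_make_frame(void)
{
	struct model_frame f = { .nr_slots = 2 };

	f.stack[0].slot_type0 = MODEL_STACK_SPILL;
	f.stack[1].slot_type0 = MODEL_STACK_ITER;
	f.stack[1].spilled_ptr.type = MODEL_MEM_RCU;
	return f;
}

/* Usage: the spill-only walk leaves the iterator slot marked MEM_RCU. */
static void model_demo(void)
{
	struct model_frame a = model_make_frame();
	struct model_frame b = model_make_frame();

	assert(model_rcu_unlock(&a, model_get_spill_only) == 0);
	assert(a.stack[1].spilled_ptr.type == MODEL_MEM_RCU);
	assert(model_rcu_unlock(&b, model_get_spill_or_iter) == 1);
	assert(b.stack[1].spilled_ptr.type == MODEL_PTR_UNTRUSTED);
}
```

In this model the spill-only walk demotes nothing, which matches the
dmesg output where the address printed by the "mark" printk never shows
up in the "clear" lines.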

I am not sure whether this would implicitly break other users of
bpf_get_spilled_reg/bpf_for_each_spilled_reg. If so, maybe we should add
an extra parameter to control which slot types are picked:

#define bpf_get_spilled_reg(slot, frame, stack_type)			\
	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
	  (frame->stack[slot].slot_type[0] == stack_type))		\
	 ? &frame->stack[slot].spilled_ptr : NULL)
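
As a rough sketch of what the three-argument form would mean for
callers, the user-space model below uses the same macro shape on
simplified types. All sketch_* names and SKETCH_REG_SIZE are
hypothetical stand-ins, not kernel definitions. It suggests that a
caller which needs both spilled registers and iterator slots, such as
the rcu_unlock path, would have to walk the stack once per slot type.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REG_SIZE 8	/* assumed stand-in for BPF_REG_SIZE */

/* Simplified stand-ins for illustration; NOT the kernel definitions. */
enum sketch_slot_type { SKETCH_INVALID, SKETCH_SPILL, SKETCH_ITER };

struct sketch_reg { unsigned int type; };

struct sketch_slot {
	enum sketch_slot_type slot_type[SKETCH_REG_SIZE];
	struct sketch_reg spilled_ptr;
};

struct sketch_frame {
	int allocated_stack;		/* in bytes */
	struct sketch_slot stack[4];
};

/* Same shape as the proposed macro, with the slot type chosen by the caller. */
#define sketch_get_spilled_reg(slot, frame, stack_type)			\
	((((slot) < (frame)->allocated_stack / SKETCH_REG_SIZE) &&	\
	  ((frame)->stack[slot].slot_type[0] == (stack_type)))		\
	 ? &(frame)->stack[slot].spilled_ptr : NULL)

/* Counts the slots whose first byte carries the requested type. */
static int sketch_count(struct sketch_frame *f, enum sketch_slot_type t)
{
	int n = 0;

	for (int i = 0; i < f->allocated_stack / SKETCH_REG_SIZE; i++)
		if (sketch_get_spilled_reg(i, f, t))
			n++;
	return n;
}

/* Usage: one pass per slot type the caller cares about. */
static void sketch_demo(void)
{
	struct sketch_frame f = { .allocated_stack = 3 * SKETCH_REG_SIZE };

	f.stack[0].slot_type[0] = SKETCH_SPILL;
	f.stack[1].slot_type[0] = SKETCH_ITER;
	f.stack[2].slot_type[0] = SKETCH_SPILL;

	assert(sketch_count(&f, SKETCH_SPILL) == 2);
	assert(sketch_count(&f, SKETCH_ITER) == 1);
}
```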

Thanks.