linux-kernel - Re: [RFC PATCH bpf-next 2/4] bpf: Introduce process open coded iterator kfuncs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQL-ZGV6C7VWdQpX64f0+gokE5MLBO3F2J3WyMoq-_NCPg@mail.gmail.com>
Date:   Wed, 6 Sep 2023 10:17:57 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Chuyi Zhou <zhouchuyi@...edance.com>
Cc:     bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 2/4] bpf: Introduce process open coded
 iterator kfuncs

On Wed, Sep 6, 2023 at 5:38 AM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
>
> Hello, Alexei.
>
> 在 2023/9/6 04:09, Alexei Starovoitov 写道:
> > On Sun, Aug 27, 2023 at 12:21 AM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@...edance.com>
> >> ---
> >>   include/uapi/linux/bpf.h       |  4 ++++
> >>   kernel/bpf/helpers.c           |  3 +++
> >>   kernel/bpf/task_iter.c         | 31 +++++++++++++++++++++++++++++++
> >>   tools/include/uapi/linux/bpf.h |  4 ++++
> >>   tools/lib/bpf/bpf_helpers.h    |  5 +++++
> >>   5 files changed, 47 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 2a6e9b99564b..cfbd527e3733 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7199,4 +7199,8 @@ struct bpf_iter_css_task {
> >>          __u64 __opaque[1];
> >>   } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_process {
> >> +       __u64 __opaque[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >>   #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index cf113ad24837..81a2005edc26 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2458,6 +2458,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index b1bdba40b684..a6717a76c1e0 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -862,6 +862,37 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >>          kfree(kit->css_it);
> >>   }
> >>
> >> +struct bpf_iter_process_kern {
> >> +       struct task_struct *tsk;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
> >> +{
> >> +       struct bpf_iter_process_kern *kit = (void *)it;
> >> +
> >> +       BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
> >> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
> >> +                                       __alignof__(struct bpf_iter_process));
> >> +
> >> +       rcu_read_lock();
> >> +       kit->tsk = &init_task;
> >> +       return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
> >> +{
> >> +       struct bpf_iter_process_kern *kit = (void *)it;
> >> +
> >> +       kit->tsk = next_task(kit->tsk);
> >> +
> >> +       return kit->tsk == &init_task ? NULL : kit->tsk;
> >> +}
> >> +
> >> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> >> +{
> >> +       rcu_read_unlock();
> >> +}
> >
> > This iter can be used in all ctx-s which is nice, but let's
> > make the verifier enforce rcu_read_lock/unlock done by bpf prog
> > instead of doing in the ctor/dtor of iter, since
> > in sleepable progs the verifier won't recognize that body is RCU CS.
> > We'd need to teach the verifier to allow bpf_iter_process_new()
> > inside in_rcu_cs() and make sure there is no rcu_read_unlock
> > while BPF_ITER_STATE_ACTIVE.
> > bpf_iter_process_destroy() would become a nop.
>
> Thanks for your review!
>
> I think bpf_iter_process_{new, next, destroy} should be protected by
> bpf_rcu_read_lock/unlock explicitly whether the prog is sleepable or
> not, right?

Correct. By explicit bpf_rcu_read_lock() in case of sleepable progs
or just by using them in normal bpf progs that have implicit rcu_read_lock()
done before calling into them.

> I'm not very familiar with the BPF verifier, but I believe
> there is still a risk in directly calling these kfuns even if
> in_rcu_cs() is true.
>
> Maby what we actually need here is to enforce BPF verifier to check
> env->cur_state->active_rcu_lock is true when we want to call these kfuncs.

active_rcu_lock means explicit bpf_rcu_read_lock.
Currently we do allow bpf_rcu_read_lock in non-sleepable, but it's pointless.

Technically we can extend the check:
                if (in_rbtree_lock_required_cb(env) && (rcu_lock ||
rcu_unlock)) {
                        verbose(env, "Calling
bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
                        return -EACCES;
                }
to discourage their use in all non-sleepable, but it will break some progs.

I think it's ok to check in_rcu_cs() to allow bpf_iter_process_*().
If bpf prog adds explicit and unnecessary bpf_rcu_read_lock() around
the iter ops it won't do any harm.
Just need to make sure that rcu unlock logic:
                } else if (rcu_unlock) {
                        bpf_for_each_reg_in_vstate(env->cur_state,
state, reg, ({
                                if (reg->type & MEM_RCU) {
                                        reg->type &= ~(MEM_RCU |
PTR_MAYBE_NULL);
                                        reg->type |= PTR_UNTRUSTED;
                                }
                        }));
clears iter state that depends on rcu.

I thought about changing mark_stack_slots_iter() to do
st->type = PTR_TO_STACK | MEM_RCU;
so that the above clearing logic kicks in,
but it might be better to have something iter specific.
is_iter_reg_valid_init() should probably be changed to
make sure reg->type is not UNTRUSTED.

Andrii,
do you have better suggestions?