lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP01T77cWxWNwq5HLr+Woiu7k4-P3QQfJWX1OeQJUkxW3=P4bA@mail.gmail.com>
Date:   Wed, 13 Sep 2023 00:20:38 +0200
From:   Kumar Kartikeya Dwivedi <memxor@...il.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Chuyi Zhou <zhouchuyi@...edance.com>,
        bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 2/4] bpf: Introduce process open coded
 iterator kfuncs

On Wed, 13 Sept 2023 at 00:12, Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Wed, Sep 6, 2023 at 10:18 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Wed, Sep 6, 2023 at 5:38 AM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
> > >
> > > Hello, Alexei.
> > >
> > > 在 2023/9/6 04:09, Alexei Starovoitov 写道:
> > > > On Sun, Aug 27, 2023 at 12:21 AM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
> > > >>
> > > >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> > > >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> > > >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> > > >> iterate all processes in the system.
> > > >>
> > > >> Signed-off-by: Chuyi Zhou <zhouchuyi@...edance.com>
> > > >> ---
> > > >>   include/uapi/linux/bpf.h       |  4 ++++
> > > >>   kernel/bpf/helpers.c           |  3 +++
> > > >>   kernel/bpf/task_iter.c         | 31 +++++++++++++++++++++++++++++++
> > > >>   tools/include/uapi/linux/bpf.h |  4 ++++
> > > >>   tools/lib/bpf/bpf_helpers.h    |  5 +++++
> > > >>   5 files changed, 47 insertions(+)
> > > >>
> > > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > >> index 2a6e9b99564b..cfbd527e3733 100644
> > > >> --- a/include/uapi/linux/bpf.h
> > > >> +++ b/include/uapi/linux/bpf.h
> > > >> @@ -7199,4 +7199,8 @@ struct bpf_iter_css_task {
> > > >>          __u64 __opaque[1];
> > > >>   } __attribute__((aligned(8)));
> > > >>
> > > >> +struct bpf_iter_process {
> > > >> +       __u64 __opaque[1];
> > > >> +} __attribute__((aligned(8)));
> > > >> +
> > > >>   #endif /* _UAPI__LINUX_BPF_H__ */
> > > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > >> index cf113ad24837..81a2005edc26 100644
> > > >> --- a/kernel/bpf/helpers.c
> > > >> +++ b/kernel/bpf/helpers.c
> > > >> @@ -2458,6 +2458,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> > > >>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> > > >>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> > > >>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> > > >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> > > >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> > > >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> > > >>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> > > >>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> > > >>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> > > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> > > >> index b1bdba40b684..a6717a76c1e0 100644
> > > >> --- a/kernel/bpf/task_iter.c
> > > >> +++ b/kernel/bpf/task_iter.c
> > > >> @@ -862,6 +862,37 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> > > >>          kfree(kit->css_it);
> > > >>   }
> > > >>
> > > >> +struct bpf_iter_process_kern {
> > > >> +       struct task_struct *tsk;
> > > >> +} __attribute__((aligned(8)));
> > > >> +
> > > >> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
> > > >> +{
> > > >> +       struct bpf_iter_process_kern *kit = (void *)it;
> > > >> +
> > > >> +       BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
> > > >> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
> > > >> +                                       __alignof__(struct bpf_iter_process));
> > > >> +
> > > >> +       rcu_read_lock();
> > > >> +       kit->tsk = &init_task;
> > > >> +       return 0;
> > > >> +}
> > > >> +
> > > >> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
> > > >> +{
> > > >> +       struct bpf_iter_process_kern *kit = (void *)it;
> > > >> +
> > > >> +       kit->tsk = next_task(kit->tsk);
> > > >> +
> > > >> +       return kit->tsk == &init_task ? NULL : kit->tsk;
> > > >> +}
> > > >> +
> > > >> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> > > >> +{
> > > >> +       rcu_read_unlock();
> > > >> +}
> > > >
> > > > This iter can be used in all ctx-s which is nice, but let's
> > > > make the verifier enforce rcu_read_lock/unlock done by bpf prog
> > > > instead of doing in the ctor/dtor of iter, since
> > > > in sleepable progs the verifier won't recognize that body is RCU CS.
> > > > We'd need to teach the verifier to allow bpf_iter_process_new()
> > > > inside in_rcu_cs() and make sure there is no rcu_read_unlock
> > > > while BPF_ITER_STATE_ACTIVE.
> > > > bpf_iter_process_destroy() would become a nop.
> > >
> > > Thanks for your review!
> > >
> > > I think bpf_iter_process_{new, next, destroy} should be protected by
> > > bpf_rcu_read_lock/unlock explicitly whether the prog is sleepable or
> > > not, right?
> >
> > Correct. By explicit bpf_rcu_read_lock() in case of sleepable progs
> > or just by using them in normal bpf progs that have implicit rcu_read_lock()
> > done before calling into them.
> >
> > > I'm not very familiar with the BPF verifier, but I believe
> > > there is still a risk in directly calling these kfuns even if
> > > in_rcu_cs() is true.
> > >
> > > Maby what we actually need here is to enforce BPF verifier to check
> > > env->cur_state->active_rcu_lock is true when we want to call these kfuncs.
> >
> > active_rcu_lock means explicit bpf_rcu_read_lock.
> > Currently we do allow bpf_rcu_read_lock in non-sleepable, but it's pointless.
> >
> > Technically we can extend the check:
> >                 if (in_rbtree_lock_required_cb(env) && (rcu_lock ||
> > rcu_unlock)) {
> >                         verbose(env, "Calling
> > bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
> >                         return -EACCES;
> >                 }
> > to discourage their use in all non-sleepable, but it will break some progs.
> >
> > I think it's ok to check in_rcu_cs() to allow bpf_iter_process_*().
> > If bpf prog adds explicit and unnecessary bpf_rcu_read_lock() around
> > the iter ops it won't do any harm.
> > Just need to make sure that rcu unlock logic:
> >                 } else if (rcu_unlock) {
> >                         bpf_for_each_reg_in_vstate(env->cur_state,
> > state, reg, ({
> >                                 if (reg->type & MEM_RCU) {
> >                                         reg->type &= ~(MEM_RCU |
> > PTR_MAYBE_NULL);
> >                                         reg->type |= PTR_UNTRUSTED;
> >                                 }
> >                         }));
> > clears iter state that depends on rcu.
> >
> > I thought about changing mark_stack_slots_iter() to do
> > st->type = PTR_TO_STACK | MEM_RCU;
> > so that the above clearing logic kicks in,
> > but it might be better to have something iter specific.
> > is_iter_reg_valid_init() should probably be changed to
> > make sure reg->type is not UNTRUSTED.
> >
> > Andrii,
> > do you have better suggestions?
>
> What if we just remember inside bpf_reg_state.iter state whether
> iterator needs to be RCU protected (it's just one bit if we don't
> allow nesting rcu_read_lock()/rcu_read_unlock(), or we'd need to
> remember RCU nestedness level), and then when validating iter_next and
> iter_destroy() kfuncs, check that we are still in RCU-protected region
> (if we have nestedness, then iter->rcu_nest_level <=
> cur_rcu_nest_level, if I understand correctly). And if not, provide a
> clear and nice message.
>
> That seems straightforward enough, but am I missing anything subtle?
>

We also need to ensure one does not do a bpf_rcu_read_unlock and
bpf_rcu_read_lock again between the iter_new and
iter_next/iter_destroy calls. Simply checking we are in an RCU
protected region will pass the verifier in such a case.

A simple solution might be associating an ID with the RCU CS, so make
active_rcu_lock a 32-bit ID which is monotonically increasing for each
new RCU region. Ofcourse, all of this only matters for sleepable
programs. Then check if id recorded in iter state is same on next and
destroy.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ