linux-kernel - Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzaAtybx=Cbb6zD1otgQ-Jm+Xta0_8rwmL_ZYb3GzjSwWg@mail.gmail.com>
Date:   Fri, 29 Sep 2023 14:27:51 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Chuyi Zhou <zhouchuyi@...edance.com>
Cc:     bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
        andrii@...nel.org, martin.lau@...nel.org, tj@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs

On Wed, Sep 27, 2023 at 8:29 PM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
>
> Hello,
>
> 在 2023/9/28 07:20, Andrii Nakryiko 写道:
> > On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@...edance.com> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_task in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
> >> accepts a specific task and iterating type which allows:
> >> 1. iterating all process in the system
> >>
> >> 2. iterating all threads in the system
> >>
> >> 3. iterating all threads of a specific task
> >> Here we also resuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
> >> to BPF_TASK_ITER_THREAD, rename BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
> >>
> >> The newly-added struct bpf_iter_task has a name collision with a selftest
> >> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> >> renamed in order to avoid the collision.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@...edance.com>
> >> ---
> >>   include/linux/bpf.h                           |  8 +-
> >>   kernel/bpf/helpers.c                          |  3 +
> >>   kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
> >>   .../testing/selftests/bpf/bpf_experimental.h  |  5 +
> >>   .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
> >>   .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
> >>   6 files changed, 106 insertions(+), 24 deletions(-)
> >>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
> >>
> >
> > [...]
> >
> >> @@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
> >>   static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
> >>   {
> >>          seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
> >> -       if (aux->task.type == BPF_TASK_ITER_TID)
> >> +       if (aux->task.type == BPF_TASK_ITER_THREAD)
> >>                  seq_printf(seq, "tid:\t%u\n", aux->task.pid);
> >> -       else if (aux->task.type == BPF_TASK_ITER_TGID)
> >> +       else if (aux->task.type == BPF_TASK_ITER_PROC)
> >>                  seq_printf(seq, "pid:\t%u\n", aux->task.pid);
> >>   }
> >>
> >> @@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >>          bpf_mem_free(&bpf_global_ma, kit->css_it);
> >>   }
> >>
> >> +struct bpf_iter_task {
> >> +       __u64 __opaque[2];
> >> +       __u32 __opaque_int[1];
> >
> > this should be __u64 __opaque[3], because struct takes full 24 bytes
> >
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_task_kern {
> >> +       struct task_struct *task;
> >> +       struct task_struct *pos;
> >> +       unsigned int type;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
> >
> > nit: type -> flags, so we can add a bit more stuff, if necessary
> >
> >> +{
> >> +       struct bpf_iter_task_kern *kit = (void *)it;
> >
> > empty line after variable declarations
> >
> >> +       BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
> >> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> >> +                                       __alignof__(struct bpf_iter_task));
> >
> > and I'd add empty line here to keep BUILD_BUG_ON block separate
> >
> >> +       kit->task = kit->pos = NULL;
> >> +       switch (type) {
> >> +       case BPF_TASK_ITER_ALL:
> >> +       case BPF_TASK_ITER_PROC:
> >> +       case BPF_TASK_ITER_THREAD:
> >> +               break;
> >> +       default:
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       if (type == BPF_TASK_ITER_THREAD)
> >> +               kit->task = task;
> >> +       else
> >> +               kit->task = &init_task;
> >> +       kit->pos = kit->task;
> >> +       kit->type = type;
> >> +       return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> >> +{
> >> +       struct bpf_iter_task_kern *kit = (void *)it;
> >> +       struct task_struct *pos;
> >> +       unsigned int type;
> >> +
> >> +       type = kit->type;
> >> +       pos = kit->pos;
> >> +
> >> +       if (!pos)
> >> +               goto out;
> >> +
> >> +       if (type == BPF_TASK_ITER_PROC)
> >> +               goto get_next_task;
> >> +
> >> +       kit->pos = next_thread(kit->pos);
> >> +       if (kit->pos == kit->task) {
> >> +               if (type == BPF_TASK_ITER_THREAD) {
> >> +                       kit->pos = NULL;
> >> +                       goto out;
> >> +               }
> >> +       } else
> >> +               goto out;
> >> +
> >> +get_next_task:
> >> +       kit->pos = next_task(kit->pos);
> >> +       kit->task = kit->pos;
> >> +       if (kit->pos == &init_task)
> >> +               kit->pos = NULL;
> >
> > I can't say I completely follow the logic (e.g., for
> > BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
> > Can you elabore the expected behavior for various combinations of
> > types and starting task argument?
> >
>
> Thanks for the review.
>
> The expected behavior of current implementation is:
>
> BPF_TASK_ITER_PROC:
>
> init_task->first_process->second_process->...->last_process->init_task
>
> We would exit before visiting init_task again.

ah, ok, so in this case it's more like BPF_TASK_ITER_ALL_PROCS, i.e.,
we iterate all processes in the system. Input `task` that we provide
is ignored/meaningless, right? Maybe we should express it as
ALL_PROCS?

>
> BPF_TASK_ITER_THREAD:
>
> group_task->first_thread->second_thread->...->last_thread->group_task
>
> We would exit before visiting group_task again.
>

And this one is iterating threads of a process specified by given
`task`, right?   This is where my confusion comes from. ITER_PROC and
ITER_THREAD, by their name, seems to be very similar, but in reality
ITER_PROC is more like ITER_ALL (except process vs thread iteration),
while ITER_THREAD is parameterized by input `task`.

I'm not sure what's the least confusing way to name and organize
everything, but I think it's quite confusing right now, unfortunately.
I wonder if you or someone else have a better suggestion on making
this more straightforward?


> BPF_TASK_ITER_ALL:
>
> init_task -> first_process -> second_process -> ...
>                  |                    |
>                 -> first_thread..    |
>                                      -> first_thread
>

Ok, and this one is "all threads in the system". So
BPF_TASK_ITER_ALL_THREADS would describe it more precisely, right?

> Actually, every next() call, we would return the "pos" which was
> prepared by previous next() call, and use next_task()/next_thread() to
> update kit->pos. Once we meet the exit condition (next_task() return
> init_task or next_thread() return group_task), we would update kit->pos
> to NULL. In this way, when next() is called again, we will terminate the
> iteration.
>
> Here "kit->pos = NULL;" means we would return the last valid "pos" and
> will return NULL in next call to exit from the iteration.
>
> Am I miss something important?

No, it's my bad. I did check, but somehow concluded that you are
returning kit->pos, but you are returning locally cached previous
value of kit->pos. All good here, I think.

>
> Thanks.
>
>
>