netdev - Re: [PATCH bpf-next v1 11/19] bpf: add task and task/file targets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc46e006-468d-22d6-91bc-2c8e75590205@fb.com>
Date:   Fri, 1 May 2020 10:23:17 -0700
From:   Yonghong Song <yhs@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v1 11/19] bpf: add task and task/file targets



On 4/29/20 7:08 PM, Andrii Nakryiko wrote:
> On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@...com> wrote:
>>
>> Only the tasks belonging to "current" pid namespace
>> are enumerated.
>>
>> For task/file target, the bpf program will have access to
>>    struct task_struct *task
>>    u32 fd
>>    struct file *file
>> where fd/file is an open file for the task.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>>   kernel/bpf/Makefile    |   2 +-
>>   kernel/bpf/task_iter.c | 319 +++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 320 insertions(+), 1 deletion(-)
>>   create mode 100644 kernel/bpf/task_iter.c
>>
> 
> [...]
> 
>> +static void *task_seq_start(struct seq_file *seq, loff_t *pos)
>> +{
>> +       struct bpf_iter_seq_task_info *info = seq->private;
>> +       struct task_struct *task;
>> +       u32 id = info->id;
>> +
>> +       if (*pos == 0)
>> +               info->ns = task_active_pid_ns(current);
> 
> I wonder why pid namespace is set in start() callback each time, while
> net_ns was set once when seq_file is created. I think it should be
> consistent, no? Either pid_ns is another feature and is set
> consistently just once using the context of the process that creates
> seq_file, or net_ns could be set using the same method without
> bpf_iter infra knowing about this feature? Or there are some
> non-obvious aspects which make pid_ns easier to work with?
> 
> Either way, process read()'ing seq_file might be different than
> process open()'ing seq_file, so they might have different namespaces.
> We need to decide explicitly which context should be used and do it
> consistently.

Good point. for networking case, the `net` namespace is locked
at seq_file open stage and later on it is used for seq_read().

I think I should do the same thing, locking down pid namespace
at open.

> 
>> +
>> +       task = task_seq_get_next(info->ns, &id);
>> +       if (!task)
>> +               return NULL;
>> +
>> +       ++*pos;
>> +       info->task = task;
>> +       info->id = id;
>> +
>> +       return task;
>> +}
>> +
>> +static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>> +{
>> +       struct bpf_iter_seq_task_info *info = seq->private;
>> +       struct task_struct *task;
>> +
>> +       ++*pos;
>> +       ++info->id;
> 
> this would make iterator skip pid 0? Is that by design?

The start will try to find pid 0. That means start will never
return SEQ_START_TOKEN since the bpf program won't be called any way.

> 
>> +       task = task_seq_get_next(info->ns, &info->id);
>> +       if (!task)
>> +               return NULL;
>> +
>> +       put_task_struct(info->task);
> 
> on very first iteration info->task might be NULL, right?

Even the first iteration info->task is not NULL. The start()
will forcefully try to find the first real task from idr number 0.

> 
>> +       info->task = task;
>> +       return task;
>> +}
>> +
>> +struct bpf_iter__task {
>> +       __bpf_md_ptr(struct bpf_iter_meta *, meta);
>> +       __bpf_md_ptr(struct task_struct *, task);
>> +};
>> +
>> +int __init __bpf_iter__task(struct bpf_iter_meta *meta, struct task_struct *task)
>> +{
>> +       return 0;
>> +}
>> +
>> +static int task_seq_show(struct seq_file *seq, void *v)
>> +{
>> +       struct bpf_iter_meta meta;
>> +       struct bpf_iter__task ctx;
>> +       struct bpf_prog *prog;
>> +       int ret = 0;
>> +
>> +       prog = bpf_iter_get_prog(seq, sizeof(struct bpf_iter_seq_task_info),
>> +                                &meta.session_id, &meta.seq_num,
>> +                                v == (void *)0);
>> +       if (prog) {
> 
> can it happen that prog is NULL?

Yes, this function is shared between show() and stop().
The stop() function might be called multiple times since
user can repeatedly try read() although there is nothing
there, in which case, the seq_ops will be just
start() and stop().

> 
> 
>> +               meta.seq = seq;
>> +               ctx.meta = &meta;
>> +               ctx.task = v;
>> +               ret = bpf_iter_run_prog(prog, &ctx);
>> +       }
>> +
>> +       return ret == 0 ? 0 : -EINVAL;
>> +}
>> +
>> +static void task_seq_stop(struct seq_file *seq, void *v)
>> +{
>> +       struct bpf_iter_seq_task_info *info = seq->private;
>> +
>> +       if (!v)
>> +               task_seq_show(seq, v);
> 
> hmm... show() called from stop()? what's the case where this is necessary?

I will refactor it better. This is to invoke bpf program
in stop() with NULL object to signal the end of
iteration.

>> +
>> +       if (info->task) {
>> +               put_task_struct(info->task);
>> +               info->task = NULL;
>> +       }
>> +}
>> +
> 
> [...]
>