lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYBQVX-YQyZiJe+xrMUmk_k+mU=Q-RNULeS4pt-YyzQUA@mail.gmail.com>
Date:   Wed, 26 Aug 2020 22:07:56 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Yonghong Song <yhs@...com>
Cc:     bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next 2/5] bpf: add main_thread_only customization for
 task/task_file iterators

On Wed, Aug 26, 2020 at 5:07 PM Yonghong Song <yhs@...com> wrote:
>
> Currently, task and task_file by default iterates through
> all tasks. For task_file, by default, all files from all tasks
> will be traversed.
>
> But for a user process, the file_table is shared by all threads
> of that process. So traversing the main thread per process should
> be enough to traverse all files and this can save a lot of cpu
> time if some process has large number of threads and each thread
> has lots of open files.
>
> This patch implemented a customization for task/task_file iterator,
> permitting to traverse only the kernel task where its pid equal
> to tgid in the kernel. This includes some kernel threads, and
> main threads of user processes. This will solve the above potential
> performance issue for task_file. This customization may be useful
> for task iterator too if only traversing main threads is enough.
>
> Signed-off-by: Yonghong Song <yhs@...com>
> ---
>  include/linux/bpf.h            |  3 ++-
>  include/uapi/linux/bpf.h       |  5 ++++
>  kernel/bpf/task_iter.c         | 46 +++++++++++++++++++++++-----------
>  tools/include/uapi/linux/bpf.h |  5 ++++
>  4 files changed, 43 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a6131d95e31e..058eb9b0ba78 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1220,7 +1220,8 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>         int __init bpf_iter_ ## target(args) { return 0; }
>
>  struct bpf_iter_aux_info {
> -       struct bpf_map *map;
> +       struct bpf_map *map;    /* for iterator traversing map elements */
> +       bool main_thread_only;  /* for task/task_file iterator */

As a user of task_file iterator I'd hate to make this decision,
honestly, especially if I can't prove that all processes share the
same file table (I think clone() syscall allows to do weird
combinations like that, right?). It does make sense for a task/
iterator, though, if I need to iterate a user-space process (group of
tasks). So can we do this:

1a. Either add a new bpf_iter type process/ (or in kernel lingo
task_group/) to iterate only main threads (group_leader in kernel
lingo);
1b. Or make this main_thread_only an option for only a task/ iterator
(and maybe call it "group_leader_only" or something to reflect kernel
terminology?)

2. For task/file iterator, still iterate all tasks, but if the task's
files struct is the same as group_leader's files struct, then go to
the next one. This should eliminate tons of duplication of iterating
the same files over and over. It would still iterate a bunch of tasks,
but compared to the number of files that's generally a much smaller
number, so should still give sizable savings. I don't think we need an
extra option for this, tbh, this behavior was my intuitive
expectation, so I don't think you'll be breaking any sane user of this
iterator.

Disclaimer: I haven't got a chance to look through kernel code much,
so I'm sorry if what I'm proposing is something that is impossible to
implement or extremely hard/unreliable. But thought I'll put this idea
out there before we decide on this.

WDYT?

[...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ