[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzZJuWcCLeUdmzhRVe9nyi9jAN8y=u2nK=mqzxXG6DTkDw@mail.gmail.com>
Date: Wed, 6 Nov 2024 14:15:27 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Juntong Deng <juntong.deng@...look.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, snorcht@...il.com,
brauner@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 0/4] bpf/crib: Add open-coded style process
file iterator and file related CRIB kfuncs
On Wed, Nov 6, 2024 at 11:35 AM Juntong Deng <juntong.deng@...look.com> wrote:
>
> This patch series adds open-coded style process file iterator
> bpf_iter_task_file and file related kfuncs bpf_fget_task(),
> bpf_get_file_ops_type(), and corresponding selftests test cases.
>
> Known future merge conflict: In linux-next task_lookup_next_fdget_rcu()
> has been removed and replaced with fget_task_next() [0], but that has
> not happened yet in bpf-next, so I still
> use task_lookup_next_fdget_rcu() in bpf_iter_task_file_next().
>
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8fd3395ec9051a52828fcca2328cb50a69dea8ef
>
> Although iter/task_file already exists, for CRIB we still need the
> open-coded iterator style process file iterator, and the same is true
> for other bpf iterators such as iter/tcp, iter/udp, etc.
>
> The traditional bpf iterator is more like a bpf version of procfs, but
> similar to procfs, it is not suitable for CRIB scenarios that need to
> obtain large amounts of complex, multi-level in-kernel information.
>
> The following is from previous discussions [1].
>
> [1]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
>
> This is because the context of bpf iterators is fixed and bpf iterators
> cannot be nested. This means that a bpf iterator program can only
> complete a specific small iterative dump task, and cannot dump
> multi-level data.
>
> An example, when we need to dump all the sockets of a process, we need
> to iterate over all the files (sockets) of the process, and iterate over
> the all packets in the queue of each socket, and iterate over all data
> in each packet.
>
> If we use bpf iterator, since the iterator can not be nested, we need to
> use socket iterator program to get all the basic information of all
> sockets (pass pid as filter), and then use packet iterator program to
> get the basic information of all packets of a specific socket (pass pid,
> fd as filter), and then use packet data iterator program to get all the
> data of a specific packet (pass pid, fd, packet index as filter).
>
> This would be complicated and require a lot of (each iteration)
> bpf program startup and exit (leading to poor performance).
>
> By comparison, open coded iterator is much more flexible, we can iterate
> in any context, at any time, and iteration can be nested, so we can
> achieve more flexible and more elegant dumping through open coded
> iterators.
>
> With open coded iterators, all of the above can be done in a single
> bpf program, and with nested iterators, everything becomes compact
> and simple.
>
> Also, bpf iterators transmit data to user space through seq_file,
> which involves a lot of open (bpf_iter_create), read, close syscalls,
> context switching, memory copying, and cannot achieve the performance
> of using ringbuf.
>
> Discussion
> ----------
>
> 1. Do we need bpf_iter_task_file_get_fd()?
>
> Andrii suggested that next() should return a pointer to
> a bpf_iter_task_file_item, which contains *file and fd.
>
> This is feasible, but it might compromise iterator encapsulation?
I don't think so, replied on v2 ([0]). I know you saw that, I'm just
linking it for others.
[0] https://lore.kernel.org/bpf/CAEf4Bzba2N7pxPQh8_BDrVgupZdeow_3S7xSjDmsdhL19eXb3A@mail.gmail.com/
>
> More detailed discussion can be found at [3] [4]
>
> [3]: https://lore.kernel.org/bpf/CAEf4Bzbt0kh53xYZL57Nc9AWcYUKga_NQ6uUrTeU4bj8qyTLng@mail.gmail.com/
> [4]: https://lore.kernel.org/bpf/AM6PR03MB584814D93FE3680635DE61A199562@AM6PR03MB5848.eurprd03.prod.outlook.com/
>
> What should we do? Maybe more discussion is needed?
>
> 2. Where should we put CRIB related kfuncs?
>
> I totally agree that most of the CRIB related kfuncs are not
> CRIB specific.
>
> The goal of CRIB is to collect all relevant information about a process,
> which means we need to add kfuncs involving several different kernel
> subsystems (though these kfuncs are not complex and many just help the
> bpf program reach a certain data structure).
>
> But here is a question, where should these CRIB kfuncs be placed?
> There doesn't seem to be a suitable file to put them in.
>
> My current idea is to create a crib folder and then create new files for
> the relevant subsystems, e.g. crib/files.c, crib/socket.c, crib/mount.c
> etc. Putting them in the same folder makes it easier to maintain
> them centrally.
>
> If anyone else wants to use CRIB kfuncs, welcome to use them.
>
CRIB is just one of possible applications of such kfuncs, so I'd steer
away from over-specifying it as CRIB.
task_file open-coded iterator is generic, and should stay close to
other task iterator code, as you do in this revision.
bpf_get_file_ops_type() is unnecessary, as we already discussed on v2,
__ksym and comparison is the way to go here.
bpf_fget_task(), if VFS folks agree to add it, probably will have to
stay close to other similar VFS helpers.
> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
> ---
> v2 -> v3:
> 1. Move task_file open-coded iterator to kernel/bpf/helpers.c.
>
> 2. Fix duplicate error code 7 in test_bpf_iter_task_file().
>
> 3. Add comment for case when bpf_iter_task_file_get_fd() returns -1.
>
> 4. Add future plans in commit message of "Add struct file related
> CRIB kfuncs".
>
> 5. Add Discussion section to cover letter.
>
> v1 -> v2:
> Fix a type definition error in the fd parameter of
> bpf_fget_task() at crib_common.h.
>
> Juntong Deng (4):
> bpf/crib: Introduce task_file open-coded iterator kfuncs
> selftests/bpf: Add tests for open-coded style process file iterator
> bpf/crib: Add struct file related CRIB kfuncs
> selftests/bpf: Add tests for struct file related CRIB kfuncs
>
> kernel/bpf/Makefile | 1 +
> kernel/bpf/crib/Makefile | 3 +
> kernel/bpf/crib/crib.c | 28 ++++
> kernel/bpf/crib/files.c | 54 ++++++++
> kernel/bpf/helpers.c | 4 +
> kernel/bpf/task_iter.c | 96 +++++++++++++
> tools/testing/selftests/bpf/prog_tests/crib.c | 126 ++++++++++++++++++
> .../testing/selftests/bpf/progs/crib_common.h | 25 ++++
> .../selftests/bpf/progs/crib_files_failure.c | 108 +++++++++++++++
> .../selftests/bpf/progs/crib_files_success.c | 119 +++++++++++++++++
> 10 files changed, 564 insertions(+)
> create mode 100644 kernel/bpf/crib/Makefile
> create mode 100644 kernel/bpf/crib/crib.c
> create mode 100644 kernel/bpf/crib/files.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/crib.c
> create mode 100644 tools/testing/selftests/bpf/progs/crib_common.h
> create mode 100644 tools/testing/selftests/bpf/progs/crib_files_failure.c
> create mode 100644 tools/testing/selftests/bpf/progs/crib_files_success.c
>
> --
> 2.39.5
>
Powered by blists - more mailing lists