[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<AM6PR03MB5080C17BB893CE931E40DA28993D2@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Tue, 10 Dec 2024 16:25:51 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: Christian Brauner <brauner@...nel.org>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, snorcht@...il.com,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH bpf-next v5 0/5] bpf: Add open-coded style process file
iterator and bpf_fget_task() kfunc
On 2024/12/10 14:44, Christian Brauner wrote:
> On Tue, Dec 10, 2024 at 02:01:53PM +0000, Juntong Deng wrote:
>> This patch series adds open-coded style process file iterator
>> bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding
>> selftests test cases.
>>
>> In addition, since fs kfuncs is generic and useful for scenarios
>> other than LSM, this patch makes fs kfuncs available for SYSCALL
>> and TRACING program types [0].
>>
>> [0]: https://lore.kernel.org/bpf/CAPhsuW6ud21v2xz8iSXf=CiDL+R_zpQ+p8isSTMTw=EiJQtRSw@mail.gmail.com/
>>
>> Although iter/task_file already exists, for CRIB we still need the
>
> What is CRIB?
>
CRIB is Checkpoint/Restore In eBPF.
Introductions to CRIB can be found at the following links:
https://lpc.events/event/18/contributions/1812/
https://lore.kernel.org/bpf/AM6PR03MB58488045E4D0FA6AEDC8BDE099A52@AM6PR03MB5848.eurprd03.prod.outlook.com/T/#u
https://lwn.net/Articles/984313/
>> open-coded iterator style process file iterator, and the same is true
>> for other bpf iterators such as iter/tcp, iter/udp, etc.
>>
>> The traditional bpf iterator is more like a bpf version of procfs, but
>> similar to procfs, it is not suitable for CRIB scenarios that need to
>> obtain large amounts of complex, multi-level in-kernel information.
>>
>> The following is from previous discussions [2].
>>
>> [2]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
>>
>> This is because the context of bpf iterators is fixed and bpf iterators
>> cannot be nested. This means that a bpf iterator program can only
>> complete a specific small iterative dump task, and cannot dump
>> multi-level data.
>>
>> An example, when we need to dump all the sockets of a process, we need
>> to iterate over all the files (sockets) of the process, and iterate over
>> the all packets in the queue of each socket, and iterate over all data
>> in each packet.
>>
>> If we use bpf iterator, since the iterator can not be nested, we need to
>> use socket iterator program to get all the basic information of all
>> sockets (pass pid as filter), and then use packet iterator program to
>> get the basic information of all packets of a specific socket (pass pid,
>> fd as filter), and then use packet data iterator program to get all the
>> data of a specific packet (pass pid, fd, packet index as filter).
>>
>> This would be complicated and require a lot of (each iteration)
>> bpf program startup and exit (leading to poor performance).
>>
>> By comparison, open coded iterator is much more flexible, we can iterate
>> in any context, at any time, and iteration can be nested, so we can
>> achieve more flexible and more elegant dumping through open coded
>> iterators.
>>
>> With open coded iterators, all of the above can be done in a single
>> bpf program, and with nested iterators, everything becomes compact
>> and simple.
>>
>> Also, bpf iterators transmit data to user space through seq_file,
>> which involves a lot of open (bpf_iter_create), read, close syscalls,
>> context switching, memory copying, and cannot achieve the performance
>> of using ringbuf.
>>
>> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
>> ---
>> v4 -> v5:
>> * Add file type checks in test cases for process file iterator
>> and bpf_fget_task().
>>
>> * Use fentry to synchronize tests instead of waiting in a loop.
>>
>> * Remove path_d_path_kfunc_non_lsm test case.
>>
>> * Replace task_lookup_next_fdget_rcu() with fget_task_next().
>>
>> * Remove future merge conflict section in cover letter (resolved).
>>
>> v3 -> v4:
>> * Make all kfuncs generic, not CRIB specific.
>>
>> * Move bpf_fget_task to fs/bpf_fs_kfuncs.c.
>>
>> * Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type.
>>
>> * Use struct bpf_iter_task_file_item * as the return value of
>> bpf_iter_task_file_next.
>>
>> * Change fd to unsigned int type and add next_fd.
>>
>> * Add KF_RCU_PROTECTED to bpf_iter_task_file_new.
>>
>> * Make fs kfuncs available to SYSCALL and TRACING program types.
>>
>> * Update all relevant test cases.
>>
>> * Remove the discussion section from cover letter.
>>
>> v2 -> v3:
>> * Move task_file open-coded iterator to kernel/bpf/helpers.c.
>>
>> * Fix duplicate error code 7 in test_bpf_iter_task_file().
>>
>> * Add comment for case when bpf_iter_task_file_get_fd() returns -1.
>>
>> * Add future plans in commit message of "Add struct file related
>> CRIB kfuncs".
>>
>> * Add Discussion section to cover letter.
>>
>> v1 -> v2:
>> * Fix a type definition error in the fd parameter of
>> bpf_fget_task() at crib_common.h.
>>
>> Juntong Deng (5):
>> bpf: Introduce task_file open-coded iterator kfuncs
>> selftests/bpf: Add tests for open-coded style process file iterator
>> bpf: Add bpf_fget_task() kfunc
>> bpf: Make fs kfuncs available for SYSCALL and TRACING program types
>> selftests/bpf: Add tests for bpf_fget_task() kfunc
>>
>> fs/bpf_fs_kfuncs.c | 42 ++++---
>> kernel/bpf/helpers.c | 3 +
>> kernel/bpf/task_iter.c | 92 ++++++++++++++
>> .../testing/selftests/bpf/bpf_experimental.h | 15 +++
>> .../selftests/bpf/prog_tests/fs_kfuncs.c | 46 +++++++
>> .../testing/selftests/bpf/prog_tests/iters.c | 79 ++++++++++++
>> .../selftests/bpf/progs/fs_kfuncs_failure.c | 33 +++++
>> .../selftests/bpf/progs/iters_task_file.c | 88 ++++++++++++++
>> .../bpf/progs/iters_task_file_failure.c | 114 ++++++++++++++++++
>> .../selftests/bpf/progs/test_fget_task.c | 63 ++++++++++
>> .../selftests/bpf/progs/verifier_vfs_reject.c | 10 --
>> 11 files changed, 559 insertions(+), 26 deletions(-)
>> create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c
>> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c
>> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c
>> create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c
>>
>> --
>> 2.39.5
>>
Powered by blists - more mailing lists