[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<AM6PR03MB58481666F9D89607DFDD4C4F99532@AM6PR03MB5848.eurprd03.prod.outlook.com>
Date: Wed, 6 Nov 2024 22:49:26 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, snorcht@...il.com,
brauner@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 0/4] bpf/crib: Add open-coded style process
file iterator and file related CRIB kfuncs
On 2024/11/6 22:15, Andrii Nakryiko wrote:
> On Wed, Nov 6, 2024 at 11:35 AM Juntong Deng <juntong.deng@...look.com> wrote:
>>
>> This patch series adds open-coded style process file iterator
>> bpf_iter_task_file and file related kfuncs bpf_fget_task(),
>> bpf_get_file_ops_type(), and corresponding selftests test cases.
>>
>> Known future merge conflict: In linux-next task_lookup_next_fdget_rcu()
>> has been removed and replaced with fget_task_next() [0], but that has
>> not happened yet in bpf-next, so I still
>> use task_lookup_next_fdget_rcu() in bpf_iter_task_file_next().
>>
>> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8fd3395ec9051a52828fcca2328cb50a69dea8ef
>>
>> Although iter/task_file already exists, for CRIB we still need the
>> open-coded iterator style process file iterator, and the same is true
>> for other bpf iterators such as iter/tcp, iter/udp, etc.
>>
>> The traditional bpf iterator is more like a bpf version of procfs, but
>> similar to procfs, it is not suitable for CRIB scenarios that need to
>> obtain large amounts of complex, multi-level in-kernel information.
>>
>> The following is from previous discussions [1].
>>
>> [1]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/
>>
>> This is because the context of bpf iterators is fixed and bpf iterators
>> cannot be nested. This means that a bpf iterator program can only
>> complete a specific small iterative dump task, and cannot dump
>> multi-level data.
>>
>> An example, when we need to dump all the sockets of a process, we need
>> to iterate over all the files (sockets) of the process, and iterate over
>> the all packets in the queue of each socket, and iterate over all data
>> in each packet.
>>
>> If we use bpf iterator, since the iterator can not be nested, we need to
>> use socket iterator program to get all the basic information of all
>> sockets (pass pid as filter), and then use packet iterator program to
>> get the basic information of all packets of a specific socket (pass pid,
>> fd as filter), and then use packet data iterator program to get all the
>> data of a specific packet (pass pid, fd, packet index as filter).
>>
>> This would be complicated and require a lot of (each iteration)
>> bpf program startup and exit (leading to poor performance).
>>
>> By comparison, open coded iterator is much more flexible, we can iterate
>> in any context, at any time, and iteration can be nested, so we can
>> achieve more flexible and more elegant dumping through open coded
>> iterators.
>>
>> With open coded iterators, all of the above can be done in a single
>> bpf program, and with nested iterators, everything becomes compact
>> and simple.
>>
>> Also, bpf iterators transmit data to user space through seq_file,
>> which involves a lot of open (bpf_iter_create), read, close syscalls,
>> context switching, memory copying, and cannot achieve the performance
>> of using ringbuf.
>>
>> Discussion
>> ----------
>>
>> 1. Do we need bpf_iter_task_file_get_fd()?
>>
>> Andrii suggested that next() should return a pointer to
>> a bpf_iter_task_file_item, which contains *file and fd.
>>
>> This is feasible, but it might compromise iterator encapsulation?
>
> I don't think so, replied on v2 ([0]). I know you saw that, I'm just
> linking it for others.
>
> [0] https://lore.kernel.org/bpf/CAEf4Bzba2N7pxPQh8_BDrVgupZdeow_3S7xSjDmsdhL19eXb3A@mail.gmail.com/
>
>
>>
>> More detailed discussion can be found at [3] [4]
>>
>> [3]: https://lore.kernel.org/bpf/CAEf4Bzbt0kh53xYZL57Nc9AWcYUKga_NQ6uUrTeU4bj8qyTLng@mail.gmail.com/
>> [4]: https://lore.kernel.org/bpf/AM6PR03MB584814D93FE3680635DE61A199562@AM6PR03MB5848.eurprd03.prod.outlook.com/
>>
>> What should we do? Maybe more discussion is needed?
>>
>> 2. Where should we put CRIB related kfuncs?
>>
>> I totally agree that most of the CRIB related kfuncs are not
>> CRIB specific.
>>
>> The goal of CRIB is to collect all relevant information about a process,
>> which means we need to add kfuncs involving several different kernel
>> subsystems (though these kfuncs are not complex and many just help the
>> bpf program reach a certain data structure).
>>
>> But here is a question, where should these CRIB kfuncs be placed?
>> There doesn't seem to be a suitable file to put them in.
>>
>> My current idea is to create a crib folder and then create new files for
>> the relevant subsystems, e.g. crib/files.c, crib/socket.c, crib/mount.c
>> etc. Putting them in the same folder makes it easier to maintain
>> them centrally.
>>
>> If anyone else wants to use CRIB kfuncs, welcome to use them.
>>
>
> CRIB is just one of possible applications of such kfuncs, so I'd steer
> away from over-specifying it as CRIB.
>
> task_file open-coded iterator is generic, and should stay close to
> other task iterator code, as you do in this revision.
>
> bpf_get_file_ops_type() is unnecessary, as we already discussed on v2,
> __ksym and comparison is the way to go here.
>
> bpf_fget_task(), if VFS folks agree to add it, probably will have to
> stay close to other similar VFS helpers.
>
Yes, I agree.
Maybe we should put it in fs/bpf_fs_kfuncs.c?
>> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
>> ---
>> v2 -> v3:
>> 1. Move task_file open-coded iterator to kernel/bpf/helpers.c.
>>
>> 2. Fix duplicate error code 7 in test_bpf_iter_task_file().
>>
>> 3. Add comment for case when bpf_iter_task_file_get_fd() returns -1.
>>
>> 4. Add future plans in commit message of "Add struct file related
>> CRIB kfuncs".
>>
>> 5. Add Discussion section to cover letter.
>>
>> v1 -> v2:
>> Fix a type definition error in the fd parameter of
>> bpf_fget_task() at crib_common.h.
>>
>> Juntong Deng (4):
>> bpf/crib: Introduce task_file open-coded iterator kfuncs
>> selftests/bpf: Add tests for open-coded style process file iterator
>> bpf/crib: Add struct file related CRIB kfuncs
>> selftests/bpf: Add tests for struct file related CRIB kfuncs
>>
>> kernel/bpf/Makefile | 1 +
>> kernel/bpf/crib/Makefile | 3 +
>> kernel/bpf/crib/crib.c | 28 ++++
>> kernel/bpf/crib/files.c | 54 ++++++++
>> kernel/bpf/helpers.c | 4 +
>> kernel/bpf/task_iter.c | 96 +++++++++++++
>> tools/testing/selftests/bpf/prog_tests/crib.c | 126 ++++++++++++++++++
>> .../testing/selftests/bpf/progs/crib_common.h | 25 ++++
>> .../selftests/bpf/progs/crib_files_failure.c | 108 +++++++++++++++
>> .../selftests/bpf/progs/crib_files_success.c | 119 +++++++++++++++++
>> 10 files changed, 564 insertions(+)
>> create mode 100644 kernel/bpf/crib/Makefile
>> create mode 100644 kernel/bpf/crib/crib.c
>> create mode 100644 kernel/bpf/crib/files.c
>> create mode 100644 tools/testing/selftests/bpf/prog_tests/crib.c
>> create mode 100644 tools/testing/selftests/bpf/progs/crib_common.h
>> create mode 100644 tools/testing/selftests/bpf/progs/crib_files_failure.c
>> create mode 100644 tools/testing/selftests/bpf/progs/crib_files_success.c
>>
>> --
>> 2.39.5
>>
Powered by blists - more mailing lists