[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAADnVQ+Mn0KEJEiL79RXm2-1XCihqd+H61xxTqju9q6iANF4zA@mail.gmail.com>
Date: Wed, 6 Nov 2024 17:09:26 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Juntong Deng <juntong.deng@...look.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com,
Christian Brauner <brauner@...nel.org>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v3 1/4] bpf/crib: Introduce task_file open-coded
iterator kfuncs
On Wed, Nov 6, 2024 at 2:31 PM Juntong Deng <juntong.deng@...look.com> wrote:
>
> On 2024/11/6 21:31, Alexei Starovoitov wrote:
> > On Wed, Nov 6, 2024 at 11:39 AM Juntong Deng <juntong.deng@...look.com> wrote:
> >>
> >> This patch adds the open-coded iterator style process file iterator
> >> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all
> >> files opened by the specified process.
> >
> > This is ok.
> >
> >> In addition, this patch adds bpf_iter_task_file_get_fd() getter to get
> >> the file descriptor corresponding to the file in the current iteration.
> >
> > Unnecessary. Use CORE to read iter internal fields.
> >
> >> The reference to struct file acquired by the previous
> >> bpf_iter_task_file_next() is released in the next
> >> bpf_iter_task_file_next(), and the last reference is released in the
> >> last bpf_iter_task_file_next() that returns NULL.
> >>
> >> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to
> >> the end, then the last struct file reference is released at this time.
> >>
> >> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
> >> ---
> >> kernel/bpf/helpers.c | 4 ++
> >> kernel/bpf/task_iter.c | 96 ++++++++++++++++++++++++++++++++++++++++++
> >> 2 files changed, 100 insertions(+)
> >>
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 395221e53832..1f0f7ca1c47a 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -3096,6 +3096,10 @@ BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
> >> BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 5af9e130e500..32e15403a5a6 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -1031,6 +1031,102 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> >> {
> >> }
> >>
> >> +struct bpf_iter_task_file {
> >> + __u64 __opaque[3];
> >> +} __aligned(8);
> >> +
> >> +struct bpf_iter_task_file_kern {
> >> + struct task_struct *task;
> >> + struct file *file;
> >> + int fd;
> >> +} __aligned(8);
> >> +
> >> +/**
> >> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task,
> >> + * used to iterate over all files opened by a specified task
> >> + *
> >> + * @it: the new bpf_iter_task_file to be created
> >> + * @task: a pointer pointing to a task to be iterated over
> >> + */
> >> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it,
> >> + struct task_struct *task)
> >> +{
> >> + struct bpf_iter_task_file_kern *kit = (void *)it;
> >> +
> >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file));
> >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) !=
> >> + __alignof__(struct bpf_iter_task_file));
> >> +
> >> + kit->task = task;
> >
> > This is broken, since task refcnt can drop while iter is running.
> >
> > Before doing any of that I'd like to see a long term path for crib.
> > All these small additions are ok if they're generic and useful elsewhere.
> > I'm afraid there is no path forward for crib itself though.
> >
> > pw-bot: cr
>
> Thanks for your reply.
>
> The long-term path of CRIB is consistent with the initial goal, adding
> kfuncs to help the bpf program obtain process-related information.
>
> I think most of the CRIB kfuncs are generic, such as process file
> iterator, skb iterator, bpf_fget_task() that gets struct file based on
> file descriptor, etc.
>
> This is because obtaining process-related information is not a
> requirement specific to checkpoint/restore scenarios, but is
> required in other scenarios as well.
>
> Here I would like to quote your vision on LPC 2022 [0] [1].
:)
The reading part via iterators and access to kernel internals is fine,
but to complete CRIB idea the restore side is necessary and
for that bit I haven't heard a complete story that would be
acceptable upstream. At LPC the proposal was to add kfuncs
that will write into kernel data structures.
That part won't fly, since I don't see how one can make such
writing kfuncs safe. Restoring a socket, tcp connection, etc
is not a trivial process.
Just building a set of generic abstractions for reading is ok-ish,
since they're generic and reusable, but the end to end story
is necessary before we proceed.
Powered by blists - more mailing lists