[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAADnVQKu6Q1ePFuxxSLNsm-xggZbUEmWb_Y=4zeU54aAt5o6HA@mail.gmail.com>
Date: Fri, 13 Jun 2025 12:51:30 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, io-uring@...r.kernel.org,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers
On Fri, Jun 13, 2025 at 9:11 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>
> On 6/13/25 01:25, Alexei Starovoitov wrote:
> > On Thu, Jun 12, 2025 at 6:25 AM Pavel Begunkov <asml.silence@...il.com> wrote:
> ...>>>> +BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL);
> >>>> +BTF_KFUNCS_END(io_uring_kfunc_set)
> >>>
> >>> This is not safe in general.
> >>> The verifier doesn't enforce argument safety here.
> >>> As a minimum you need to add KF_TRUSTED_ARGS flag to all kfunc.
> >>> And once you do that you'll see that the verifier
> >>> doesn't recognize the cqe returned from bpf_io_uring_get_cqe*()
> >>> as trusted.
> >>
> >> Thanks, will add it. If I read it right, without the flag the
> >> program can, for example, create a struct io_ring_ctx on stack,
> >> fill it with nonsense and pass to kfuncs. Is that right?
> >
> > No. The verifier will only allow a pointer to struct io_ring_ctx
> > to be passed, but it may not be fully trusted.
> >
> > The verifier has 3 types of pointers to kernel structures:
> > 1. ptr_to_btf_id
> > 2. ptr_to_btf_id | trusted
> > 3. ptr_to_btf_id | untrusted
> >
> > 1st was added long ago for tracing and gradually got adopted
> > for non-tracing needs, but it has a foot gun, since
> > all pointer walks keep ptr_to_btf_id type.
> > It's fine in some cases to follow pointers, but not in all.
> > Hence 2nd variant was added and there
> > foo->bar dereference needs to be explicitly allowed
> > instead of allowed by default like for 1st kind.
> >
> > All loads through 1 and 3 are implemented as probe_read_kernel.
> > while loads from 2 are direct loads.
> >
> > So kfuncs without KF_TRUSTED_ARGS with struct io_ring_ctx *ctx
> > argument are likely fine and safe, since it's impossible
> > to get this io_ring_ctx pointer by dereferencing some other pointer.
> > But better to tighten safety from the start.
> > We recommend KF_TRUSTED_ARGS for all kfuncs and
> > eventually it will be the default.
>
> Sure, I'll add it, thanks for the explanation
>
> ...>> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> >> index 9494e4289605..400a06a74b5d 100644
> >> --- a/io_uring/bpf.c
> >> +++ b/io_uring/bpf.c
> >> @@ -2,6 +2,7 @@
> >> #include <linux/bpf_verifier.h>
> >>
> >> #include "io_uring.h"
> >> +#include "memmap.h"
> >> #include "bpf.h"
> >> #include "register.h"
> >>
> >> @@ -72,6 +73,14 @@ struct io_uring_cqe *bpf_io_uring_extract_next_cqe(struct io_ring_ctx *ctx)
> >> return cqe;
> >> }
> >>
> >> +__bpf_kfunc
> >> +void *bpf_io_uring_get_region(struct io_ring_ctx *ctx, u64 size__retsz)
> >> +{
> >> + if (size__retsz > ((u64)ctx->ring_region.nr_pages << PAGE_SHIFT))
> >> + return NULL;
> >> + return io_region_get_ptr(&ctx->ring_region);
> >> +}
> >
> > and bpf prog should be able to read/write anything in
> > [ctx->ring_region->ptr, ..ptr + size] region ?
>
> Right, and it's already rw mmap'ed into the user space.
>
> > Populating (creating) dynptr is probably better.
> > See bpf_dynptr_from*()
> >
> > but what is the lifetime of that memory ?
>
> It's valid within a single run of the callback but shouldn't cross
> into another invocation. Specifically, it's protected by the lock,
> but that can be tuned. Does that match with what PTR_TO_MEM expects?
yes. PTR_TO_MEM lasts for duration of the prog.
> I can add refcounting for longer term pinning, maybe to store it
> as a bpf map or whatever is the right way, but I'd rather avoid
> anything expensive in the kfunc as that'll likely be called on
> every program run.
yeah. let's not add any refcounting.
It sounds like you want something similar to
__bpf_kfunc __u8 *
hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const
size_t rdwr_buf_size)
we have a special hack for it already in the verifier.
The argument need to be called rdwr_buf_size,
then it will be used to establish the range of PTR_TO_MEM.
It has to be run-time constant.
What you're proposing with "__retsz" is a cleaner version of the same.
But consider bpf_dynptr_from_io_uring(struct io_ring_ctx *ctx)
it can create a dynamically sized region,
and later use bpf_dynptr_slice_rdwr() to get writeable chunk of it.
I feel that __retsz approach may actually be a better fit at the end,
if you're ok with constant arg.
Powered by blists - more mailing lists