Message-ID: <CAADnVQJgxnQEL+rtVkp7TB_qQ1JKHiXe=p48tB_-N6F+oaDLyQ@mail.gmail.com>
Date: Wed, 11 Jun 2025 19:47:50 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: io-uring@...r.kernel.org, Martin KaFai Lau <martin.lau@...ux.dev>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers
On Fri, Jun 6, 2025 at 6:58 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>
> A handle_events program should be able to parse the CQ and submit new
> requests, add kfuncs to cover that. The only essential kfunc here is
> bpf_io_uring_submit_sqes, and the rest are likely to be removed in a
> non-RFC version in favour of a more general approach.
>
> Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
> ---
> io_uring/bpf.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> index f86b12f280e8..9494e4289605 100644
> --- a/io_uring/bpf.c
> +++ b/io_uring/bpf.c
> @@ -1,12 +1,92 @@
> #include <linux/mutex.h>
> #include <linux/bpf_verifier.h>
>
> +#include "io_uring.h"
> #include "bpf.h"
> #include "register.h"
>
> static const struct btf_type *loop_state_type;
> DEFINE_MUTEX(io_bpf_ctrl_mutex);
>
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc int bpf_io_uring_submit_sqes(struct io_ring_ctx *ctx,
> + unsigned nr)
> +{
> + return io_submit_sqes(ctx, nr);
> +}
> +
> +__bpf_kfunc int bpf_io_uring_post_cqe(struct io_ring_ctx *ctx,
> + u64 data, u32 res, u32 cflags)
> +{
> + bool posted;
> +
> + posted = io_post_aux_cqe(ctx, data, res, cflags);
> + return posted ? 0 : -ENOMEM;
> +}
> +
> +__bpf_kfunc int bpf_io_uring_queue_sqe(struct io_ring_ctx *ctx,
> + void *bpf_sqe, int mem__sz)
> +{
> + unsigned tail = ctx->rings->sq.tail;
> + struct io_uring_sqe *sqe;
> +
> + if (mem__sz != sizeof(*sqe))
> + return -EINVAL;
> +
> + ctx->rings->sq.tail++;
> + tail &= (ctx->sq_entries - 1);
> + /* double index for 128-byte SQEs, twice as long */
> + if (ctx->flags & IORING_SETUP_SQE128)
> + tail <<= 1;
> + sqe = &ctx->sq_sqes[tail];
> + memcpy(sqe, bpf_sqe, sizeof(*sqe));
> + return 0;
> +}
> +
> +__bpf_kfunc
> +struct io_uring_cqe *bpf_io_uring_get_cqe(struct io_ring_ctx *ctx, u32 idx)
> +{
> + unsigned max_entries = ctx->cq_entries;
> + struct io_uring_cqe *cqe_array = ctx->rings->cqes;
> +
> + if (ctx->flags & IORING_SETUP_CQE32)
> + max_entries *= 2;
> + return &cqe_array[idx & (max_entries - 1)];
> +}
> +
> +__bpf_kfunc
> +struct io_uring_cqe *bpf_io_uring_extract_next_cqe(struct io_ring_ctx *ctx)
> +{
> + struct io_rings *rings = ctx->rings;
> + unsigned int mask = ctx->cq_entries - 1;
> + unsigned head = rings->cq.head;
> + struct io_uring_cqe *cqe;
> +
> + /* TODO CQE32 */
> + if (head == rings->cq.tail)
> + return NULL;
> +
> + cqe = &rings->cqes[head & mask];
> + rings->cq.head++;
> + return cqe;
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_KFUNCS_START(io_uring_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_post_cqe, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_queue_sqe, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_get_cqe, 0);
> +BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL);
> +BTF_KFUNCS_END(io_uring_kfunc_set)
This is not safe in general. The verifier doesn't enforce argument
safety here. As a minimum you need to add the KF_TRUSTED_ARGS flag to
all kfuncs. And once you do that you'll see that the verifier doesn't
recognize the cqe returned from bpf_io_uring_get_cqe*() as trusted.
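
For illustration, a sketch of what the registration could look like with
KF_TRUSTED_ARGS added to each entry (kfunc names are taken from the
patch above; the exact flag combinations are an assumption, untested):

```c
/* Sketch: the same kfunc set as in the patch, with KF_TRUSTED_ARGS
 * added so the verifier only accepts trusted pointers for the
 * struct io_ring_ctx argument of each kfunc.
 */
BTF_KFUNCS_START(io_uring_kfunc_set)
BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_io_uring_post_cqe, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_io_uring_queue_sqe, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_io_uring_get_cqe, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL | KF_TRUSTED_ARGS)
BTF_KFUNCS_END(io_uring_kfunc_set)
```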
Looking at your example:
https://github.com/axboe/liburing/commit/706237127f03e15b4cc9c7c31c16d34dbff37cdc
it doesn't care about the contents of the cqe and doesn't pass it
further, so it's sort-of ok-ish right now. But if you need to pass the
cqe to another kfunc you would need to add an open coded iterator for
cqe-s with the appropriate KF_ITER* flags, or maybe add acquire/release
semantics for the cqe. Like, get_cqe would be KF_ACQUIRE, and you'd
need a matching KF_RELEASE kfunc, so that the 'cqe' is not lost. Then
'cqe' will be trusted and you can pass it as an actual 'cqe' into
another kfunc.

Without KF_ACQUIRE the verifier sees that the get_cqe*() kfuncs return
'struct io_uring_cqe *', which is ok for tracing or for passing into
kfuncs like bpf_io_uring_queue_sqe() that don't care about a particular
type, but not ok for full tracking of objects.
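
Roughly, the acquire/release pairing could look like this (a sketch
only; bpf_io_uring_put_cqe is a hypothetical helper name, the patch has
no such kfunc yet):

```c
/* Sketch: extract_next_cqe becomes KF_ACQUIRE so the verifier tracks
 * the returned cqe as a referenced, trusted object, and a matching
 * KF_RELEASE kfunc lets the program drop the reference.
 */
__bpf_kfunc void bpf_io_uring_put_cqe(struct io_uring_cqe *cqe)
{
	/* Nothing to free here; this exists so the verifier sees the
	 * reference being released and the 'cqe' is not lost.
	 */
}

BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_io_uring_put_cqe, KF_RELEASE)
```

With this shape the verifier would reject a program that extracts a cqe
and returns without releasing it.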
For the next revision please post all selftests, examples, and bpf
progs on the list, so people don't need to search github.