Message-Id: <713667D6-8001-408D-819D-E9326FC3AFD5@collabora.com>
Date: Wed, 27 Aug 2025 17:41:15 -0300
From: Daniel Almeida <daniel.almeida@...labora.com>
To: Sidong Yang <sidong.yang@...iosa.ai>
Cc: Jens Axboe <axboe@...nel.dk>,
 Caleb Sander Mateos <csander@...estorage.com>,
 Benno Lossin <lossin@...nel.org>,
 Miguel Ojeda <ojeda@...nel.org>,
 Arnd Bergmann <arnd@...db.de>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 rust-for-linux@...r.kernel.org,
 linux-kernel@...r.kernel.org,
 io-uring@...r.kernel.org
Subject: Re: [RFC PATCH v3 3/5] rust: io_uring: introduce rust abstraction for
 io-uring cmd

Hi Sidong,

> On 22 Aug 2025, at 09:55, Sidong Yang <sidong.yang@...iosa.ai> wrote:
> 
> Implment the io-uring abstractions needed for miscdevicecs and other
> char devices that have io-uring command interface.

Can you expand on this last part?

> 
> * `io_uring::IoUringCmd` : Rust abstraction for `io_uring_cmd` which
>  will be used as arg for `MiscDevice::uring_cmd()`. And driver can get
>  `cmd_op` sent from userspace. Also it has `flags` which includes option
>  that is reissued.
> 

This is a bit hard to parse.

> * `io_uring::IoUringSqe` : Rust abstraction for `io_uring_sqe` which
>  could be get from `IoUringCmd::sqe()` and driver could get `cmd_data`
>  from userspace. Also `IoUringSqe` has more data like opcode could be used in
>  driver.

Same here.

> 
> Signed-off-by: Sidong Yang <sidong.yang@...iosa.ai>
> ---
> rust/kernel/io_uring.rs | 306 ++++++++++++++++++++++++++++++++++++++++
> rust/kernel/lib.rs      |   1 +
> 2 files changed, 307 insertions(+)
> create mode 100644 rust/kernel/io_uring.rs
> 
> diff --git a/rust/kernel/io_uring.rs b/rust/kernel/io_uring.rs
> new file mode 100644
> index 000000000000..61e88bdf4e42
> --- /dev/null
> +++ b/rust/kernel/io_uring.rs
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// SPDX-FileCopyrightText: (C) 2025 Furiosa AI
> +
> +//! Abstractions for io-uring.
> +//!
> +//! This module provides types for implements io-uring interface for char device.

This is also hard to parse.

> +//!
> +//!
> +//! C headers: [`include/linux/io_uring/cmd.h`](srctree/include/linux/io_uring/cmd.h) and
> +//! [`include/linux/io_uring/io_uring.h`](srctree/include/linux/io_uring/io_uring.h)
> +
> +use core::{mem::MaybeUninit, pin::Pin};
> +
> +use crate::error::from_result;
> +use crate::transmute::{AsBytes, FromBytes};
> +use crate::{fs::File, types::Opaque};
> +
> +use crate::prelude::*;
> +
> +/// io-uring opcode

/// `IoUring` opcodes.

Notice:

a) The capitalization,
b) The use of backticks,
c) The period at the end.

This is an ongoing effort to keep the docs tidy :)

> +pub mod opcode {
> +    /// opcode for uring cmd
> +    pub const URING_CMD: u32 = bindings::io_uring_op_IORING_OP_URING_CMD;
> +}

Should this be its own type? This way we can use the type system to enforce
that only valid opcodes are used where an opcode is expected.
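
As a rough illustration of what I mean, here is a standalone sketch (not a
proposal for the final API). `Opcode`, `from_raw()` and `RAW_URING_CMD` are
names made up for this example, and the constant only stands in for
`bindings::io_uring_op_IORING_OP_URING_CMD`:

```rust
/// Stand-in for `bindings::io_uring_op_IORING_OP_URING_CMD`. The numeric
/// value is a placeholder for this sketch; real code would take it from
/// the generated bindings.
pub const RAW_URING_CMD: u32 = 46;

/// An `IoUring` opcode known to the abstraction (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Opcode(u32);

impl Opcode {
    /// Opcode for uring commands.
    pub const URING_CMD: Opcode = Opcode(RAW_URING_CMD);

    /// Converts a raw value, rejecting anything the abstraction does not know.
    pub fn from_raw(raw: u32) -> Option<Opcode> {
        match raw {
            RAW_URING_CMD => Some(Opcode::URING_CMD),
            _ => None,
        }
    }

    /// Returns the raw value, e.g. for passing back to C.
    pub fn as_raw(self) -> u32 {
        self.0
    }
}
```

Callers would then compare against `Opcode::URING_CMD` instead of a bare
`u32`, and an unknown value is rejected once, at the conversion.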

> +
> +/// A Rust abstraction for the Linux kernel's `io_uring_cmd` structure.

/// A Rust abstraction for `io_uring_cmd`.

> +///
> +/// This structure is a safe, opaque wrapper around the raw C `io_uring_cmd`
> +/// binding from the Linux kernel. It represents a command structure used
> +/// in io_uring operations within the kernel.

This code will also be part of the kernel, so mentioning “the Linux kernel” is superfluous.

> +/// This type is used internally by the io_uring subsystem to manage
> +/// asynchronous I/O commands.
> +///
> +/// This type should not be constructed or manipulated directly by
> +/// kernel module developers.

“…by drivers”.

> +///
> +/// # INVARIANT

/// # Invariants

> +/// - `self.inner` always points to a valid, live `bindings::io_uring_cmd`.

Please add a blank line here.

> +#[repr(transparent)]
> +pub struct IoUringCmd {
> +    /// An opaque wrapper containing the actual `io_uring_cmd` data.
> +    inner: Opaque<bindings::io_uring_cmd>,
> +}
> +
> +impl IoUringCmd {
> +    /// Returns the cmd_op with associated with the `io_uring_cmd`.

This sentence does not parse very well.

> +    #[inline]
> +    pub fn cmd_op(&self) -> u32 {
> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        unsafe { (*self.inner.get()).cmd_op }

Perhaps add an as_raw() method so this becomes:

unsafe {*self.as_raw()}.cmd_op
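
A slightly fuller standalone sketch of the helper, assuming `UnsafeCell` in
place of `Opaque` and a cut-down `RawCmd` in place of
`bindings::io_uring_cmd` (both are stand-ins, as is `new()`):

```rust
use std::cell::UnsafeCell;

/// Stand-in for `bindings::io_uring_cmd` with only the fields used here.
#[repr(C)]
pub struct RawCmd {
    pub cmd_op: u32,
    pub flags: u32,
}

/// Stand-in for the wrapper; `UnsafeCell` plays the role of `Opaque`.
#[repr(transparent)]
pub struct IoUringCmd {
    inner: UnsafeCell<RawCmd>,
}

impl IoUringCmd {
    /// Only needed to build a value in this sketch.
    pub fn new(cmd_op: u32, flags: u32) -> Self {
        Self {
            inner: UnsafeCell::new(RawCmd { cmd_op, flags }),
        }
    }

    /// Returns the raw pointer to the wrapped struct.
    #[inline]
    fn as_raw(&self) -> *mut RawCmd {
        self.inner.get()
    }

    /// Returns the `cmd_op` of this command.
    #[inline]
    pub fn cmd_op(&self) -> u32 {
        // SAFETY: `as_raw()` points into `self`, which is live for this borrow.
        unsafe { (*self.as_raw()).cmd_op }
    }

    /// Returns the flags of this command.
    #[inline]
    pub fn flags(&self) -> u32 {
        // SAFETY: Same as in `cmd_op()`.
        unsafe { (*self.as_raw()).flags }
    }
}
```

Reading the field inside the unsafe block, as above, touches only that
field. The shorter spelling copies the whole struct out of the pointer
first, so it only compiles if the struct is `Copy`.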

> +    }
> +
> +    /// Returns the flags with associated with the `io_uring_cmd`.

With the command, or something like that. The user doesn’t see the raw
bindings::io_uring_cmd so we shouldn’t be mentioning it if we can help it.

> +    #[inline]
> +    pub fn flags(&self) -> u32 {
> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        unsafe { (*self.inner.get()).flags }
> +    }
> +
> +    /// Reads protocol data unit as `T` that impl `FromBytes` from uring cmd

This sentence does not parse very well.

> +    ///
> +    /// Fails with [`EFAULT`] if size of `T` is bigger than pdu size.

/// # Errors

> +    #[inline]
> +    pub fn read_pdu<T: FromBytes>(&self) -> Result<T> {

This takes &self,

> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        let inner = unsafe { &mut *self.inner.get() };

But this creates a &mut to self.inner using unsafe code. Avoid doing this in
general. All of a sudden your type is not thread-safe anymore.

If you need to mutate here, then take &mut self as an argument.
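
For a read like this one, you may not need a reference at all. A minimal
standalone sketch, assuming a 32-byte pdu and local stand-ins for
`FromBytes`, `Opaque` and the error type (`PDU_LEN`, `RawCmd` and `Einval`
are made up for the example):

```rust
use std::cell::UnsafeCell;
use std::mem::size_of;
use std::ptr;

/// Size of the stand-in pdu area; a placeholder for this sketch.
pub const PDU_LEN: usize = 32;

/// Local stand-in for the kernel's `FromBytes` marker trait.
///
/// # Safety
///
/// Every bit pattern must be a valid value of the implementing type.
pub unsafe trait FromBytes: Sized {}
// SAFETY: All bit patterns are valid for these types.
unsafe impl FromBytes for u32 {}
// SAFETY: All bit patterns are valid for byte arrays.
unsafe impl FromBytes for [u8; 64] {}

/// Stand-in for `bindings::io_uring_cmd`.
#[repr(C)]
pub struct RawCmd {
    pub pdu: [u8; PDU_LEN],
}

/// Stand-in for the wrapper; `UnsafeCell` plays the role of `Opaque`.
pub struct IoUringCmd {
    inner: UnsafeCell<RawCmd>,
}

/// Stand-in for the `EINVAL` error.
#[derive(Debug, PartialEq, Eq)]
pub struct Einval;

impl IoUringCmd {
    /// Only needed to build a value in this sketch.
    pub fn new(pdu: [u8; PDU_LEN]) -> Self {
        Self { inner: UnsafeCell::new(RawCmd { pdu }) }
    }

    /// Reads the start of the pdu as a `T` without creating a `&mut`.
    pub fn read_pdu<T: FromBytes>(&self) -> Result<T, Einval> {
        if size_of::<T>() > PDU_LEN {
            return Err(Einval);
        }
        let raw: *const RawCmd = self.inner.get();
        // SAFETY: `raw` points into `self`; `addr_of!` computes the field
        // address without materializing a reference.
        let pdu = unsafe { ptr::addr_of!((*raw).pdu) };
        // SAFETY: `pdu` is valid for `PDU_LEN` bytes, `T` fits per the check
        // above, any bit pattern is a valid `T`, and `read_unaligned` has no
        // alignment requirement.
        Ok(unsafe { ptr::read_unaligned(pdu.cast::<T>()) })
    }
}
```

Only const raw pointers are involved, so `&self` stays honest.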

> +
> +        let len = size_of::<T>();
> +        if len > inner.pdu.len() {
> +            return Err(EFAULT);

EFAULT? How about EINVAL?

> +        }
> +
> +        let mut out: MaybeUninit<T> = MaybeUninit::uninit();
> +        let ptr = &raw mut inner.pdu as *const c_void;
> +
> +        // SAFETY:
> +        // * The `ptr` is valid pointer from `self.inner` that is guaranteed by type invariant.
> +        // * The `out` is valid pointer that points `T` which impls `FromBytes` and checked
> +        //   size of `T` is smaller than pdu size.
> +        unsafe {
> +            core::ptr::copy_nonoverlapping(ptr, out.as_mut_ptr().cast::<c_void>(), len);

I don’t think you need to manually specify c_void here.

Benno, can’t we use core::mem::zeroed() or something like that to avoid this unsafe?

The input was zeroed in prep() and the output can just be a zeroed T on the
stack, unless I missed something?

> +        }
> +
> +        // SAFETY: The read above has initialized all bytes in `out`, and since `T` implements
> +        // `FromBytes`, any bit-pattern is a valid value for this type.
> +        Ok(unsafe { out.assume_init() })
> +    }
> +
> +    /// Writes the provided `value` to `pdu` in uring_cmd `self`

Writes the provided `value` to `pdu`.

> +    ///

/// # Errors
///

> +    /// Fails with [`EFAULT`] if size of `T` is bigger than pdu size.

> +    #[inline]
> +    pub fn write_pdu<T: AsBytes>(&mut self, value: &T) -> Result<()> {
> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        let inner = unsafe { &mut *self.inner.get() };
> +
> +        let len = size_of::<T>();
> +        if len > inner.pdu.len() {
> +            return Err(EFAULT);
> +        }
> +
> +        let src = (value as *const T).cast::<c_void>();

as_ptr().cast()

> +        let dst = &raw mut inner.pdu as *mut c_void;

(&raw mut inner.pdu).cast()
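
In standalone form, the two casts could look like the sketch below. It
assumes a 32-byte pdu and a local stand-in for `AsBytes`; `write_pdu()` is
a free function here only to keep the example short, and `addr_of_mut!` is
the older spelling of `&raw mut`:

```rust
use std::mem::size_of;
use std::ptr;

/// Size of the stand-in pdu area; a placeholder for this sketch.
pub const PDU_LEN: usize = 32;

/// Local stand-in for the kernel's `AsBytes` marker trait.
///
/// # Safety
///
/// Values of the implementing type must have no uninitialized bytes.
pub unsafe trait AsBytes {}
// SAFETY: Integers have no padding.
unsafe impl AsBytes for u32 {}
// SAFETY: Byte arrays have no padding.
unsafe impl AsBytes for [u8; 64] {}

/// Stand-in for `bindings::io_uring_cmd`.
#[repr(C)]
pub struct RawCmd {
    pub pdu: [u8; PDU_LEN],
}

/// Copies `value` to the start of `raw.pdu`; returns `false` if it does not fit.
pub fn write_pdu<T: AsBytes>(raw: &mut RawCmd, value: &T) -> bool {
    let len = size_of::<T>();
    if len > PDU_LEN {
        return false;
    }
    // `cast()` changes only the pointee type; unlike `as`, it cannot
    // silently change constness or turn the pointer into an integer.
    let src = ptr::from_ref(value).cast::<u8>();
    let dst = ptr::addr_of_mut!(raw.pdu).cast::<u8>();
    // SAFETY: `src` is valid for `len` bytes of initialized memory because
    // `T: AsBytes`, `dst` is valid for `PDU_LEN >= len` bytes, and the two
    // cannot overlap because `raw` is borrowed mutably.
    unsafe { ptr::copy_nonoverlapping(src, dst, len) };
    true
}
```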

> +
> +        // SAFETY:
> +        // * The `src` is points valid memory that is guaranteed by `T` impls `AsBytes`
> +        // * The `dst` is valid. It's from `self.inner` that is guaranteed by type invariant.
> +        // * It's safe to copy because size of `T` is no more than len of pdu.
> +        unsafe {
> +            core::ptr::copy_nonoverlapping(src, dst, len);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Constructs a new [`IoUringCmd`] from a raw `io_uring_cmd`

Missing period.

> +    ///
> +    /// # Safety
> +    ///
> +    /// The caller must guarantee that:
> +    /// - `ptr` is non-null, properly aligned, and points to a valid
> +    ///   `bindings::io_uring_cmd`.

Blank lines between the bullet points, please.

> +    /// - The pointed-to memory remains initialized and valid for the entire
> +    ///   lifetime `'a` of the returned reference.
> +    /// - While the returned `Pin<&'a mut IoUringCmd>` is alive, the underlying
> +    ///   object is **not moved** (pinning requirement).

They can’t move an !Unpin type in safe code.

> +    /// - **Aliasing rules:** the returned `&mut` has **exclusive** access to the same
> +    ///   object for its entire lifetime:

You really don’t need to emphasize these.

> +    ///   - No other `&mut` **or** `&` references to the same `io_uring_cmd` may be
> +    ///     alive at the same time.

This and the point above are identical.

> +    ///   - There must be no concurrent reads/writes through raw pointers, FFI, or
> +    ///     other kernel paths to the same object during this lifetime.

This and the first point say the same thing.

> +    ///   - If the object can be touched from other contexts (e.g. IRQ/another CPU),
> +    ///     the caller must provide synchronization to uphold this exclusivity.

I am not sure what you mean.

> +    /// - This function relies on `IoUringCmd` being `repr(transparent)` over
> +    ///   `bindings::io_uring_cmd` so the cast preserves layout.

This is not a safety requirement.

Just adapt the requirements from other instances of from_raw(), they all
convert a *mut T to a &T so the safety requirements are similar.

> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::io_uring_cmd) -> Pin<&'a mut IoUringCmd> {

Why is this pub? Sounds like a massive footgun. This should be private or at
best pub(crate).
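
To give an idea of the size I'd expect for the safety section, here is a
standalone sketch. `RawCmd` stands in for `bindings::io_uring_cmd`,
`UnsafeCell` plus `PhantomPinned` stand in for `Opaque`, and the exact
wording of the requirements is only a suggestion:

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Stand-in for `bindings::io_uring_cmd`.
#[repr(C)]
pub struct RawCmd {
    pub cmd_op: u32,
}

/// Stand-in for the wrapper; it is `!Unpin` like the real `Opaque`.
#[repr(transparent)]
pub struct IoUringCmd {
    inner: UnsafeCell<RawCmd>,
    _pin: PhantomPinned,
}

impl IoUringCmd {
    /// Creates a reference to an [`IoUringCmd`] from a valid pointer.
    ///
    /// # Safety
    ///
    /// - `ptr` must point to a valid `RawCmd` for the duration of `'a`.
    ///
    /// - The caller must have exclusive access to it for the duration of `'a`.
    #[inline]
    pub(crate) unsafe fn from_raw<'a>(ptr: *mut RawCmd) -> Pin<&'a mut IoUringCmd> {
        // SAFETY: The caller guarantees validity and exclusivity for `'a`.
        // The cast is fine because `IoUringCmd` is `repr(transparent)` over
        // `RawCmd`, and the pointee is never moved through the returned `Pin`.
        unsafe { Pin::new_unchecked(&mut *ptr.cast::<IoUringCmd>()) }
    }

    /// Returns the `cmd_op` of this command.
    #[inline]
    pub fn cmd_op(&self) -> u32 {
        // SAFETY: `self.inner` wraps a valid `RawCmd` per the type invariant.
        unsafe { (*self.inner.get()).cmd_op }
    }
}
```

The layout argument lives in the SAFETY comment inside the function, since
it justifies the cast rather than asking anything of the caller.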


> +        // SAFETY:
> +        // * The caller guarantees that the pointer is not dangling and stays
> +        //   valid for the duration of 'a.
> +        // * The cast is okay because `IoUringCmd` is `repr(transparent)` and
> +        //   has the same memory layout as `bindings::io_uring_cmd`.
> +        // * The returned `Pin` ensures that the object cannot be moved, which
> +        //   is required because the kernel may hold pointers to this memory
> +        //   location and moving it would invalidate those pointers.

> +        unsafe { Pin::new_unchecked(&mut *ptr.cast()) }
> +    }
> +
> +    /// Returns the file that referenced by uring cmd self.
> +    #[inline]
> +    pub fn file(&self) -> &File {
> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        let file = unsafe { (*self.inner.get()).file };
> +
> +        // SAFETY:
> +        // * The `file` points valid file.

Why?

> +        // * refcount is positive after submission queue entry issued.
> +        // * There is no active fdget_pos region on the file on this thread.
> +        unsafe { File::from_raw_file(file) }
> +    }
> +
> +    /// Returns an reference to the [`IoUringSqe`] associated with this command.

s/an/a

> +    #[inline]
> +    pub fn sqe(&self) -> &IoUringSqe {
> +        // SAFETY: `self.inner` is guaranteed by the type invariant to point
> +        // to a live `io_uring_cmd`, so dereferencing is safe.
> +        let sqe = unsafe { (*self.inner.get()).sqe };
> +        // SAFETY: The call guarantees that the `sqe` points valid io_uring_sqe.

What do you mean by “the call guarantees”?

> +        unsafe { IoUringSqe::from_raw(sqe) }
> +    }
> +
> +    /// Completes an this [`IoUringCmd`] request that was previously queued.

This sentence does not parse very well.

> +    ///
> +    /// # Safety
> +    ///
> +    /// - This function must be called **only** for a command whose `uring_cmd`

Please no emphasis.

> +    ///   handler previously returned **`-EIOCBQUEUED`** to io_uring.

To what? Are you referring to a Rust type, or to the C part of the kernel?

> +    ///
> +    /// # Parameters
> +    ///
> +    /// - `ret`: Result to return to userspace.
> +    /// - `res2`: Extra for big completion queue entry `IORING_SETUP_CQE32`.

This sentence does not parse very well. Also, can you rename `res2` to
something more descriptive?

> +    /// - `issue_flags`: Flags associated with this request, typically the same
> +    ///   as those passed to the `uring_cmd` handler.
> +    #[inline]
> +    pub fn done(self: Pin<&mut IoUringCmd>, ret: Result<i32>, res2: u64, issue_flags: u32) {
> +        let ret = from_result(|| ret) as isize;

What does this do?
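
If I read it correctly, this folds the `Result` into a single integer:
the value on success, a negative errno on failure. Roughly like the
standalone sketch below, where `Error`, `EINVAL` and the helper are local
stand-ins that only assume how `from_result()` behaves:

```rust
/// Stand-in for the kernel's `Error`; holds a negative errno.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Error(i32);

/// Stand-in for the kernel's `EINVAL` constant.
pub const EINVAL: Error = Error(-22);

impl Error {
    /// Returns the negative errno value.
    pub fn to_errno(self) -> i32 {
        self.0
    }
}

/// Sketch of a `from_result()`-style helper: runs `f` and folds its
/// `Result` into a single C-style return value.
pub fn from_result<T, F>(f: F) -> T
where
    T: From<i16>,
    F: FnOnce() -> Result<T, Error>,
{
    match f() {
        Ok(v) => v,
        // Errno values are small, so the narrowing to `i16` does not
        // truncate for the constants used in this sketch.
        Err(e) => T::from(e.to_errno() as i16),
    }
}
```

If that is the intent, a short comment saying so next to the call would
help.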

> +        // SAFETY: The call guarantees that `self.inner` is not dangling and stays valid

What do you mean by “the call”?

> +        unsafe {
> +            bindings::io_uring_cmd_done(self.inner.get(), ret, res2, issue_flags);
> +        }
> +    }
> +}
> +
> +/// A Rust abstraction for the Linux kernel's `io_uring_sqe` structure.

Please don’t mention the words “Linux kernel” here either.

> +///
> +/// This structure is a safe, opaque wrapper around the raw C [`io_uring_sqe`](srctree/include/uapi/linux/io_uring.h)

This line needs to be wrapped.

> +/// binding from the Linux kernel. It represents a Submission Queue Entry

Can you link somewhere here? Perhaps there’s docs for “Submission Queue
Entry”.

> +/// used in io_uring operations within the kernel.
> +///
> +/// # Type Safety
> +///
> +/// The `#[repr(transparent)]` attribute ensures that this wrapper has
> +/// the same memory layout as the underlying `io_uring_sqe` structure,
> +/// allowing it to be safely transmuted between the two representations.

This is an invariant, please move it there.

> +///
> +/// # Fields
> +///
> +/// * `inner` - An opaque wrapper containing the actual `io_uring_sqe` data.
> +///             The `Opaque` type prevents direct access to the internal
> +///             structure fields, ensuring memory safety and encapsulation.

Inline docs please.

> +///
> +/// # Usage

I don’t think we specifically need to mention “# Usage”.

> +///
> +/// This type represents a submission queue entry that describes an I/O

You can start with “Represents a…”. No need to say “this type” here.

> +/// operation to be executed by the io_uring subsystem. It contains
> +/// information such as the operation type, file descriptor, buffer
> +/// pointers, and other operation-specific data.
> +///
> +/// Users can obtain this type from [`IoUringCmd::sqe()`] method, which
> +/// extracts the submission queue entry associated with a command.
> +///
> +/// This type should not be constructed or manipulated directly by
> +/// kernel module developers.

By drivers.

> +///
> +/// # INVARIANT

/// # Invariants
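
Putting the comments on this doc block together, I'd expect something
shaped like the sketch below. It is standalone: `RawSqe` and `UnsafeCell`
are stand-ins for `bindings::io_uring_sqe` and `Opaque`, and the wording
is only a suggestion:

```rust
use std::cell::UnsafeCell;

/// Stand-in for `bindings::io_uring_sqe` with a single field.
#[repr(C)]
pub struct RawSqe {
    pub opcode: u8,
}

/// A Rust abstraction for `io_uring_sqe`.
///
/// Represents a submission queue entry that describes an I/O operation to
/// be executed by io_uring. It is obtained from `IoUringCmd::sqe()` and
/// should not be constructed directly by drivers.
///
/// # Invariants
///
/// - `self.inner` always points to a valid, live `io_uring_sqe`.
///
/// - The type is `repr(transparent)`, so it has the same layout as
///   `io_uring_sqe`.
#[repr(transparent)]
pub struct IoUringSqe {
    /// An opaque wrapper containing the actual `io_uring_sqe` data.
    inner: UnsafeCell<RawSqe>,
}
```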

> +/// - `self.inner` always points to a valid, live `bindings::io_uring_sqe`.
> +#[repr(transparent)]
> +pub struct IoUringSqe {
> +    inner: Opaque<bindings::io_uring_sqe>,
> +}
> +
> +impl IoUringSqe {
> +    /// Reads and interprets the `cmd` field of an `bindings::io_uring_sqe` as a value of type `T`.
> +    ///
> +    /// # Safety & Invariants

This is a safety section on a safe function; it does not belong here.

> +    /// - Construction of `T` is delegated to `FromBytes`, which guarantees that `T` has no
> +    ///   invalid bit patterns and can be safely reconstructed from raw bytes.
> +    /// - **Limitation:** This implementation does not support `IORING_SETUP_SQE128` (larger SQE entries).

Please no emphasis.


> +    ///   Only the standard `io_uring_sqe` layout is handled here.
> +    ///
> +    /// # Errors

Please add a blank line here.

> +    /// * Returns `EINVAL` if the `self` does not hold a `opcode::URING_CMD`.
> +    /// * Returns `EFAULT` if the command buffer is smaller than the requested type `T`.
> +    ///
> +    /// # Returns

I don’t think we need a specific section for this. Just write this in
normal prose please.


> +    /// * On success, returns a `T` deserialized from the `cmd`.
> +    /// * On failure, returns an appropriate error as described above.
> +    pub fn cmd_data<T: FromBytes>(&self) -> Result<T> {
> +        // SAFETY: `self.inner` guaranteed by the type invariant to point
> +        // to a live `io_uring_sqe`, so dereferencing is safe.
> +        let sqe = unsafe { &*self.inner.get() };
> +
> +        if u32::from(sqe.opcode) != opcode::URING_CMD {
> +            return Err(EINVAL);
> +        }
> +
> +        // SAFETY: Accessing the `sqe.cmd` union field is safe because we've
> +        // verified that `sqe.opcode == IORING_OP_URING_CMD`, which guarantees
> +        // that this union variant is initialized and valid.
> +        let cmd = unsafe { sqe.__bindgen_anon_6.cmd.as_ref() };
> +        let cmd_len = size_of_val(&sqe.__bindgen_anon_6.bindgen_union_field);
> +
> +        if cmd_len < size_of::<T>() {
> +            return Err(EFAULT);

EINVAL

> +        }
> +
> +        let cmd_ptr = cmd.as_ptr() as *mut T;

cast()

> +
> +        // SAFETY: `cmd_ptr` is valid from `self.inner` which is guaranteed by
> +        // type variant. And also it points to initialized `T` from userspace.

“Invariant”.

“[…] an initialized T”.


> +        let ret = unsafe { core::ptr::read_unaligned(cmd_ptr) };
> +
> +        Ok(ret)
> +    }
> +
> +    /// Constructs a new `IoUringSqe` from a raw `io_uring_sqe`.

[`IoUringSqe`].

Please build the docs and make sure all your docs look nice.

> +    ///
> +    /// # Safety
> +    ///
> +    /// The caller must guarantee that:
> +    /// - `ptr` is non-null, properly aligned, and points to a valid initialized
> +    ///   `bindings::io_uring_sqe`.
> +    /// - The pointed-to memory remains valid (not freed or repurposed) for the
> +    ///   entire lifetime `'a` of the returned reference.
> +    /// - **Aliasing rules (for `&T`):** while the returned `&'a IoUringSqe` is
> +    ///   alive, there must be **no mutable access** to the same object through any
> +    ///   path (no `&mut`, no raw-pointer writes, no FFI/IRQ/other-CPU writers).
> +    ///   Multiple `&` is fine **only if all of them are read-only** for the entire
> +    ///   overlapping lifetime.
> +    /// - This relies on `IoUringSqe` being `repr(transparent)` over
> +    ///   `bindings::io_uring_sqe`, so the cast preserves layout.

Please rewrite this entire section given the feedback I gave higher up in this
patch.

> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *const bindings::io_uring_sqe) -> &'a IoUringSqe {

Private or pub(crate) at best.

> +        // SAFETY: The caller guarantees that the pointer is not dangling and stays valid for the
> +        // duration of 'a. The cast is okay because `IoUringSqe` is `repr(transparent)` and has the
> +        // same memory layout as `bindings::io_uring_sqe`.
> +        unsafe { &*ptr.cast() }
> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index ed53169e795c..d38cf7137401 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -91,6 +91,7 @@
> pub mod fs;
> pub mod init;
> pub mod io;
> +pub mod io_uring;
> pub mod ioctl;
> pub mod jump_label;
> #[cfg(CONFIG_KUNIT)]
> -- 
> 2.43.0
> 

