lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DCCCRYEUVJWZ.2AUDA0DXK0XSF@kernel.org>
Date: Tue, 26 Aug 2025 14:20:14 +0200
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Vitaly Wool" <vitaly.wool@...sulko.se>
Cc: <rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
 "Uladzislau Rezki" <urezki@...il.com>, "Alice Ryhl" <aliceryhl@...gle.com>,
 "Vlastimil Babka" <vbabka@...e.cz>, "Lorenzo Stoakes"
 <lorenzo.stoakes@...cle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor" <alex.gaynor@...il.com>,
 "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo" <gary@...yguo.net>, "Bjorn
 Roy Baron" <bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>,
 "Andreas Hindborg" <a.hindborg@...nel.org>, "Trevor Gross"
 <tmgross@...ch.edu>, "Johannes Weiner" <hannes@...xchg.org>, "Yosry Ahmed"
 <yosry.ahmed@...ux.dev>, "Nhat Pham" <nphamcs@...il.com>,
 <linux-mm@...ck.org>
Subject: Re: [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers

On Sat Aug 23, 2025 at 3:05 PM CEST, Vitaly Wool wrote:
> +/// Zpool API.
> +///
> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
> +/// Such drivers implement memory storage pools in accordance with the zpool API.
> +///
> +/// # Example
> +///
> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
> +/// to be at least 2**6 aligned by kmalloc.
> +///
> +/// ```
> +/// use core::ptr::{NonNull, copy_nonoverlapping};
> +/// use core::sync::atomic::{AtomicU64, Ordering};
> +/// use kernel::alloc::{Flags, KBox, KVec, NumaNode};
> +/// use kernel::page::PAGE_SHIFT;
> +/// use kernel::prelude::EINVAL;
> +/// use kernel::zpool::*;
> +///
> +/// struct MyZpool {
> +///     name: &'static CStr,
> +///     bytes_used: AtomicU64,
> +/// }
> +///
> +/// struct MyZpoolDriver;
> +///
> +/// impl ZpoolDriver for MyZpoolDriver {
> +///     type Pool = KBox<MyZpool>;
> +///
> +///     fn create(name: &'static CStr, gfp: Flags) -> Result<KBox<MyZpool>> {
> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
> +///         let pool = KBox::new(my_pool, gfp)?;
> +///
> +///         Ok(pool)
> +///     }
> +///
> +///     fn destroy(p: KBox<MyZpool>) {
> +///         drop(p);
> +///     }
> +///
> +///     fn malloc(pool: &mut MyZpool, size: usize, gfp: Flags, _nid: NumaNode) -> Result<usize> {
> +///         let mut pow: usize = 0;
> +///         for n in 6..=PAGE_SHIFT {
> +///             if size <= 1 << n {
> +///                 pow = n;
> +///                 break;
> +///             }
> +///         }

Why not just use next_power_of_two()? I think the same logic could also be
achieved with

	size.next_power_of_two().trailing_zeros().max(6).min(PAGE_SHIFT)

> +///         match pow {
> +///             0 => Err(EINVAL),
> +///             _ => {
> +///                 let vec = KVec::<u64>::with_capacity(1 << (pow - 3), gfp)?;

Why use u64 and 1 << (pow - 3), rather than simply u8 and 1 << pow?

(Btw. you could also just use VBox<u8; PAGE_SIZE>::new_uninit() for all
allocations to keep the example simple.)

> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
> +///                 Ok(ptr as usize | (pow - 6))
> +///             }
> +///         }
> +///     }
> +///
> +///     unsafe fn free(pool: &MyZpool, handle: usize) {
> +///         let n = (handle & 0x3F) + 3;
> +///         let uptr = handle & !0x3F;
> +///
> +///         // SAFETY:
> +///         // - uptr comes from handle which points to the KVec allocation from `alloc`.

That's not true, you modified the pointer you got from KVec. Please explain why
it is always safe to use lower 6 bits for something else.

What does "`alloc`" refer to?

NIT: `uptr`, `KVec`

> +///         // - size == capacity and is coming from the first 6 bits of handle.
> +///         let vec = unsafe { KVec::<u64>::from_raw_parts(uptr as *mut u64, 1 << n, 1 << n) };

Why do you set the length (not the capacity) of the Vector to 1 << n? I think
technically it doesn't matter, but you should explain that in the safety
comment.

> +///         drop(vec);
> +///         pool.bytes_used.fetch_sub(1 << (n + 3), Ordering::Relaxed);
> +///     }
> +///
> +///     unsafe fn read_begin(_pool: &MyZpool, handle: usize) -> NonNull<u8> {
> +///         let uptr = handle & !0x3F;
> +///         // SAFETY: uptr points to a memory area allocated by KVec

Please use markdown and end sentences with a period. (Applies to the entire
file.)

> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
> +///     }
> +///
> +///     unsafe fn read_end(_pool: &MyZpool, _handle: usize, _handle_mem: NonNull<u8>) {}
> +///
> +///     unsafe fn write(_p: &MyZpool, handle: usize, handle_mem: NonNull<u8>, mem_len: usize) {
> +///         let uptr = handle & !0x3F;
> +///         // SAFETY: handle_mem is a valid non-null pointer provided by zpool, uptr points to
> +///         // a KVec allocated in `malloc` and is therefore also valid.
> +///         unsafe {
> +///             copy_nonoverlapping(handle_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
> +///         };
> +///     }
> +///
> +///     fn total_pages(pool: &MyZpool) -> u64 {
> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT

I'm not sure what the semantic of this function is; the documentation says
"Get the number of pages used by the `pool`".

However, given that you give out allocations from a kmalloc() bucket in
malloc(), this pool might be backed by more pages than what you calculate here.

So, what is done here is calculating the number of pages you could fill with
the memory that is kept around, but not the number of backing pages you consume
memory from.

Using VBox<u8; PAGE_SIZE>::new_uninit() for all allocations might simplify this.

> +///     }
> +/// }
> +/// ```
> +///
> +pub trait ZpoolDriver {
> +    /// Opaque Rust representation of `struct zpool`.
> +    type Pool: ForeignOwnable;
> +
> +    /// Create a pool.
> +    fn create(name: &'static CStr, gfp: Flags) -> Result<Self::Pool>;
> +
> +    /// Destroy the pool.
> +    fn destroy(pool: Self::Pool);
> +
> +    /// Allocate an object of size `size` bytes from `pool`, with the allocation flags `gfp` and

"of `size` bytes"

> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
> +    fn malloc(
> +        pool: <Self::Pool as ForeignOwnable>::BorrowedMut<'_>,
> +        size: usize,
> +        gfp: Flags,
> +        nid: NumaNode,
> +    ) -> Result<usize>;
> +
> +    /// Free a previously allocated from the `pool` object, represented by `handle`.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle` must not be used any more after the call to `free`.
> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: usize);
> +
> +    /// Make all the necessary preparations for the caller to be able to read from the object
> +    /// represented by `handle` and return a valid pointer to the `handle` memory to be read.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.

What can potentially happen if we don't? I.e. how is this different from
malloc()?

> +    unsafe fn read_begin(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: usize,
> +    ) -> NonNull<u8>;
> +
> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
> +    /// previously returned by `read_begin`.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
> +    unsafe fn read_end(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: usize,
> +        handle_mem: NonNull<u8>,
> +    );
> +
> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
> +    /// be copied.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle_mem` must be a valid pointer to an allocated memory area.

"must be a valid pointer into the allocated memory aread represented by
`handle`"

> +    /// - `handle_mem` + `mem_len` must not point outside the allocated memory area.
> +    unsafe fn write(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: usize,
> +        handle_mem: NonNull<u8>,
> +        mem_len: usize,
> +    );
> +
> +    /// Get the number of pages used by the `pool`.
> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
> +}
> +
> +/// An "adapter" for the registration of zpool drivers.
> +pub struct Adapter<T: ZpoolDriver>(T);
> +
> +impl<T: ZpoolDriver> Adapter<T> {
> +    extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
> +        // SAFETY: the memory pointed to by name is guaranteed by zpool to be a valid string

What about the lifetime of the string? In the abstraction you assume 'static,
how is this guaranteed?

> +        let pool = unsafe { T::create(CStr::from_char_ptr(name), Flags::from_raw(gfp)) };
> +        match pool {
> +            Err(_) => null_mut(),
> +            Ok(p) => T::Pool::into_foreign(p),
> +        }
> +    }

Please add an empty line in between function definitions.

> +    extern "C" fn destroy_(pool: *mut c_void) {
> +        // SAFETY: The pointer originates from an `into_foreign` call.
> +        T::destroy(unsafe { T::Pool::from_foreign(pool) })
> +    }
> +    extern "C" fn malloc_(
> +        pool: *mut c_void,
> +        size: usize,
> +        gfp: u32,
> +        handle: *mut usize,
> +        nid: c_int,
> +    ) -> c_int {
> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
> +        // method.
> +        let pool = unsafe { T::Pool::borrow_mut(pool) };

Wait, can't this happen concurrently to all the other functions that borrow the
pool? This would be undefined behavior, no?

> +        from_result(|| {
> +            let real_nid = match nid {
> +                bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
> +                _ => NumaNode::new(nid)?,
> +            };
> +            let h = T::malloc(pool, size, Flags::from_raw(gfp), real_nid)?;
> +            // SAFETY: handle is guaranteed to be a valid pointer by zpool.
> +            unsafe { *handle = h };
> +            Ok(0)
> +        })
> +    }
> +    extern "C" fn free_(pool: *mut c_void, handle: usize) {
> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
> +        // method.
> +        let pool = unsafe { T::Pool::borrow(pool) };
> +
> +        // SAFETY: the caller (zswap) guarantees that `handle` is a valid handle previously

Why does this mention zwap here and in the other functions below?

> +        // allocated by `malloc`.
> +        unsafe { T::free(pool, handle) }
> +    }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ