lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b1509e7f-4817-4466-bd2b-c083f024c0d4@konsulko.se>
Date: Wed, 27 Aug 2025 22:43:09 +0200
From: Vitaly Wool <vitaly.wool@...sulko.se>
To: Danilo Krummrich <dakr@...nel.org>
Cc: rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org,
 Uladzislau Rezki <urezki@...il.com>, Alice Ryhl <aliceryhl@...gle.com>,
 Vlastimil Babka <vbabka@...e.cz>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>, Miguel Ojeda
 <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
 Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
 Bjorn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin
 <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
 Trevor Gross <tmgross@...ch.edu>, Johannes Weiner <hannes@...xchg.org>,
 Yosry Ahmed <yosry.ahmed@...ux.dev>, Nhat Pham <nphamcs@...il.com>,
 linux-mm@...ck.org
Subject: Re: [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers



On 8/26/25 14:20, Danilo Krummrich wrote:
> On Sat Aug 23, 2025 at 3:05 PM CEST, Vitaly Wool wrote:
>> +/// Zpool API.
>> +///
>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>> +///
>> +/// # Example
>> +///
>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>> +/// to be at least 2**6 aligned by kmalloc.
>> +///
>> +/// ```
>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>> +/// use kernel::alloc::{Flags, KBox, KVec, NumaNode};
>> +/// use kernel::page::PAGE_SHIFT;
>> +/// use kernel::prelude::EINVAL;
>> +/// use kernel::zpool::*;
>> +///
>> +/// struct MyZpool {
>> +///     name: &'static CStr,
>> +///     bytes_used: AtomicU64,
>> +/// }
>> +///
>> +/// struct MyZpoolDriver;
>> +///
>> +/// impl ZpoolDriver for MyZpoolDriver {
>> +///     type Pool = KBox<MyZpool>;
>> +///
>> +///     fn create(name: &'static CStr, gfp: Flags) -> Result<KBox<MyZpool>> {
>> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>> +///         let pool = KBox::new(my_pool, gfp)?;
>> +///
>> +///         Ok(pool)
>> +///     }
>> +///
>> +///     fn destroy(p: KBox<MyZpool>) {
>> +///         drop(p);
>> +///     }
>> +///
>> +///     fn malloc(pool: &mut MyZpool, size: usize, gfp: Flags, _nid: NumaNode) -> Result<usize> {
>> +///         let mut pow: usize = 0;
>> +///         for n in 6..=PAGE_SHIFT {
>> +///             if size <= 1 << n {
>> +///                 pow = n;
>> +///                 break;
>> +///             }
>> +///         }
> 
> Why not just use next_power_of_two()? I think the same logic could also be
> achieved with
> 
> 	size.next_power_of_two().trailing_zeros().max(6).min(PAGE_SHIFT)

It indeed can, thanks :)

>> +///         match pow {
>> +///             0 => Err(EINVAL),
>> +///             _ => {
>> +///                 let vec = KVec::<u64>::with_capacity(1 << (pow - 3), gfp)?;
> 
> Why use u64 and 1 << (pow - 3), rather than simply u8 and 1 << pow?
> 
> (Btw. you could also just use VBox<u8; PAGE_SIZE>::new_uninit() for all
> allocations to keep the example simple.)

Right, that fixation on u64 doesn't help at all here.

>> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
>> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>> +///                 Ok(ptr as usize | (pow - 6))
>> +///             }
>> +///         }
>> +///     }
>> +///
>> +///     unsafe fn free(pool: &MyZpool, handle: usize) {
>> +///         let n = (handle & 0x3F) + 3;
>> +///         let uptr = handle & !0x3F;
>> +///
>> +///         // SAFETY:
>> +///         // - uptr comes from handle which points to the KVec allocation from `alloc`.
> 
> That's not true, you modified the pointer you got from KVec. Please explain why
> it is always safe to use lower 6 bits for something else.
> 
> What does "`alloc`" refer to?

It refers to the alloc function we implement in this toy backend for 
ZpoolDriver trait.

> NIT: `uptr`, `KVec`
> 
>> +///         // - size == capacity and is coming from the first 6 bits of handle.
>> +///         let vec = unsafe { KVec::<u64>::from_raw_parts(uptr as *mut u64, 1 << n, 1 << n) };
> 
> Why do you set the length (not the capacity) of the Vector to 1 << n? I think
> technically it doesn't matter, but you should explain that in the safety
> comment.

Would the following work: "we know the capacity of this vector and it is 
1 << n, we set the length to the same value to be deterministic, but the 
actual length of vec doesn't matter because we're dropping it right here 
anyway"?

>> +///         drop(vec);
>> +///         pool.bytes_used.fetch_sub(1 << (n + 3), Ordering::Relaxed);
>> +///     }
>> +///
>> +///     unsafe fn read_begin(_pool: &MyZpool, handle: usize) -> NonNull<u8> {
>> +///         let uptr = handle & !0x3F;
>> +///         // SAFETY: uptr points to a memory area allocated by KVec
> 
> Please use markdown and end sentences with a period. (Applies to the entire
> file.)
> 
>> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>> +///     }
>> +///
>> +///     unsafe fn read_end(_pool: &MyZpool, _handle: usize, _handle_mem: NonNull<u8>) {}
>> +///
>> +///     unsafe fn write(_p: &MyZpool, handle: usize, handle_mem: NonNull<u8>, mem_len: usize) {
>> +///         let uptr = handle & !0x3F;
>> +///         // SAFETY: handle_mem is a valid non-null pointer provided by zpool, uptr points to
>> +///         // a KVec allocated in `malloc` and is therefore also valid.
>> +///         unsafe {
>> +///             copy_nonoverlapping(handle_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>> +///         };
>> +///     }
>> +///
>> +///     fn total_pages(pool: &MyZpool) -> u64 {
>> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
> 
> I'm not sure what the semantic of this function is; the documentation says
> "Get the number of pages used by the `pool`".
> 
> However, given that you give out allocations from a kmalloc() bucket in
> malloc(), this pool might be backed by more pages than what you calculate here.

Well, maybe I need to add an explicit comment about it here, but the 
idea is that with the SLUB allocator, you have kmalloc-64, kmalloc-128, 
kmalloc-256 etc. caches which will be used to manage requests for up to 
64, 128, 256, ... bytes respectively, and in that case the calculations 
are correct. FWIW I smoke tested this allocator and the actual numbers 
seem to be consistent with these calculations.

> So, what is done here is calculating the number of pages you could fill with
> the memory that is kept around, but not the number of backing pages you consume
> memory from.
> 
> Using VBox<u8; PAGE_SIZE>::new_uninit() for all allocations might simplify this.
> 
>> +///     }
>> +/// }
>> +/// ```
>> +///
>> +pub trait ZpoolDriver {
>> +    /// Opaque Rust representation of `struct zpool`.
>> +    type Pool: ForeignOwnable;
>> +
>> +    /// Create a pool.
>> +    fn create(name: &'static CStr, gfp: Flags) -> Result<Self::Pool>;
>> +
>> +    /// Destroy the pool.
>> +    fn destroy(pool: Self::Pool);
>> +
>> +    /// Allocate an object of size `size` bytes from `pool`, with the allocation flags `gfp` and
> 
> "of `size` bytes"
> 
>> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>> +    fn malloc(
>> +        pool: <Self::Pool as ForeignOwnable>::BorrowedMut<'_>,
>> +        size: usize,
>> +        gfp: Flags,
>> +        nid: NumaNode,
>> +    ) -> Result<usize>;
>> +
>> +    /// Free a previously allocated from the `pool` object, represented by `handle`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle` must not be used any more after the call to `free`.
>> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: usize);
>> +
>> +    /// Make all the necessary preparations for the caller to be able to read from the object
>> +    /// represented by `handle` and return a valid pointer to the `handle` memory to be read.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.
> 
> What can potentially happen if we don't? I.e. how is this different from
> malloc()?

Here the idea is that if read_begin() has some sort of extra mapping 
involved (like, doing kmap_atomic() on some weird memory address) for 
the caller to be able to read directly from the returned pointer, 
read_end() must clean that mapping up.

>> +    unsafe fn read_begin(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +    ) -> NonNull<u8>;
>> +
>> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>> +    /// previously returned by `read_begin`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>> +    unsafe fn read_end(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +        handle_mem: NonNull<u8>,
>> +    );
>> +
>> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>> +    /// be copied.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be a valid pointer to an allocated memory area.
> 
> "must be a valid pointer into the allocated memory aread represented by
> `handle`"
> 
>> +    /// - `handle_mem` + `mem_len` must not point outside the allocated memory area.
>> +    unsafe fn write(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +        handle_mem: NonNull<u8>,
>> +        mem_len: usize,
>> +    );
>> +
>> +    /// Get the number of pages used by the `pool`.
>> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>> +}
>> +
>> +/// An "adapter" for the registration of zpool drivers.
>> +pub struct Adapter<T: ZpoolDriver>(T);
>> +
>> +impl<T: ZpoolDriver> Adapter<T> {
>> +    extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>> +        // SAFETY: the memory pointed to by name is guaranteed by zpool to be a valid string
> 
> What about the lifetime of the string? In the abstraction you assume 'static,
> how is this guaranteed?

Actually it isn't, thanks for finding this.

>> +        let pool = unsafe { T::create(CStr::from_char_ptr(name), Flags::from_raw(gfp)) };
>> +        match pool {
>> +            Err(_) => null_mut(),
>> +            Ok(p) => T::Pool::into_foreign(p),
>> +        }
>> +    }
> 
> Please add an empty line in between function definitions.
> 
>> +    extern "C" fn destroy_(pool: *mut c_void) {
>> +        // SAFETY: The pointer originates from an `into_foreign` call.
>> +        T::destroy(unsafe { T::Pool::from_foreign(pool) })
>> +    }
>> +    extern "C" fn malloc_(
>> +        pool: *mut c_void,
>> +        size: usize,
>> +        gfp: u32,
>> +        handle: *mut usize,
>> +        nid: c_int,
>> +    ) -> c_int {
>> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> +        // method.
>> +        let pool = unsafe { T::Pool::borrow_mut(pool) };
> 
> Wait, can't this happen concurrently to all the other functions that borrow the
> pool? This would be undefined behavior, no?

Theoretically, yes, but since pool is actually Box<T>, it's only the 
inner T that is mutable.

Anyway, the only reason for malloc() to require a mutable reference is 
that the backend implementation *may* use RBTree::cursor_lower_bound() 
which requires a mutable reference of the tree.

Would it be okay if I
* change the Zpool API so that malloc takes an immutable reference
* extend the RBTree API with a cursor_lower_bound analog which doesn't 
require a mutable tree?

>> +        from_result(|| {
>> +            let real_nid = match nid {
>> +                bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
>> +                _ => NumaNode::new(nid)?,
>> +            };
>> +            let h = T::malloc(pool, size, Flags::from_raw(gfp), real_nid)?;
>> +            // SAFETY: handle is guaranteed to be a valid pointer by zpool.
>> +            unsafe { *handle = h };
>> +            Ok(0)
>> +        })
>> +    }
>> +    extern "C" fn free_(pool: *mut c_void, handle: usize) {
>> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> +        // method.
>> +        let pool = unsafe { T::Pool::borrow(pool) };
>> +
>> +        // SAFETY: the caller (zswap) guarantees that `handle` is a valid handle previously
> 
> Why does this mention zwap here and in the other functions below?

Should have been "(e. g. zswap)".

>> +        // allocated by `malloc`.
>> +        unsafe { T::free(pool, handle) }
>> +    }
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ