lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZuefFhomRvqtFzSn@dread.disaster.area>
Date: Mon, 16 Sep 2024 12:59:34 +1000
From: Dave Chinner <david@...morbit.com>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: Miguel Ojeda <ojeda@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Qi Zheng <zhengqi.arch@...edance.com>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Muchun Song <muchun.song@...ux.dev>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...sung.com>,
	Trevor Gross <tmgross@...ch.edu>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH] rust: shrinker: add shrinker abstraction

On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote:
> Rust Binder holds incoming transactions in a read-only mmap'd region
> where it manually manages the pages. These pages are only in use until
> the incoming transaction is consumed by userspace, but the kernel will
> keep the pages around for future transactions. Rust Binder registers a
> shrinker with the kernel so that it can give back these pages if the
> system comes under memory pressure.
> 
> Separate types are provided for registered and unregistered shrinkers.
> The unregistered shrinker type can be used to configure the shrinker
> before registering it. Separating it into two types also enables the
> user to construct the private data between the calls to `shrinker_alloc`
> and `shrinker_register` and avoid constructing the private data if
> allocating the shrinker fails.

This seems a bit nasty. It appears to me that the code does an
unsafe copy of the internal shrinker state between the unregistered
and registered types. Shrinkers have additional internal state when
SHRINKER_MEMCG_AWARE and/or SHRINKER_NUMA_AWARE flags are set,
and this abstraction doesn't seem to handle either those flags or
the internal state they use at all.

....

> diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs
> new file mode 100644
> index 000000000000..9af726bfe0b1
> --- /dev/null
> +++ b/rust/kernel/shrinker.rs
> @@ -0,0 +1,324 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (C) 2024 Google LLC.
> +
> +//! Shrinker for handling memory pressure.
> +//!
> +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h)
> +
> +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable};
> +
> +use core::{
> +    ffi::{c_int, c_ulong, c_void},
> +    marker::PhantomData,
> +    ptr::NonNull,
> +};
> +
> +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong;
> +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong;
> +
> +/// The default value for the number of seeks needed to recreate an object.
> +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS;
> +
> +/// An unregistered shrinker.
> +///
> +/// This type can be used to modify the settings of the shrinker before it is registered.
> +///
> +/// # Invariants
> +///
> +/// The `shrinker` pointer references an unregistered shrinker.
> +pub struct UnregisteredShrinker {
> +    shrinker: NonNull<bindings::shrinker>,
> +}
> +
> +// SAFETY: Moving an unregistered shrinker between threads is okay.
> +unsafe impl Send for UnregisteredShrinker {}
> +// SAFETY: An unregistered shrinker is thread safe.
> +unsafe impl Sync for UnregisteredShrinker {}
> +
> +impl UnregisteredShrinker {
> +    /// Create a new shrinker.
> +    pub fn alloc(name: &CStr) -> Result<Self, AllocError> {
> +        // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we
> +        // pass a nul-terminated string as the string for `%s` to print.
> +        let ptr =
> +            unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) };

This passes flags as 0, so doesn't support SHRINKER_MEMCG_AWARE or
SHRINKER_NUMA_AWARE shrinker variants. These flags should be
passed through here.

> +
> +        let shrinker = NonNull::new(ptr).ok_or(AllocError)?;
> +
> +        // INVARIANT: The creation of the shrinker was successful.
> +        Ok(Self { shrinker })
> +    }
> +
> +    /// Create a new shrinker using format arguments for the name.
> +    pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> {
> +        // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we
> +        // pass a `fmt::Arguments` as the value to print.
> +        let ptr = unsafe {
> +            bindings::shrinker_alloc(
> +                0,

Same again.

> +                c_str!("%pA").as_char_ptr(),
> +                &name as *const _ as *const c_void,
> +            )
> +        };
> +
> +        let shrinker = NonNull::new(ptr).ok_or(AllocError)?;
> +
> +        // INVARIANT: The creation of the shrinker was successful.
> +        Ok(Self { shrinker })
> +    }
> +
> +    /// Set the number of seeks needed to recreate an object.
> +    pub fn set_seeks(&mut self, seeks: u32) {
> +        unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int };
> +    }

Seems kinda weird to have a separate function for setting seeks,
when...

> +
> +    /// Register the shrinker.
> +    ///
> +    /// The provided pointer is used as the private data, and the type `T` determines the callbacks
> +    /// that the shrinker will use.
> +    pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> {

.... all the other data needed to set up a shrinker is provided to
this function....

> +        let shrinker = self.shrinker;
> +        let ptr = shrinker.as_ptr();
> +
> +        // The destructor of `self` calls `shrinker_free`, so skip the destructor.
> +        core::mem::forget(self);
> +
> +        let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data);
> +
> +        // SAFETY: We own the private data, so we can assign to it.
> +        unsafe { (*ptr).private_data = private_data_ptr.cast_mut() };
> +        // SAFETY: The shrinker is not yet registered, so we can update this field.
> +        unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) };
> +        // SAFETY: The shrinker is not yet registered, so we can update this field.
> +        unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) };

.... and implemented exactly the same way.

.....

> +/// How many objects are there in the cache?
> +///
> +/// This is used as the return value of [`Shrinker::count_objects`].
> +pub struct CountObjects {
> +    inner: c_ulong,
> +}
> +
> +impl CountObjects {
> +    /// Indicates that the number of objects is unknown.
> +    pub const UNKNOWN: Self = Self { inner: 0 };
> +
> +    /// Indicates that the number of objects is zero.
> +    pub const EMPTY: Self = Self {
> +        inner: SHRINK_EMPTY,
> +    };
> +
> +    /// The maximum possible number of freeable objects.
> +    pub const MAX: Self = Self {
> +        // The shrinker code assumes that it can multiply this value by two without overflow.
> +        inner: c_ulong::MAX / 2,
> +    };
> +
> +    /// Creates a new `CountObjects` with the given value.
> +    pub fn from_count(count: usize) -> Self {
> +        if count == 0 {
> +            return Self::EMPTY;
> +        }

No. A return count of 0 is not the same as a return value of
SHRINK_EMPTY.

A return value of 0 means "no reclaim work can be done right now".

This implies that there are objects that can be reclaimed in the near
future, but right now they are unavailable for reclaim. This can be
due to a trylock protecting the count operation failing, the all the
objects in the cache being dirty and needing work to be done before
they can be reclaimed, etc.

A return value of SHRINK_EMPTY means "there are no reclaimable
objects at all".

This implies that the cache is empty - it has absolutely nothing in
it that can be reclaimed either now or in the near future.


These two return values are treated very differently by the high
level code. SHRINK_EMPTY is used by shrink_slab_memcg() to maintain
a "shrinker needs to run" bit in the memcg shrinker search bitmap.
The bit is cleared when SHRINK_EMPTY is returned, meaning the
shrinker will not get called again until a new object is queued
to it's LRU. This sets the "shrinker needs to run" bit again, and
so the shrinker will run next time shrink_slab_memcg() is called.

In constrast, a return value of zero (i.e. no work to be done right
now) does not change the "shrinker needs to run" state bit, and
hence it will always be called the next time the shrink_slab_memcg()
is run to try to reclaim objects from that memcg shrinker...

.....

> +impl ScanObjects {
> +    /// Indicates that the shrinker should stop trying to free objects from this cache due to
> +    /// potential deadlocks.
> +    pub const STOP: Self = Self { inner: SHRINK_STOP };
> +
> +    /// The maximum possible number of freeable objects.
> +    pub const MAX: Self = Self {
> +        // The shrinker code assumes that it can multiply this value by two without overflow.
> +        inner: c_ulong::MAX / 2,

No it doesn't. This is purely a count of objects freed by the
shrinker.

> +    };
> +
> +    /// Creates a new `CountObjects` with the given value.
> +    pub fn from_count(count: usize) -> Self {
> +        if count > Self::MAX.inner as usize {
> +            return Self::MAX;
> +        }
> +
> +        Self {
> +            inner: count as c_ulong,
> +        }
> +    }
> +}
> +
> +/// This struct is used to pass information from page reclaim to the shrinkers.
> +pub struct ShrinkControl<'a> {
> +    ptr: NonNull<bindings::shrink_control>,
> +    _phantom: PhantomData<&'a bindings::shrink_control>,
> +}
> +
> +impl<'a> ShrinkControl<'a> {
> +    /// Create a `ShrinkControl` from a raw pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// The pointer should point at a valid `shrink_control` for the duration of 'a.
> +    pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self {
> +        Self {
> +            // SAFETY: Caller promises that this pointer is valid.
> +            ptr: unsafe { NonNull::new_unchecked(ptr) },
> +            _phantom: PhantomData,
> +        }
> +    }
> +
> +    /// Determines whether it is safe to recurse into filesystem code.
> +    pub fn gfp_fs(&self) -> bool {
> +        // SAFETY: Okay by type invariants.
> +        let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask };
> +
> +        (mask & bindings::__GFP_FS) != 0
> +    }

This needs a check for __GFP_IO as well, as there are block layer
shrinkers that need to be GFP_NOIO aware...

> +
> +    /// Returns the number of objects that `scan_objects` should try to reclaim.
> +    pub fn nr_to_scan(&self) -> usize {
> +        // SAFETY: Okay by type invariants.
> +        unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize }
> +    }
> +
> +    /// The callback should set this value to the number of objects processed.
> +    pub fn set_nr_scanned(&mut self, val: usize) {
> +        let mut val = val as c_ulong;
> +        // SAFETY: Okay by type invariants.
> +        let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan };
> +        if val > max {
> +            val = max;
> +        }

No. Shrinkers are allowed to scan more objects in a batch than
the high level code asked them to scan. If they do this, they
*must* report back the count of all the objects they scanned so the
batched scan accounting can adjust the remaining amount of work that
needs to be done appropriately.

> +
> +        // SAFETY: Okay by type invariants.
> +        unsafe { (*self.ptr.as_ptr()).nr_scanned = val };
> +    }
> +}
> +
> +unsafe extern "C" fn rust_count_objects<T: Shrinker>(
> +    shrink: *mut bindings::shrinker,
> +    sc: *mut bindings::shrink_control,
> +) -> c_ulong {
> +    // SAFETY: We own the private data, so we can access it.
> +    let private = unsafe { (*shrink).private_data };
> +    // SAFETY: This function is only used with shrinkers where `T` is the type of the private data.
> +    let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) };
> +    // SAFETY: The caller passes a valid `sc` pointer.
> +    let sc = unsafe { ShrinkControl::from_raw(sc) };
> +
> +    let ret = T::count_objects(private, sc);
> +    ret.inner
> +}

Why are there two assignments to "private" here? And for the one
that is borrowing a reference, why is it needed as the shrinker is
supposed to hold a reference, right? Also, where is that reference
dropped - it's not at all obvious to me (as a relative n00b to rust)
what is going on here.


-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ