linux-kernel - Re: [PATCH v11 8/8] task: rust: rework how current is accessed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH5fLgiBsW-sZ8LW4aqK1Eux71qdB3ERr6RdE3gX2uka+GRrew@mail.gmail.com>
Date: Wed, 8 Jan 2025 13:32:28 +0100
From: Alice Ryhl <aliceryhl@...gle.com>
To: Andreas Hindborg <a.hindborg@...nel.org>
Cc: Miguel Ojeda <ojeda@...nel.org>, Matthew Wilcox <willy@...radead.org>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, 
	John Hubbard <jhubbard@...dia.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>, Jann Horn <jannh@...gle.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, Alex Gaynor <alex.gaynor@...il.com>, 
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
	Björn Roy Baron <bjorn3_gh@...tonmail.com>, 
	Benno Lossin <benno.lossin@...ton.me>, Trevor Gross <tmgross@...ch.edu>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v11 8/8] task: rust: rework how current is accessed

On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@...nel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@...gle.com> writes:
> > +impl CurrentTask {
> > +    /// Access the address space of the current task.
> > +    ///
> > +    /// This function does not touch the refcount of the mm.
> > +    #[inline]
> > +    pub fn mm(&self) -> Option<&MmWithUser> {
> > +        // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
> > +        // not a data race.
> > +        let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > +        if mm.is_null() {
> > +            return None;
> > +        }
> > +
> > +        // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > +        // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
> > +        // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
> > +        //
> > +        // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
> > +        // relevant cases:
> > +        // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
> > +        //   accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
> > +        //   `NotThreadSafe` field of `CurrentTask`.
> > +        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
> > +        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
> > +        //   also cannot escape that scope.
> > +        // In either case, it's not possible to read `current->mm` and keep using it after the
> > +        // scope is ended with `kthread_unuse_mm()`.
>
> I guess we don't actually need the last section until we see
> `ktread_use_mm` / `kthread_unuse_mm` abstractions in tree?

I mean, there could be such a scope in C code that called into Rust?
And I don't think there's anything wrong with future-proofing this
abstraction towards adding it in the future.

> > +        Some(unsafe { MmWithUser::from_raw(mm) })
> > +    }
> > +
> > +    /// Access the pid namespace of the current task.
>
> Is it an address space or a memory map(ping)? Can we use consistent vocabulary?

Neither. It's a pid namespace which has nothing to do with address
spaces or memory mappings. This part of this patch is moving an
existing abstraction to work with the reworked way to access current.

> > +    ///
> > +    /// This function does not touch the refcount of the namespace or use RCU protection.
> > +    #[doc(alias = "task_active_pid_ns")]
>
> What is with the alias?

This is the Rust equivalent to the C function called
task_active_pid_ns. The alias makes it easier to find it.

> > +    #[inline]
> > +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> > +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> > +        // on the current task.
> > +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> > +
> > +        if active_ns.is_null() {
> > +            return None;
> > +        }
> > +
> > +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> > +        //
> > +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> > +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> > +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> > +        // created by the calling `Task`. This invariant guarantees that after having acquired a
> > +        // reference to a `Task`'s pid namespace it will remain unchanged.
> > +        //
> > +        // When a task has exited and been reaped `release_task()` will be called. This will set
> > +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> > +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> > +        // referencing count to the `Task` will prevent `release_task()` being called.
> > +        //
> > +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> > +        // can be used. There are two cases to consider:
> > +        //
> > +        // (1) retrieving the `PidNamespace` of the `current` task
> > +        // (2) retrieving the `PidNamespace` of a non-`current` task
> > +        //
> > +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> > +        // requires neither RCU locking nor a reference count to be held. Retrieving the
> > +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> > +        // like that is exposed to Rust.
> > +        //
> > +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> > +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> > +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> > +        // task means `NULL` can be returned as the non-`current` task could have already passed
> > +        // through `release_task()`.
> > +        //
> > +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> > +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> > +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> > +        // without RCU protection and without having to acquire a reference count.
> > +        //
> > +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> > +        // reference on `PidNamespace` and will return an `Option` to force the caller to
> > +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> > +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> > +        // difficult to perform operations that are otherwise safe without holding a reference
> > +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
> > +        // do want it in the future.
> > +        //
> > +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> > +        // synchronizes against putting the last reference of the associated `struct pid` of
> > +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> > +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> > +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> > +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> > +        // from `task->thread_pid` to finish.
>
> While this comment is a nice piece of documentation, I think we should
> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> that is the only case we consider here?

Where would you move it?

> > +        //
> > +        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
> > +        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
> > +        // escape the scope in which the current pointer was obtained.
> > +        Some(unsafe { PidNamespace::from_ptr(active_ns) })
> > +    }
>
> Can we move the impl block and the struct definition next to each other?

I could move the definition of CurrentTask down, but I'm not really
convinced that it's an improvement.

Alice