[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH5fLgh6J41JkaD23_Br6nLkznAfzaivVY=xRpr1jzpUZNZgKA@mail.gmail.com>
Date: Mon, 13 Jan 2025 11:26:42 +0100
From: Alice Ryhl <aliceryhl@...gle.com>
To: Andreas Hindborg <a.hindborg@...nel.org>
Cc: Miguel Ojeda <ojeda@...nel.org>, Matthew Wilcox <willy@...radead.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Vlastimil Babka <vbabka@...e.cz>,
John Hubbard <jhubbard@...dia.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>, Jann Horn <jannh@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>, Trevor Gross <tmgross@...ch.edu>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v11 8/8] task: rust: rework how current is accessed
On Thu, Jan 9, 2025 at 9:42 AM Andreas Hindborg <a.hindborg@...nel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@...gle.com> writes:
>
> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@...nel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@...gle.com> writes:
> >> > + #[inline]
> >> > + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> >> > + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> >> > + // on the current task.
> >> > + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> >> > +
> >> > + if active_ns.is_null() {
> >> > + return None;
> >> > + }
> >> > +
> >> > + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> >> > + //
> >> > + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> >> > + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> >> > + // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> >> > + // created by the calling `Task`. This invariant guarantees that after having acquired a
> >> > + // reference to a `Task`'s pid namespace it will remain unchanged.
> >> > + //
> >> > + // When a task has exited and been reaped `release_task()` will be called. This will set
> >> > + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> >> > + // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> >> > + // referencing count to the `Task` will prevent `release_task()` being called.
> >> > + //
> >> > + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> >> > + // can be used. There are two cases to consider:
> >> > + //
> >> > + // (1) retrieving the `PidNamespace` of the `current` task
> >> > + // (2) retrieving the `PidNamespace` of a non-`current` task
> >> > + //
> >> > + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> >> > + // requires neither RCU locking nor a reference count to be held. Retrieving the
> >> > + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> >> > + // like that is exposed to Rust.
> >> > + //
> >> > + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> >> > + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> >> > + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> >> > + // task means `NULL` can be returned as the non-`current` task could have already passed
> >> > + // through `release_task()`.
> >> > + //
> >> > + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> >> > + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> >> > + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> >> > + // without RCU protection and without having to acquire a reference count.
> >> > + //
> >> > + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> >> > + // reference on `PidNamespace` and will return an `Option` to force the caller to
> >> > + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> >> > + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> >> > + // difficult to perform operations that are otherwise safe without holding a reference
> >> > + // count as long as RCU protection is guaranteed. But it is not important currently. But we
> >> > + // do want it in the future.
> >> > + //
> >> > + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> >> > + // synchronizes against putting the last reference of the associated `struct pid` of
> >> > + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> >> > + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> >> > + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> >> > + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> >> > + // from `task->thread_pid` to finish.
> >>
> >> While this comment is a nice piece of documentation, I think we should
> >> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> >> that is the only case we consider here?
> >
> > Where would you move it?
>
> The info about (2) should probably be with the implementation for that
> case, when it lands. Perhaps we can move it hen?
The function already exists. It's called Task::get_pid_ns(). I think
the comment makes sense here: get_pid_ns() is the normal case where
you don't skip synchronization, and active_pid_ns() is the special
case where you can skip RCU due to reasons. This comment explains that
normally you cannot skip RCU, but in this special case you can.
Alice
Powered by blists - more mailing lists