lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aFRJxCFxA0Hqwmqu@pollux>
Date: Thu, 19 Jun 2025 19:32:52 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Boris Brezillon <boris.brezillon@...labora.com>,
	Alice Ryhl <aliceryhl@...gle.com>,
	Daniel Almeida <daniel.almeida@...labora.com>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Trevor Gross <tmgross@...ch.edu>,
	Sumit Semwal <sumit.semwal@...aro.org>,
	Christian König <christian.koenig@....com>,
	Alyssa Rosenzweig <alyssa@...enzweig.io>,
	Lyude Paul <lyude@...hat.com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
	dri-devel@...ts.freedesktop.org, Asahi Lina <lina@...hilina.net>
Subject: Re: [PATCH v2 2/2] rust: drm: Add GPUVM abstraction

On Thu, Jun 19, 2025 at 01:57:09PM +0200, Boris Brezillon wrote:
> I'm digressing, but there's an aspect I found very annoying in the C
> version of the API: the fact that we have to take a BO ref, then lock,
> then release the vm_bo [1], because otherwise the vm_bo might be the
> last owner of a BO ref leading to a UAF on the lock itself. This to me,
> denotes a lifetime issue that I think would be good to address in the
> rust version of the API.
> 
> It's not exactly the same problem, but I think it comes from the same
> root issue: lax ownership definition. By that I mean it's not clear
> who's the owner and who's the owned. gem_object::gpuva::list has
> weak refs on the vm_bos contained in this list, which kinda makes sense
> because vm_bos themselves have a ref on the gem_object, and if we were
> to make this weak ref a strong ref we'd never free any of these two
> objects. The lock is also part of the BO (either the BO resv lock, or a
> custom lock), and since it's the very same lock we use to insert/remove
> vm_bos, that's problematic.

I think the ownership is actually well defined. A struct drm_gpuvm_bo holds a
reference of its DRM GEM object.

When the drm_gpuvm_bo is freed it needs to remove itself from the GEM objects
drm_gpuvm_bo list. This also guarantees that all list entries of the GEM object
are always valid and can be safely used as long as the list lock is held.

The reference count dance you describe does not come from the ownership model
(which, again, I think is fine), but it comes from the design decision to let
drivers take all required locks externally rather than locking things
internally. Or in other words typically GPUVM functions behave like a *_locked()
variant.

The main reason for that is that GPUVM had to be designed with a certain degree
of flexibility in order to be usable for different driver use-cases. For this
reason, the external locking approach was used to avoid drivers running into
lock inversion problems.

For instance, when we use a separate lock to protect the GEM objects list of
drm_gpuvm_bo objects, we can either iterate the list of objects from a scope
where the GEM objects dma_resv lock is already held (e.g. TTM callbacks), which
is a common use-case, or execute operations that have to grab a dma_resv lock
while iterating the GEM objects list of drm_gpuvm_bo objects.

Meanwhile, I think we have some experience in how drivers typically have to use
those APIs. And I think that in this case - as long as we keep the external
locking scheme for the dma_resv lock - we should be fine to make the lock that
protects the GEM objects drm_gpuvm_bo list an internal lock in the Rust
abstractions. For this, we can rename the existing, corresponding C functions
into *_locked() variants and provide functions with internal locking for both C
drivers and the Rust abstractions.

- Danilo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ