lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250227192321.GA67615@nvidia.com>
Date: Thu, 27 Feb 2025 15:23:21 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>,
	Joel Fernandes <joelagnelf@...dia.com>,
	Alexandre Courbot <acourbot@...dia.com>,
	Dave Airlie <airlied@...il.com>, Gary Guo <gary@...yguo.net>,
	Joel Fernandes <joel@...lfernandes.org>,
	John Hubbard <jhubbard@...dia.com>, Ben Skeggs <bskeggs@...dia.com>,
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
	nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
	paulmck@...nel.org
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice
 implementation

On Thu, Feb 27, 2025 at 06:32:15PM +0100, Danilo Krummrich wrote:
> On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng wrote:
> > On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> > 
> > > I still wonder why you couldn't also have these reliable reference
> > > counts rooted on the device driver instead of only on the module.
> > > 
> > 
> > You could put reliable reference counts anywhere you want, as long as it
> > reflects the resource dependencies.
> 
> Right, as I explained in a different reply, the signature for PCI driver probe()
> looks like this:
> 
> 	fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>>
> 
> The returned Pin<KBox<Self>> has the lifetime of the driver being bound to the
> device.
> 
> Which means a driver can bind things to this lifetime. But, it isn't forced to,
> it can also put things into an Arc and share it with the rest of the world.

This statement right here seems to be the fundamental problem.

The design pattern says that 'share it with the rest of the world' is
a bug. A driver following the pattern cannot do that, it must contain
the driver objects within the driver scope and free them. In C we
inspect for this manually, and check for it with kmemleak
progamatically.

It appears to me that the main issue here is that nobody has figured
out how to make rust have rules that can enforce that design pattern.

Have the compiler prevent the driver author from incorrectly extending
the lifetime of a driver-object beyond the driver's inherent scope, ie
that Self object above.

Instead we get this:

> If something is crucial to be bound to the lifetime of a driver being bound to a
> device (i.e. device resources), you have to expose it as Devres<T>.

Which creates a costly way to work around this missing design pattern
by adding runtime checks to every single access of T in all the
operational threads. Failable rcu_lock across every batch of register
access.

The reason the kernel has these design patterns of shutdown then
destroy is to avoid that runtime overhead! We optimize by swapping
fine grained locks for coarse locks that probably already exist. It is
a valid pattern, works well and has alot of APIs designed to support
it.

This giant thread started because people were objecting to the cost
and usability of the runtime checks on the operational paths.

So, I think you can say it can't be done, that the alternative is only
a little worse. Sad, but OK, but let's please acknowledge the
limitation.

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ