Message-ID: <20250306153236.GE354511@nvidia.com>
Date: Thu, 6 Mar 2025 11:32:36 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: John Hubbard <jhubbard@...dia.com>,
Greg KH <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
Alexandre Courbot <acourbot@...dia.com>,
Dave Airlie <airlied@...il.com>, Gary Guo <gary@...yguo.net>,
Joel Fernandes <joel@...lfernandes.org>,
Boqun Feng <boqun.feng@...il.com>, Ben Skeggs <bskeggs@...dia.com>,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
paulmck@...nel.org
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Mar 06, 2025 at 11:42:38AM +0100, Simona Vetter wrote:
> > Further, I just remembered, (Danilo please notice!) there is another
> > related issue here that DMA mappings *may not* outlive remove()
> > either. netdev had a bug related to this recently and it was all
> > agreed that it is not allowed. The kernel can crash in a couple of
> > different ways if you try to do this.
> >
> > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
>
> Hm for the physical dma I thought disabling PCI bus master should put a
> stop to all this stuff?
Not in the general case. Many device classes (e.g. platform) don't have
something like "PCI bus master". It is also not always possible to
reset a device, even in PCI.
So the way things work today for module reload relies on the driver
doing a full quiesce so that the next driver to attach can safely
start up the device. Otherwise the next driver flips PCI bus master
back on and immediately UAFs memory through rogue DMA.
Relying on PCI bus master also exposes a weakness we battled with in
kexec. When the new driver boots up it has to gain control of the
device and stop the DMA before flipping PCI bus master off. Almost
no drivers actually do this, and some HW can't even achieve it without
a PCI reset (which is not always available). Meaning you end up with a
likely UAF flow if you rely on this technique.
> For the sw lifecycle stuff I honestly didn't know that was an issue, I
> guess that needs to be addressed in the dma-api wrappers or rust can blow
> up in funny ways. C drivers just walk all mappings and shoot them.
I wonder what people will come up with. The DMA API is a performance
path; people are not going to accept pointless overheads there.
IMHO whatever path the DMA API takes, the MMIO design should follow
it.
> The trouble is that for real hotunplug, you need all this anyway. Because
> when you physically hotunplug the interrupts will be dead, the mmio will
> be gone at any moment (not just at the beginning of an rcu revocable
> section), so real hotunplug is worse than what we're trying to do here.
There are two kinds of real hotunplug: the friendly kind that we see
in physical PCI, where you actually press a button on the case and wait
for the light to go off. I.e. it is interactive and safe with the
OS. Very similar to module reloading.
And the hostile kind, like in thunderbolt, where it just goes away and
dies.
In the server world, other than nvme, we seem to focus on the friendly
kind.
> So randomly interrupts not happening is something you need to cope with no
> matter what.
Yes
> But for a driver unbind you _do_ have to worry about cleanly shutting down
> the hardware. For the above reasons and also in general putting hardware
> into a well-known (all off usually) state is better for then reloading a
> new driver version and binding that. Except that there's no way to tell
> whether your ->remove got called for unbinding or hotunplug.
IMHO it doesn't really matter; the driver has to support the most
difficult scenario anyhow. The only practical difference is that the
MMIO might return -1 to all reads and the interrupts are dead. If you
want to detect a gone PCI device then just do a register read and
check for -1, which some drivers, like mlx5, already do as part of
their resiliency strategy.
> pci device was physically unplugged or not, and so for developer
> convenience most pci drivers go with the "cleanly shut down everything"
> approach, which is the wrong thing to do for actual hotunplug.
I wouldn't say it is wrong. It is still the correct thing to do, and
following down the normal cleanup paths is a good way to ensure the
special case doesn't have bugs. The primary difference is you want to
understand the device is dead and stop waiting on it faster. Drivers
need to consider these things anyhow if they want resiliency against
device crashes, PCI link wobbles and so on that don't involve
remove().
Regardless, I think the point is clear that the driver author bears
a lot of responsibility to sequence this stuff correctly as part of
their remove() implementation. The idea that Rust can magically make
all this safe against UAF or lockups seems incorrect.
> > Ah.. I guess rust would have to validate the function pointers and the
> > THIS_MODULE are consistent at runtime before handing them off to
> > C to prevent this. Seems like a reasonable thing to put under some
> > CONFIG_DEBUG, also seems a bit hard to implement perhaps..
>
> We should know the .text section of a module, so checking whether a
> pointer is within that shouldn't be too hard.
It is legal to pass a pointer to a function in a module that this
module is linked to as well. We do that sometimes, e.g. a fops having a
simple_xx pointer. So you'd need to do some graph analysis.
Jason