Message-ID: <20250225210228.GA1801922@joelnvbox>
Date: Tue, 25 Feb 2025 16:02:28 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: Alexandre Courbot <acourbot@...dia.com>,
Dave Airlie <airlied@...il.com>, Gary Guo <gary@...yguo.net>,
Joel Fernandes <joel@...lfernandes.org>,
Boqun Feng <boqun.feng@...il.com>,
John Hubbard <jhubbard@...dia.com>, Ben Skeggs <bskeggs@...dia.com>,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
paulmck@...nel.org, Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice
implementation
On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> >
> >
> > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> > >> Hi Danilo,
> > >>
> > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > >>>> CC: Gary
> > >>>>
> > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > >>>>> This inability to sleep while we are accessing registers seems very
> > >>>>> constraining to me, if not dangerous. It is pretty common to have
> > >>>>> functions intermingle hardware accesses with other operations that might
> > >>>>> sleep, and this constraint means that in such cases the caller would
> > >>>>> need to perform guard lifetime management manually:
> > >>>>>
> > >>>>> let bar_guard = bar.try_access()?;
> > >>>>> /* do something non-sleeping with bar_guard */
> > >>>>> drop(bar_guard);
> > >>>>>
> > >>>>> /* do something that might sleep */
> > >>>>>
> > >>>>> let bar_guard = bar.try_access()?;
> > >>>>> /* do something non-sleeping with bar_guard */
> > >>>>> drop(bar_guard);
> > >>>>>
> > >>>>> ...
> > >>>>>
> > >>>>> Failure to drop the guard potentially introduces a race condition, which
> > >>>>> will receive no compile-time warning and potentially not even a runtime
> > >>>>> one unless lockdep is enabled. This problem does not exist with the
> > >>>>> equivalent C code, AFAICT.
> > >>>
> > >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> > >>> not call into something that might sleep from atomic context.
> > >>>
> > >>
> > >> Sure, but in C, a sequence of MMIO accesses doesn't need to be constrained
> > >> to not sleeping?
> > >
> > > It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> > > just that the synchronization mechanism (RCU) used for the Revocable type
> > > implies that.
> > >
> > > In C we have something that is pretty similar with drm_dev_enter() /
> > > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > >
> > > In DRM this is used to prevent accesses to device resources after the device has
> > > been unplugged.
> >
> > Thanks a lot for the response. Might it make more sense to use SRCU then? The
> > use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.
>
> Allowing the guard to be held for too long is a bit contradictory to the goal
> of detecting hotunplug, I guess.
>
> Besides that, I don't really see why we can't just re-acquire it after we sleep?
> Rust provides good options to implement it ergonomically, I think.
>
> >
> > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
>
> See the reasoning against SRCU from Sima [1]; what's the reasoning of RDMA?
>
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
Hmm, so you're saying that SRCU sections blocking indefinitely is a concern per
that thread. But SRCU grace periods should not be stalled in normal operation;
if they are, that is a bug in itself. Stalling SRCU grace periods is still not
a good thing, since you could run out of memory (even though stalling RCU is
even more dangerous).
For RDMA, I will ask Jason Gunthorpe to chime in; I CC'd him. Jason, correct
me if I'm wrong about the RDMA use of SRCU, but this is what I recollect
discussing with you.
> >
> > >> I am fairly new to Rust; could you help elaborate more about why these MMIO
> > >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> > >> solve that C has but Rust doesn't with the aid of an RCU read-side section? I
> > >> vaguely understand we are trying to "wait for an MMIO access" using
> > >> synchronize here, but it is just a guest.
> > >
> > > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > > to unplugged devices.
> > >
> > > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > > is detached from the driver (for instance because it has been unplugged).
> >
> > I guess the Devres concept of revoking resources on driver detach is not a Rust
> > thing (even for PCI)... but correct me if I'm wrong.
>
> I'm not sure what you mean with that, can you expand a bit?
I was reading the devres documentation earlier. It mentions that one of its
uses is to clean up resources. Maybe I mixed up the meaning of "clean up" and
"revoke" as I was reading it.
Honestly, I am still a bit confused by the difference between "revoking" and
"cleaning up".
> >
> > > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > > unmapped; a subsequent call to try_access() would fail.
> > >
> > > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > > own; that's also why keeping the critical sections short is desirable.
> >
> > I have heard some concern around whether Rust is changing the driver model when
> > it comes to driver detach / driver remove. Can you elaborate may be a bit about
> > how Rust changes that mechanism versus C, when it comes to that?
>
> I think that one is simple: Rust does *not* change the driver model.
>
> What makes you think so?
Well, the revocable concept, for one, is Rust-only, right?
It may also just be some paranoia based on discussions, but I'm not sure at
the moment.
> > Ideally we
> > would not want Rust drivers to have races with user space accesses when they are
> > detached/removed. But we also don't want accesses to be confined to
> > non-sleepable sections while this guard is held; that seems restrictive
> > (though, to your point, the sections are expected to be small).
>
> In the very extreme case, nothing prevents you from implementing a wrapper like:
>
> fn my_read32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
>     let bar = bar.try_access()?;
>     Ok(bar.read32(offset))
> }
>
> Which limits the RCU read-side critical section to my_read32().
>
> Similarly you can have custom functions for short sequences of I/O ops, or use
> closures. I don't understand the concern.
Yeah, this is certainly possible. I think one concern is similar to what you
raised on the other thread you shared [1]:
"Maybe we even want to replace it with SRCU entirely to ensure that drivers
can't stall the RCU grace period for too long by accident."
[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
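FWIW, the closure variant you mention could look roughly like this; a rough
sketch, not compile-tested, just reusing the try_access()/read32() calls from
your example (the with_bar() name and the 0x100 offset are made up):

fn with_bar<R>(
    bar: &Devres<pci::Bar>,
    f: impl FnOnce(&pci::Bar) -> R,
) -> Result<R> {
    let guard = bar.try_access()?;
    // The RCU read-side section is confined to the closure body; the
    // guard is dropped as soon as with_bar() returns.
    Ok(f(&*guard))
}

// e.g.: let val = with_bar(&bar, |b| b.read32(0x100))?;

That way callers can't accidentally hold the guard across something that
sleeps, which seems like it would also address the accidental-stall concern
quoted above.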
thanks,
- Joel