Message-ID: <Z73cXGkookq5-NON@cassiopeiae>
Date: Tue, 25 Feb 2025 16:06:04 +0100
From: Danilo Krummrich <dakr@...nel.org>
To: Alexandre Courbot <acourbot@...dia.com>
Cc: Dave Airlie <airlied@...il.com>, Gary Guo <gary@...yguo.net>,
	Joel Fernandes <joel@...lfernandes.org>,
	Boqun Feng <boqun.feng@...il.com>,
	John Hubbard <jhubbard@...dia.com>, Ben Skeggs <bskeggs@...dia.com>,
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
	nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
	Nouveau <nouveau-bounces@...ts.freedesktop.org>
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice
 implementation

On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> > CC: Gary
> >
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> This inability to sleep while we are accessing registers seems very
> >> constraining to me, if not dangerous. It is pretty common to have
> >> functions intermingle hardware accesses with other operations that might
> >> sleep, and this constraint means that in such cases the caller would
> >> need to perform guard lifetime management manually:
> >> 
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >> 
> >>   /* do something that might sleep */
> >> 
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >> 
> >>   ...
> >> 
> >> Failure to drop the guard potentially introduces a race condition, which
> >> will receive no compile-time warning and potentially not even a runtime
> >> one unless lockdep is enabled. This problem does not exist with the
> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> with Rust. Or am I missing something?
> >
> > Generally you are right, but you have to see it from a different perspective.
> >
> > What you describe is not an issue that comes from the design of the API, but is
> > a limitation of Rust in the kernel. People are aware of the issue, and with
> > klint [1] solutions are in the pipeline; see also [2] and [3].
> >
> > [1] https://rust-for-linux.com/klint
> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
> 
> Thanks, I wasn't aware of klint; it indeed looks cool, even if not perfect by
> its own admission. But even if we ignore the safety issue, the other one
> (ergonomics) is still there.
> 
> Basically this way of accessing registers imposes quite a mental burden on its
> users. It requires a very different (and harsher) discipline than when writing
> the same code in C

We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
drm_dev_unplug().

> and the correct granularity to use is unclear to me.
> 
> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> particular register in a busy loop, should I call try_access() once before the
> loop? Or every time before accessing the register?

I think we should re-acquire the guard in each iteration and drop it before the
delay. Wouldn't a simple closure work well for this pattern?
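Roughly the shape I have in mind (just a sketch; poll_until(), the
ok_or(ENODEV) conversion, the delay helper and the exact Devres / pci::Bar
signatures are illustrative assumptions, not existing APIs):

  use kernel::prelude::*;            // assumed: kernel Rust prelude
  use kernel::{devres::Devres, pci}; // assumed module paths

  // Poll a register until `done` reports success; the access guard is
  // re-taken in every iteration and never held across the delay.
  fn poll_until<F>(bar: &Devres<pci::Bar>, mut done: F) -> Result
  where
      F: FnMut(&pci::Bar) -> bool,
  {
      loop {
          {
              // The guard only spans the register access itself.
              let io = bar.try_access().ok_or(ENODEV)?;
              if done(&*io) {
                  return Ok(());
              }
          } // guard dropped here, before we delay

          // Delay with no guard held; the concrete delay primitive and a
          // timeout bound are left out of this sketch.
          delay_us(10);
      }
  }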

> I'm afraid having to check
> that the resource is still alive before accessing any register is going to
> become tedious very quickly.
> 
> I understand that we want to protect against accessing the IO region of an
> unplugged device; but still there is no guarantee that the device won't be
> unplugged in the middle of a critical section, however short. Thus the driver
> code should be able to recognize that the device has fallen off the bus when it
> e.g. gets a bunch of 0xff instead of a valid value. So do we really need the
> extra protection that AFAICT isn't used in C?

As mentioned above, we already do similar things in C.

Also, consider the alternative: if we remove the Devres wrapper around
pci::Bar, we lose control over the lifetime of the bar mapping, and it can
easily outlive the device / driver binding. That makes the API unsound.

With that, drivers would be able to keep acquired resources around
indefinitely. What if, after a hotplug, the physical address region is re-used
and mapped by another driver?
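
To make that concrete, a hypothetical sketch (the struct, field names and the
read accessor are made up, and the Bar generics are glossed over):

  // Hypothetical driver data, not nova-core code.
  struct MyData {
      // The Devres wrapper ties the mapping to the device / driver binding;
      // this struct may live longer, but the mapping is revoked at unbind.
      bar: Devres<pci::Bar>,
      // If this were a plain `pci::Bar` instead, nothing would prevent the
      // mapping from outliving the binding.
  }

  impl MyData {
      fn read_id(&self) -> Option<u32> {
          // After hot-unplug this returns None instead of touching a
          // physical range that may already belong to another device.
          let io = self.bar.try_access()?;
          Some(io.read32(0x0)) // accessor name assumed for illustration
      }
  }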
