lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DCEVAXNB3EL9.YFTIP5RQCTUW@nvidia.com>
Date: Fri, 29 Aug 2025 20:16:41 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
 <alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
 <gary@...yguo.net>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
 Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
 "Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
 "Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
 <jhubbard@...dia.com>, "Alistair Popple" <apopple@...dia.com>, "Joel
 Fernandes" <joelagnelf@...dia.com>, "Timur Tabi" <ttabi@...dia.com>,
 <rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
 <nouveau@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare
 the GSP firmware

On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>   /// Structure encapsulating the firmware blobs required for the GPU to operate.
>>   #[expect(dead_code)]
>>   pub(crate) struct Firmware {
>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>       booter_unloader: BooterFirmware,
>>       /// GSP bootloader, verifies the GSP firmware before loading and running it.
>>       gsp_bootloader: RiscvFirmware,
>> -    gsp: firmware::Firmware,
>> +    /// GSP firmware.
>> +    gsp: Pin<KBox<GspFirmware>>,
>
> Is there a reason why we don't just propagate it through struct Gpu, which uses 
> pin-init already?
>
> You can make Firmware pin_data too and then everything is within the single 
> allocation of struct Gpu.

I tried doing that at first, and hit the problem that the `impl PinInit`
returned by `GspFirmware::new` borrows a reference to the GSP firmware
binary loaded by `Firmware::new` - when `Firmware::new` returns, the
firmware gets freed, and the borrow checker complains.

We could move the GSP firmware loading code into the `pin_init!` of
`Firmware::new`, but now we hit another problem: in `Gpu::new` the
following code is executed:

    FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?

which requires the `Firmware` instance, which doesn't exist yet as the
`Gpu` object isn't initialized until the end of the method.

So we could move `FbLayout`, and everything else created by `Gpu::new`
to become members of the `Gpu` instance. It does make sense actually:
this `new` method is doing a lot of stuff, such as running FRTS, and
with Alistair's series it even runs Booter, the sequencer and so on.
Maybe we should move all firmware execution to a separate method that is
called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
constructor looks like it does a bit more than it should.

... but even when doing that, `Firmware::new` and `FbLayout::new` still
require a reference to the `Bar`, and... you get the idea. :)

So I don't think the current design allows us to do that easily or at
all, and even if it does, it will be at a significant cost in code
clarity. There is also the fact that I am considering making the
firmware member of `Gpu` a trait object: the boot sequence is so
different between pre and post-Hopper that I don't think it makes sense
to share the same `Firmware` structure between the two. I would rather
see `Firmware` as an opaque trait object, which provides high-level
methods such as "start GSP" behind which the specifics of each GPU
family are hidden. If we go with this design, `Firmware` will become a
trait object and so cannot be pinned into `Gpu`.

This doesn't change my observation that `Gpu::new` should not IMHO do
steps like booting the GSP - it should just acquire the resources it
needs, return the pinned GPU object, and then `probe` can continue the
boot sequence. Having the GPU object pinned and constructed early
simplifies things quite a bit as the more we progress with boot, the
harder it becomes to construct everything in place (and the `PinInit`
closure also becomes more and more complex).

I'm still laying down the general design, but I'm pretty convinced that
having `Firmware` as a trait object is the right way to abstract the
differences between GPU families.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ