[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DCH9YHZ4ZCC7.36PEB9YXWP0E6@nvidia.com>
Date: Mon, 01 Sep 2025 16:11:03 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
<alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
<gary@...yguo.net>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
"Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
"Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
<maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
"Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
<jhubbard@...dia.com>, "Alistair Popple" <apopple@...dia.com>, "Joel
Fernandes" <joelagnelf@...dia.com>, "Timur Tabi" <ttabi@...dia.com>,
<rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<nouveau@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare
the GSP firmware
On Sat Aug 30, 2025 at 9:56 PM JST, Danilo Krummrich wrote:
> On 8/29/25 1:16 PM, Alexandre Courbot wrote:
>> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
>>> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>>> /// Structure encapsulating the firmware blobs required for the GPU to operate.
>>>> #[expect(dead_code)]
>>>> pub(crate) struct Firmware {
>>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>>> booter_unloader: BooterFirmware,
>>>> /// GSP bootloader, verifies the GSP firmware before loading and running it.
>>>> gsp_bootloader: RiscvFirmware,
>>>> - gsp: firmware::Firmware,
>>>> + /// GSP firmware.
>>>> + gsp: Pin<KBox<GspFirmware>>,
>>>
>>> Is there a reason why we don't just propagate it through struct Gpu, which uses
>>> pin-init already?
>>>
>>> You can make Firmware pin_data too and then everything is within the single
>>> allocation of struct Gpu.
>
> Thanks a lot for the write-up below!
>
>> I tried doing that at first, and hit the problem that the `impl PinInit`
>> returned by `GspFirmware::new` borrows a reference to the GSP firmware
>> binary loaded by `Firmware::new` - when `Firmware::new` returns, the
>> firmware gets freed, and the borrow checker complains.
>>
>> We could move the GSP firmware loading code into the `pin_init!` of
>> `Firmware::new`, but now we hit another problem: in `Gpu::new` the
>> following code is executed:
>>
>> FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?
>>
>> which requires the `Firmware` instance, which doesn't exist yet as the
>> `Gpu` object isn't initialized until the end of the method.
>>
>> So we could move `FbLayout`, and everything else created by `Gpu::new`
>> to become members of the `Gpu` instance. It does make sense actually:
>> this `new` method is doing a lot of stuff, such as running FRTS, and
>> with Alistair's series it even runs Booter, the sequencer and so on.
>> Maybe we should move all firmware execution to a separate method that is
>> called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
>> constructor looks like it does a bit more than it should.
>
> Absolutely, executing the firmware should be a separate method. Having it in the
> constructor makes things more difficult.
>> ... but even when doing that, `Firmware::new` and `FbLayout::new` still
>> require a reference to the `Bar`, and... you get the idea. :)
>
> Lifetime wise this should be fine, the &Bar out-lives the constructor, since
> it's lifetime is bound to the &Device<Bound> which lives for the entire duration
> of probe().
The &Bar is actually obtained inside the constructor (which is passed
the `Arc<Devres<Bar0>>`), so I don't think that would even work without
more clever tricks. :)
>> So I don't think the current design allows us to do that easily or at
>> all, and even if it does, it will be at a significant cost in code
>> clarity. There is also the fact that I am considering making the
>> firmware member of `Gpu` a trait object: the boot sequence is so
>> different between pre and post-Hopper that I don't think it makes sense
>> to share the same `Firmware` structure between the two. I would rather
>> see `Firmware` as an opaque trait object, which provides high-level
>> methods such as "start GSP" behind which the specifics of each GPU
>> family are hidden. If we go with this design, `Firmware` will become a
>> trait object and so cannot be pinned into `Gpu`.
>>
>> This doesn't change my observation that `Gpu::new` should not IMHO do
>> steps like booting the GSP - it should just acquire the resources it
>> needs, return the pinned GPU object, and then `probe` can continue the
>> boot sequence. Having the GPU object pinned and constructed early
>> simplifies things quite a bit as the more we progress with boot, the
>> harder it becomes to construct everything in place (and the `PinInit`
>> closure also becomes more and more complex).
>>
>> I'm still laying down the general design, but I'm pretty convinced that
>> having `Firmware` as a trait object is the right way to abstract the
>> differences between GPU families.
>
> Makes sense, it's fine with me to keep this in its separate allocation for the
> purpose of making Firmware an opaque trait object, which sounds reasonable.
>
> But we should really properly separate construction of the GPU structure from
> firmware boot code execution as you say. And actually move the construction of
> the GPU object into try_pin_init!().
Yes, and I'm glad you agree with this idea as the current design of
putting everything inside the GPU is a bit limiting.
Regarding the firmware, I also think this should undergo a redesign -
right now we are putting unrelated stuff inside the `Firmware`
structure, and this won't scale well when we start supporting Hopper+
which use a very different way to start the GSP.
I'll give more details in my review of Alistair's series, and hopefully
send a v3 with some of these ideas implemented soon.
Powered by blists - more mailing lists