[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DCQ074EMFNIK.1OJLWJXWZLDXZ@nvidia.com>
Date: Thu, 11 Sep 2025 22:26:08 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>, "Alexandre Courbot"
<acourbot@...dia.com>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
<alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
<gary@...yguo.net>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
"Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
"Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
<maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
"Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
<jhubbard@...dia.com>, "Alistair Popple" <apopple@...dia.com>, "Joel
Fernandes" <joelagnelf@...dia.com>, "Timur Tabi" <ttabi@...dia.com>,
<rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<nouveau@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v5 02/12] gpu: nova-core: move GSP boot code to a
dedicated method
On Thu Sep 11, 2025 at 9:46 PM JST, Danilo Krummrich wrote:
> On 9/11/25 2:17 PM, Alexandre Courbot wrote:
>> On Thu Sep 11, 2025 at 8:22 PM JST, Danilo Krummrich wrote:
>>> On 9/11/25 1:04 PM, Alexandre Courbot wrote:
>>>> + /// Attempt to start the GSP.
>>>> + ///
>>>> + /// This is a GPU-dependent and complex procedure that involves loading firmware files from
>>>> + /// user-space, patching them with signatures, and building firmware-specific intricate data
>>>> + /// structures that the GSP will use at runtime.
>>>> + ///
>>>> + /// Upon return, the GSP is up and running, and its runtime object given as return value.
>>>> + pub(crate) fn start_gsp(
>>>> + pdev: &pci::Device<device::Bound>,
>>>> + bar: &Bar0,
>>>> + chipset: Chipset,
>>>> + gsp_falcon: &Falcon<Gsp>,
>>>> + _sec2_falcon: &Falcon<Sec2>,
>>>> + ) -> Result<()> {> + let dev = pdev.as_ref();
>>>> +
>>>> + let bios = Vbios::new(dev, bar)?;
>>>> +
>>>> + let fb_layout = FbLayout::new(chipset, bar)?;
>>>> + dev_dbg!(dev, "{:#x?}\n", fb_layout);
>>>> +
>>>> + Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;
>>>> +
>>>> + // Return an empty placeholder for now, to be replaced with the GSP runtime data.
>>>> + Ok(())
>>>> + }
>>>
>>> I'd rather create the Gsp structure already, move the code to Gsp::new() and
>>> return an impl PinInit<Self, Error>. If you don't want to store any of the
>>> object instances you create above yet, you can just stuff all the code into an
>>> initializer code block, as you do in the next patch with
>>> gfw::wait_gfw_boot_completion().
>>
>> I don't think that would work, or be any better even if it did. The full
>> GSP initialization is pretty complex and all we need to return is one
>> object created at the beginning that doesn't need to be pinned.
>> Moreover, the process is also dependent on the GPU family and completely
>> different on Hopper/Blackwell.
>
> Why would it not work? There is no difference between the code above being
> executed from an initializer block or directly in Gsp::new().
Yeah, that's pretty much my point. :) Why run it in an initializer if
the result doesn't need to be initialized in-place anyway?
>> You can see the whole process on [1]. `libos` is the object that is
>> returned (although its name and type will change). All the rest it
>> loading, preparing and running firmware, and that is done on the GPU. I
>> think it would be very out of place in the GSP module.
>>
>> It is also very step-by-step: run this firmware, wait for it to
>> complete, run another one, wait for a specific message from the GSP, run
>> the sequencer, etc. And most of this stuff is thrown away once the GSP
>> is running. That's where the limits of what we can do with `pin_init!`
>> are reached, and the GSP object doesn't need to be pinned anyway.
>
> I don't see that, in the code you linked you have a bunch of calls that don't
> return anything that needs to survive, this can be in an initializer block.
>
> And then you have
>
> let mut libos = gsp::GspMemObjects::new(pdev, bar)?;
>
> which only needs the device reference and the bar reference.
>
> So you can easily write this as:
>
> try_pin_init!(Self {
> _: {
> // all the throw-away stuff from above
> },
> libos <- gsp::GspMemObjects::new(pdev, bar),
> _: {
> libos.do_some_stuff_mutable()?;
> }
> })
Can the second initializer block access variables created in the first?
I suspect we can also initialize `libos` first, and move everything in a
block, but then my question would be why do we need to jump through that
hoop.
>> By keeping the initialization in the GPU, we can keep the GSP object
>> architecture-independent, and I think it makes sense from a design point
>> of view. That's not to say this code should be in `gpu.rs`, maybe we
>> want to move it to a GPU HAL, or if we really want this as part of the
>> GSP a `gsp/boot` module supporting all the different archs. But I'd
>> prefer to think about this when we start supporting several
>> architectures.
>
> Didn't we talk about a struct Gsp that will eventually be returned by
> Self::start_gsp(), or did I make this up in my head?
>
> The way I think about this is that we'll have a struct Gsp that represents the
> entry point in the driver to mess with the GSP command queue.
>
> But either way, this throws up two questions, if Self::start_gsp() return a
> struct GspMemObjects instead (which is probably the same thing with a different
> name), then:
>
> Are we sure this won't need any locks? If it will need locking (which I expect)
> then it needs pin-init.
Sorry, I have been imprecise: I should I said: "it can be moved" rather
than "it doesn't need to be pinned". In that case I don't think
`Gsp::new` needs to return an `impl PinInit`, right?
>
> If it never needs pinning why did you write it as
>
> gsp <- Self::start_gsp(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,
>
> in a patch 3?
>> [1] https://github.com/Gnurou/linux/blob/gsp_init_rebase/drivers/gpu/nova-core/gpu.rs#L305
Ah, I blindly copied that part from your initial suggestion [1] and
forgot to double check that part. We can use `:` here for `gsp`, as the
returned value of `start_gsp` can be moved without any issue. So if we
put it behind a lock at the `Gpu` level, the current pattern should not
be a problem as it can be moved where needed by the `Gpu` initializer.
Now I don't have a precise idea of how we are going to do locking, and
you seem to have given it more thought than I have, so please let me
know if I am still missing something.
[1] https://lore.kernel.org/rust-for-linux/DCOCL398HXDH.3QH9U6UGGIUP1@kernel.org/
Powered by blists - more mailing lists