linux-kernel - Re: [PATCH v5 02/12] gpu: nova-core: move GSP boot code to a dedicated method

Message-ID: <ce74db34-77bc-4207-94c8-6e0580189448@kernel.org>
Date: Thu, 11 Sep 2025 14:46:51 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Alexandre Courbot <acourbot@...dia.com>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
 Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
 Björn Roy Baron <bjorn3_gh@...tonmail.com>,
 Benno Lossin <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
 Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 John Hubbard <jhubbard@...dia.com>, Alistair Popple <apopple@...dia.com>,
 Joel Fernandes <joelagnelf@...dia.com>, Timur Tabi <ttabi@...dia.com>,
 rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org,
 nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH v5 02/12] gpu: nova-core: move GSP boot code to a
 dedicated method

On 9/11/25 2:17 PM, Alexandre Courbot wrote:
> On Thu Sep 11, 2025 at 8:22 PM JST, Danilo Krummrich wrote:
>> On 9/11/25 1:04 PM, Alexandre Courbot wrote:
>>> +    /// Attempt to start the GSP.
>>> +    ///
>>> +    /// This is a GPU-dependent and complex procedure that involves loading firmware files from
>>> +    /// user-space, patching them with signatures, and building firmware-specific intricate data
>>> +    /// structures that the GSP will use at runtime.
>>> +    ///
>>> +    /// Upon return, the GSP is up and running, and its runtime object given as return value.
>>> +    pub(crate) fn start_gsp(
>>> +        pdev: &pci::Device<device::Bound>,
>>> +        bar: &Bar0,
>>> +        chipset: Chipset,
>>> +        gsp_falcon: &Falcon<Gsp>,
>>> +        _sec2_falcon: &Falcon<Sec2>,
>>> +    ) -> Result<()> {
>>> +        let dev = pdev.as_ref();
>>> +
>>> +        let bios = Vbios::new(dev, bar)?;
>>> +
>>> +        let fb_layout = FbLayout::new(chipset, bar)?;
>>> +        dev_dbg!(dev, "{:#x?}\n", fb_layout);
>>> +
>>> +        Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;
>>> +
>>> +        // Return an empty placeholder for now, to be replaced with the GSP runtime data.
>>> +        Ok(())
>>> +    }
>>
>> I'd rather create the Gsp structure already, move the code to Gsp::new() and
>> return an impl PinInit<Self, Error>. If you don't want to store any of the
>> object instances you create above yet, you can just stuff all the code into an
>> initializer code block, as you do in the next patch with
>> gfw::wait_gfw_boot_completion().
> 
> I don't think that would work, or be any better even if it did. The full
> GSP initialization is pretty complex and all we need to return is one
> object created at the beginning that doesn't need to be pinned.
> Moreover, the process is also dependent on the GPU family and completely
> different on Hopper/Blackwell.

Why would it not work? There is no difference between the code above being
executed from an initializer block or directly in Gsp::new().

> You can see the whole process on [1]. `libos` is the object that is
> returned (although its name and type will change). All the rest is
> loading, preparing and running firmware, and that is done on the GPU. I
> think it would be very out of place in the GSP module.
> 
> It is also very step-by-step: run this firmware, wait for it to
> complete, run another one, wait for a specific message from the GSP, run
> the sequencer, etc. And most of this stuff is thrown away once the GSP
> is running. That's where the limits of what we can do with `pin_init!`
> are reached, and the GSP object doesn't need to be pinned anyway.

I don't see that. In the code you linked you have a bunch of calls that don't
return anything that needs to survive; this can be in an initializer block.

And then you have

let mut libos = gsp::GspMemObjects::new(pdev, bar)?;

which only needs the device reference and the bar reference.

So you can easily write this as:

try_pin_init!(Self {
   _: {
      // all the throw-away stuff from above
   },
   libos <- gsp::GspMemObjects::new(pdev, bar),
   _: {
      libos.do_some_stuff_mutable()?;
   }
})
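
To make the ordering concrete, here is a rough user-space analogue in plain
Rust. It is only a sketch: pin_init is kernel-specific, so this uses an
ordinary fallible constructor, and every name in it (MemObjects,
run_throwaway_steps(), the seed parameter) is hypothetical and not taken from
nova-core. It shows the same three phases: throw-away work, creation of the one
field that has to survive, then mutable post-initialization work on that field.

```rust
// User-space analogue only; this is NOT kernel code and does not use pin_init.
// All types and functions here are hypothetical illustrations.

#[derive(Debug)]
struct MemObjects {
    entries: Vec<u32>,
}

impl MemObjects {
    // Fallible creation of the object that has to survive initialization.
    fn new(seed: u32) -> Result<Self, String> {
        if seed == 0 {
            return Err("invalid seed".into());
        }
        Ok(Self { entries: vec![seed] })
    }

    // Post-initialization step that needs mutable access to the object.
    fn do_some_stuff_mutable(&mut self) -> Result<(), String> {
        let last = *self.entries.last().ok_or("empty")?;
        self.entries.push(last + 1);
        Ok(())
    }
}

// Stands in for the firmware steps whose results are thrown away.
fn run_throwaway_steps(seed: u32) -> Result<(), String> {
    let scratch = vec![seed; 4];
    if scratch.is_empty() {
        return Err("scratch setup failed".into());
    }
    Ok(())
}

#[derive(Debug)]
struct Gsp {
    libos: MemObjects,
}

impl Gsp {
    fn new(seed: u32) -> Result<Self, String> {
        // Phase 1: throw-away work; nothing from here is stored.
        run_throwaway_steps(seed)?;
        // Phase 2: create the one field that must survive.
        let mut libos = MemObjects::new(seed)?;
        // Phase 3: mutate the already-created field before handing it out.
        libos.do_some_stuff_mutable()?;
        Ok(Self { libos })
    }
}
```

Any failing phase aborts construction through `?`, which is the behaviour the
try_pin_init!() version above would also have.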

> By keeping the initialization in the GPU, we can keep the GSP object
> architecture-independent, and I think it makes sense from a design point
> of view. That's not to say this code should be in `gpu.rs`, maybe we
> want to move it to a GPU HAL, or if we really want this as part of the
> GSP a `gsp/boot` module supporting all the different archs. But I'd
> prefer to think about this when we start supporting several
> architectures.

Didn't we talk about a struct Gsp that will eventually be returned by
Self::start_gsp(), or did I make this up in my head?

The way I think about this is that we'll have a struct Gsp that represents the
entry point in the driver to mess with the GSP command queue.

But either way, this raises two questions. If Self::start_gsp() returns a
struct GspMemObjects instead (which is probably the same thing under a different
name), then:

Are we sure this won't need any locks? If it will need locking (which I expect)
then it needs pin-init.
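
As a rough illustration of why an address-sensitive object has to stay pinned,
here is a plain user-space Rust sketch. It assumes the usual reasoning that a
lock must not move once initialized; it is not the kernel's lock
implementation, and AddrSensitive and its methods are hypothetical names. The
struct records its own address at init time, so any later move of the value
would leave that record stale.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical stand-in for an object that must not move after initialization.
struct AddrSensitive {
    // Address of this very object, recorded once it has its final location.
    self_addr: usize,
    // Opts out of `Unpin`, so safe code cannot move the value out of a `Pin`.
    _pin: PhantomPinned,
}

impl AddrSensitive {
    // Allocate the object in its final place, then finish initialization there.
    fn new_pinned() -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Self {
            self_addr: 0,
            _pin: PhantomPinned,
        });
        let addr = &*boxed as *const Self as usize;
        // SAFETY: we only write a field; the value is never moved out of the box.
        unsafe { boxed.as_mut().get_unchecked_mut().self_addr = addr };
        boxed
    }

    // True while the recorded address still matches the current location.
    fn is_consistent(self: Pin<&Self>) -> bool {
        self.self_addr == &*self as *const Self as usize
    }
}
```

Moving the Pin<Box<AddrSensitive>> handle around is fine, because the heap
allocation it points to stays where it is; only moving the value itself would
break the invariant.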

If it never needs pinning why did you write it as

gsp <- Self::start_gsp(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,

in patch 3?

> [1] https://github.com/Gnurou/linux/blob/gsp_init_rebase/drivers/gpu/nova-core/gpu.rs#L305