Message-ID: <iyjecyybwyilem2ituw6esmufid72cximthc5qo2fgdpzz4fko@cb6n2vcrptb5>
Date: Wed, 3 Sep 2025 18:57:28 +1000
From: Alistair Popple <apopple@...dia.com>
To: Alexandre Courbot <acourbot@...dia.com>
Cc: dri-devel@...ts.freedesktop.org, dakr@...nel.org, 
	Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, 
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
	Björn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin <lossin@...nel.org>, 
	Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, 
	Trevor Gross <tmgross@...ch.edu>, David Airlie <airlied@...il.com>, 
	Simona Vetter <simona@...ll.ch>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, 
	Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, 
	John Hubbard <jhubbard@...dia.com>, Joel Fernandes <joelagnelf@...dia.com>, 
	Timur Tabi <ttabi@...dia.com>, linux-kernel@...r.kernel.org, nouveau@...ts.freedesktop.org, 
	Nouveau <nouveau-bounces@...ts.freedesktop.org>
Subject: Re: [PATCH 03/10] gpu: nova-core: gsp: Create wpr metadata

On 2025-09-01 at 17:46 +1000, Alexandre Courbot <acourbot@...dia.com> wrote...
> Hi Alistair,
> 
> On Wed Aug 27, 2025 at 5:20 PM JST, Alistair Popple wrote:
> <snip>
> > index 161c057350622..1f51e354b9569 100644
> > --- a/drivers/gpu/nova-core/gsp.rs
> > +++ b/drivers/gpu/nova-core/gsp.rs
> > @@ -6,12 +6,17 @@
> >  use kernel::dma_write;
> >  use kernel::pci;
> >  use kernel::prelude::*;
> > -use kernel::ptr::Alignment;
> > +use kernel::ptr::{Alignable, Alignment};
> > +use kernel::sizes::SZ_128K;
> >  use kernel::transmute::{AsBytes, FromBytes};
> >  
> > +use crate::fb::FbLayout;
> > +use crate::firmware::Firmware;
> >  use crate::nvfw::{
> > -    LibosMemoryRegionInitArgument, LibosMemoryRegionKind_LIBOS_MEMORY_REGION_CONTIGUOUS,
> > -    LibosMemoryRegionLoc_LIBOS_MEMORY_REGION_LOC_SYSMEM,
> > +    GspFwWprMeta, GspFwWprMetaBootInfo, GspFwWprMetaBootResumeInfo, LibosMemoryRegionInitArgument,
> > +    LibosMemoryRegionKind_LIBOS_MEMORY_REGION_CONTIGUOUS,
> > +    LibosMemoryRegionLoc_LIBOS_MEMORY_REGION_LOC_SYSMEM, GSP_FW_WPR_META_MAGIC,
> > +    GSP_FW_WPR_META_REVISION,
> >  };
> >  
> >  pub(crate) const GSP_PAGE_SHIFT: usize = 12;
> > @@ -25,12 +30,69 @@ unsafe impl AsBytes for LibosMemoryRegionInitArgument {}
> >  // are valid.
> >  unsafe impl FromBytes for LibosMemoryRegionInitArgument {}
> >  
> > +// SAFETY: Padding is explicit and will not contain uninitialized data.
> > +unsafe impl AsBytes for GspFwWprMeta {}
> > +
> > +// SAFETY: This struct only contains integer types for which all bit patterns
> > +// are valid.
> > +unsafe impl FromBytes for GspFwWprMeta {}
> > +
> >  #[allow(unused)]
> >  pub(crate) struct GspMemObjects {
> >      libos: CoherentAllocation<LibosMemoryRegionInitArgument>,
> >      pub loginit: CoherentAllocation<u8>,
> >      pub logintr: CoherentAllocation<u8>,
> >      pub logrm: CoherentAllocation<u8>,
> > +    pub wpr_meta: CoherentAllocation<GspFwWprMeta>,
> > +}
> 
> I think `wpr_meta` is a bit out-of-place in this structure. There are
> several reasons for this:
> 
> - All the other members of this structure (including `cmdq` which is
>   added later) are referenced by `libos` and constitute the GSP runtime:
>   they are used as long as the GSP is active. `wpr_meta`, OTOH, does not
>   reference any of the other objects, nor is it referenced by them.
> - `wpr_meta` is never used by the GSP, but needed as a parameter of
>   Booter on SEC2 to load the GSP firmware. It can actually be discarded
>   once this step is completed. This is very different from the rest of
>   this structure, which is used by the GSP.

Yes, I had noticed that too and had tried to remove it previously. As you
mention below, that turned out to be a little bit tricky, but if you fix it for
v3 I think this all makes perfect sense.

> So I think it doesn't really belong here, and would probably fit better
> in `Firmware`. Now the fault lies in my own series, which doesn't let
> you build `wpr_meta` easily from there. I'll try to fix that in the v3.
>
> And with the removal of `wpr_meta`, this structure ends up strictly
> containing the GSP runtime, including the command queue... Maybe it can
> simply be named `Gsp` then? It is even already in the right module! :)

Agreed - I noticed this right after I renamed this struct last time so wanted
to let things settle down a bit before doing another rename. But I think `Gsp`
makes a whole lot more sense, especially if we remove the wpr_meta data.

> Loosely related, but looking at this series made me realize there is a
> very logical split of our firmware into two "bundles":
> 
> - The GSP bundle includes the GSP runtime data, which is this
>   `GspMemObjects` structure minus `wpr_meta`. We pass it as an input
>   parameter to the GSP firmware using the GSP's falcon mbox registers.
>   It must live as long as the GSP is running.
> - The SEC2 bundle includes Booter, `wpr_meta`, the GSP firmware binary,
>   bootloader and its signatures (which are all referenced by
>   `wpr_meta`). All this data is consumed by SEC2, and crucially can be
>   dropped once the GSP is booted.
> 
> This separation is important as currently we are stuffing anything
> firmware-related into the `Firmware` struct and keep it there forever,
> consuming dozens of megabytes of host memory that we could free. Booting
> the GSP is typically a one-time operation in the life of the GPU device,
> and even if we ever need to do it again, we can very well build the SEC2
> bundle from scratch again.
> 
> I will try to reflect the separation better in the v3 of my patchset -
> then we can just build `wpr_meta` as a local variable of the method that
> runs `Booter`, and drop it (alongside the rest of the SEC2 bundle) upon
> return.
> 
> > +
> > +pub(crate) fn build_wpr_meta(
> > +    dev: &device::Device<device::Bound>,
> > +    fw: &Firmware,
> > +    fb_layout: &FbLayout,
> > +) -> Result<CoherentAllocation<GspFwWprMeta>> {
> > +    let wpr_meta =
> > +        CoherentAllocation::<GspFwWprMeta>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> > +    dma_write!(
> > +        wpr_meta[0] = GspFwWprMeta {
> > +            magic: GSP_FW_WPR_META_MAGIC as u64,
> > +            revision: u64::from(GSP_FW_WPR_META_REVISION),
> > +            sysmemAddrOfRadix3Elf: fw.gsp.lvl0_dma_handle(),
> > +            sizeOfRadix3Elf: fw.gsp.size as u64,
> > +            sysmemAddrOfBootloader: fw.gsp_bootloader.ucode.dma_handle(),
> > +            sizeOfBootloader: fw.gsp_bootloader.ucode.size() as u64,
> > +            bootloaderCodeOffset: u64::from(fw.gsp_bootloader.code_offset),
> > +            bootloaderDataOffset: u64::from(fw.gsp_bootloader.data_offset),
> > +            bootloaderManifestOffset: u64::from(fw.gsp_bootloader.manifest_offset),
> > +            __bindgen_anon_1: GspFwWprMetaBootResumeInfo {
> > +                __bindgen_anon_1: GspFwWprMetaBootInfo {
> > +                    sysmemAddrOfSignature: fw.gsp_sigs.dma_handle(),
> > +                    sizeOfSignature: fw.gsp_sigs.size() as u64,
> > +                }
> > +            },
> > +            gspFwRsvdStart: fb_layout.heap.start,
> > +            nonWprHeapOffset: fb_layout.heap.start,
> > +            nonWprHeapSize: fb_layout.heap.end - fb_layout.heap.start,
> > +            gspFwWprStart: fb_layout.wpr2.start,
> > +            gspFwHeapOffset: fb_layout.wpr2_heap.start,
> > +            gspFwHeapSize: fb_layout.wpr2_heap.end - fb_layout.wpr2_heap.start,
> > +            gspFwOffset: fb_layout.elf.start,
> > +            bootBinOffset: fb_layout.boot.start,
> > +            frtsOffset: fb_layout.frts.start,
> > +            frtsSize: fb_layout.frts.end - fb_layout.frts.start,
> > +            gspFwWprEnd: fb_layout
> > +                .vga_workspace
> > +                .start
> > +                .align_down(Alignment::new(SZ_128K)),
> > +            gspFwHeapVfPartitionCount: fb_layout.vf_partition_count,
> > +            fbSize: fb_layout.fb.end - fb_layout.fb.start,
> > +            vgaWorkspaceOffset: fb_layout.vga_workspace.start,
> > +            vgaWorkspaceSize: fb_layout.vga_workspace.end - fb_layout.vga_workspace.start,
> > +            ..Default::default()
> > +        }
> > +    )?;
> > +
> > +    Ok(wpr_meta)
> 
> I've discussed the bindings abstractions with Danilo last week. We
> agreed that no layout information should ever escape the `nvfw` module.
> I.e. the fields of `GspFwWprMeta` should not even be visible here.
> 
> Instead, `GspFwWprMeta` should be wrapped privately into another
> structure inside `nvfw`:
> 
>   /// Structure passed to the GSP bootloader, containing the framebuffer layout as well as the DMA
>   /// addresses of the GSP bootloader and firmware.
>   #[repr(transparent)]
>   pub(crate) struct GspFwWprMeta(r570_144::GspFwWprMeta);

I'm a little bit unsure what the advantage of this is. Admittedly I don't think
I've seen the discussion from last week, so I may have missed something, but it's
not obvious to me how creating another layer of abstraction is better. How would
it help contain any layout changes to nvfw? Supporting new struct fields, for
example, would almost certainly still require code changes outside nvfw.

My thinking here was that the bindings (at least for GSP) probably want to live
in the Gsp crate/module, and the rest of the driver would be isolated from Gsp
changes by the public API provided by the Gsp crate/module rather than trying to
do that at the binding level. For example the get_gsp_info() command implemented
in [1] provides a separate public struct representing what the rest of the
driver needs, thus ensuring the implementation specific details of Gsp (such as
struct layout) do not leak into the wider nova-core driver.
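
As a rough illustration of that module-level isolation, here is a minimal
sketch in plain Rust. It is not the code from [1]: the `RawInfo` layout, the
`raw_reply()` helper, the `GspInfo` field and the `get_gsp_info()` signature are
hypothetical, chosen only to show the shape of the idea.

```rust
mod gsp {
    // Firmware-specific layout, private to the `gsp` module (hypothetical).
    mod bindings {
        #[repr(C)]
        pub struct RawInfo {
            pub name: [u8; 16],
            pub _reserved: u32,
        }
    }

    /// Public, layout-independent view used by the rest of the driver.
    pub struct GspInfo {
        pub gpu_name: String,
    }

    // Stand-in for a reply received from the GSP; a real driver would read
    // this from the command queue.
    fn raw_reply() -> bindings::RawInfo {
        let mut name = [0u8; 16];
        name[..4].copy_from_slice(b"GPU0");
        bindings::RawInfo { name, _reserved: 0 }
    }

    /// Converts the firmware struct into the public one; this is the only
    /// place that touches the raw layout.
    pub fn get_gsp_info() -> GspInfo {
        let raw = raw_reply();
        // Treat the name as NUL-terminated, falling back to the full buffer.
        let len = raw
            .name
            .iter()
            .position(|&b| b == 0)
            .unwrap_or(raw.name.len());
        GspInfo {
            gpu_name: String::from_utf8_lossy(&raw.name[..len]).into_owned(),
        }
    }
}

fn main() {
    let info = gsp::get_gsp_info();
    println!("GPU name: {}", info.gpu_name);
}
```

Here the bindings stay usable anywhere inside `gsp`, next to the code that
consumes them, while callers outside the module only ever see `GspInfo`.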

> All its implementations should also be there:
> 
>   // SAFETY: Padding is explicit and will not contain uninitialized data.
>   unsafe impl AsBytes for GspFwWprMeta {}
> 
>   // SAFETY: This struct only contains integer types for which all bit patterns
>   // are valid.
>   unsafe impl FromBytes for GspFwWprMeta {}

Makes sense.

> And lastly, this `new` method can also be moved into `nvfw`, as an impl
> block for the wrapping `GspFwWprMeta` type. That way no layout detail
> escapes that module, and it will be easier to adapt the code to
> potential layout changes with new firmware versions.
> 
> (note that my series is the one carelessly re-exporting `GspFwWprMeta`
> as-is - I'll fix that too in v3)
> 
> The same applies to `LibosMemoryRegionInitArgument` of the previous
> patch, and other types introduced in subsequent patches. Usually there
> is little more work to do than moving the implementations into `nvfw` as
> everything is already abstracted correctly - just not where we
> eventually want it.

This is where I get a little bit uncomfortable - this doesn't feel right to me.
It seems to me moving all these implementations to the bindings would just end
up with a significant amount of Gsp code in nvfw.rs rather than in the places
that actually use it, making nvfw.rs large and unwieldy and the code more
distributed and harder to follow.

And it's all tightly coupled anyway - for example, the Gsp boot arguments
require some command queue offsets which are all pretty specific to the Gsp
implementation. I.e. we can't define some nice public API in the Gsp crate for
"getting arguments required for booting Gsp" without that just being "here is a
struct containing all the fields that must be packed into the Gsp arguments for
this version", which at that point may as well just be the actual struct itself,
right?

 - Alistair

[1] - https://lore.kernel.org/rust-for-linux/20250829173254.2068763-18-joelagnelf@nvidia.com/
