Message-Id: <20250619-nova-frts-v6-0-ecf41ef99252@nvidia.com>
Date: Thu, 19 Jun 2025 22:23:44 +0900
From: Alexandre Courbot <acourbot@...dia.com>
To: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, 
 Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
 Björn Roy Baron <bjorn3_gh@...tonmail.com>, 
 Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, 
 Trevor Gross <tmgross@...ch.edu>, Danilo Krummrich <dakr@...nel.org>, 
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, 
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, 
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, 
 Benno Lossin <lossin@...nel.org>
Cc: John Hubbard <jhubbard@...dia.com>, Ben Skeggs <bskeggs@...dia.com>, 
 Joel Fernandes <joelagnelf@...dia.com>, Timur Tabi <ttabi@...dia.com>, 
 Alistair Popple <apopple@...dia.com>, linux-kernel@...r.kernel.org, 
 rust-for-linux@...r.kernel.org, nouveau@...ts.freedesktop.org, 
 dri-devel@...ts.freedesktop.org, Alexandre Courbot <acourbot@...dia.com>, 
 Benno Lossin <lossin@...nel.org>, Lyude Paul <lyude@...hat.com>, 
 Shirish Baskaran <sbaskaran@...dia.com>
Subject: [PATCH v6 00/24] nova-core: run FWSEC-FRTS to perform first stage
 of GSP initialization

Hi everyone,

After discussion, and since the `num` module is taking more time to
reach consensus than the rest of this series, I have split it into its
own patch series. Nova now uses ad-hoc code instead (thankfully only in
a handful of places), which will be replaced once the `num` series
lands. This should also help `num` get more attention, as it was until
now buried inside a loosely related patch series.

This also includes an important fix for a bug discovered by Ben Skeggs
in the falcon code: the bit indicating the completion of memory
scrubbing was interpreted incorrectly, which created a race condition
that could result in a failure to boot the GSP. :O

Other than that, a few more minor refinements took place, but nothing
that changes this series considerably. The last patch tries to organize
the increasing number of TODO items we have in the code; until they can
be addressed, it would be nice to understand which task in `todo.rst`
they correspond to, so I took the liberty of annotating them all to that
effect.

Usual disclaimer: this series currently only successfully probes Ampere
GPUs, and does not allow the GPU to do anything useful yet. Upon
successful probe, the driver will only display the range of the WPR2
region constructed by FWSEC-FRTS with debug priority:

  [   95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
  [   95.436002] NovaCore 0000:01:00.0: GPU instance built

This series is based on v6.16-rc1 with no other dependencies.

Some bits of documentation are still missing; these are addressed by
Joel in his own documentation patch series [1]. I'll also double-check
and send follow-up patches if anything is still missing after that.

[1] https://lore.kernel.org/rust-for-linux/20250503040802.1411285-1-joelagnelf@nvidia.com/

Signed-off-by: Alexandre Courbot <acourbot@...dia.com>
---
Changes in v6:
- Add `dma_handle_with_offset` method to CoherentAllocation.
- Move the `num` module into its own patchset and use ad-hoc code for
  now.
- Add new items (and remove obsolete ones) to the TODO list, and tag
  `TODO` entries in the code with their corresponding task in the list.
- Add `TIMEOUT:` comments wherever a timeout is used.
- Fix bug while waiting for falcon mem scrubbing to finish (thanks Ben
  Skeggs!)
- Pass the firmware object instead of its DMA handle in `dma_wr`.
- Fix safety statements in `fwsec.rs`.
- Move FWSEC boot code to `FwsecFirmware` and a helper function of
  `Gpu` to simplify `Gpu::new`.
- Add helper methods to NV_PFB_PRI_MMU_WPR2_ADDR_* to obtain the exact
  address.
- Fix build errors and warnings with Rust 1.78.
- Link to v5: https://lore.kernel.org/r/20250612-nova-frts-v5-0-14ba7eaf166b@nvidia.com
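
For readers unfamiliar with the `dma_handle_with_offset` item above, the
idea of a bounds-checked handle lookup can be sketched in plain Rust as
follows. This is only an illustration under stated assumptions: the
`Allocation` struct, the `OutOfBounds` error and the byte-based offset
are hypothetical stand-ins, not the kernel's `CoherentAllocation` API.

```rust
/// Hypothetical stand-in for a coherent DMA allocation: a bus address
/// plus the size of the mapped region in bytes.
struct Allocation {
    dma_handle: u64,
    size: usize,
}

/// Hypothetical error returned when the offset lies outside the region.
#[derive(Debug, PartialEq, Eq)]
struct OutOfBounds;

impl Allocation {
    /// Returns the bus address `offset` bytes into the allocation, or an
    /// error if the offset does not fall inside the allocated region.
    fn dma_handle_with_offset(&self, offset: usize) -> Result<u64, OutOfBounds> {
        if offset >= self.size {
            return Err(OutOfBounds);
        }
        // Guard against wrap-around of the bus address itself.
        self.dma_handle
            .checked_add(offset as u64)
            .ok_or(OutOfBounds)
    }
}
```

Returning a `Result` rather than a raw address means a caller cannot
accidentally program the device with an address past the end of the
buffer.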

Changes in v5:
- Rebased on top of 6.16-rc1.
- Improve invariants of CoherentAllocation related to the new `size`
  method.
- Use SZ_* consts when redefining BAR0 size.
- Split VBIOS patch into 3 patches (Joel)
- Convert all `Result<()>` into `Result`.
- Use `::cast<T>()` instead of ` as ` to convert pointer types.
- Use `KBox` instead of `Arc` for falcon HALs.
- Do not use `get_` prefix on methods that do not increase reference
  count.
- Replace arbitrary immediate values with proper constants.
- Use EIO to indicate firmware errors.
- Use inspect_err to be more verbose on which step of the FWSEC setup
  failed.
- Move sysmem flush page into its own type and add its registration to
  the FB HAL.
- Turn HAL getters into standalone functions.
- Patch FWSEC command at construction time.
- Force the signing stage (or an explicit non-signing state transition)
  on the firmware DMA objects.
- Link to v4: https://lore.kernel.org/r/20250521-nova-frts-v4-0-05dfd4f39479@nvidia.com
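
The `inspect_err` item above refers to `Result::inspect_err` from the
Rust standard library, which runs a closure on the error without
consuming it. A minimal userspace sketch of the pattern is shown below;
the step functions, the `Error` enum and the `log` vector are
hypothetical and only stand in for the driver's real setup steps and
logging.

```rust
/// Hypothetical error codes, loosely named after the kernel's errnos.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Error {
    Eio,
    Etimedout,
}

/// Hypothetical setup step: pretend to reset the falcon.
fn reset(ok: bool) -> Result<(), Error> {
    if ok { Ok(()) } else { Err(Error::Etimedout) }
}

/// Hypothetical setup step: pretend to load the firmware.
fn load(ok: bool) -> Result<(), Error> {
    if ok { Ok(()) } else { Err(Error::Eio) }
}

/// Runs the steps in order. Each failure is reported with the name of
/// the step before the error is propagated unchanged by `?`.
fn run_fwsec(reset_ok: bool, load_ok: bool, log: &mut Vec<String>) -> Result<(), Error> {
    reset(reset_ok).inspect_err(|e| log.push(format!("falcon reset failed: {e:?}")))?;
    load(load_ok).inspect_err(|e| log.push(format!("firmware load failed: {e:?}")))?;
    Ok(())
}
```

The error value reaching the caller is the same as without
`inspect_err`; only the side effect (here, a log entry) is added.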

Changes in v4:
- Improve documentation of falcon security modes (thanks Joel!)
- Add the definition of the size of CoherentAllocation as one of its
  invariants.
- Better document GFW boot progress, registers and use wait_on() helper,
  and move it to `gfw` module instead of `devinit`.
- Add missing TODOs for workarounds waiting to be replaced by in-flight
  R4L features.
- Register macro: add the offset of the register as a type constant, and
  allow register aliases for registers which can be interpreted
  differently depending on context.
- Rework the `num` module using only macros (to allow use of overflowing
  ops), and add the `PowerOfTwo` type.
- Add a proper HAL to the `fb` module.
- Move HAL builders to impl blocks of Chipset.
- Add proper types and traits for signatures.
- Proactively split FalconFirmware into distinct traits to ease
  management of v2 vs v3 FWSEC headers that will be needed for Turing
  support.
- Link to v3:
  https://lore.kernel.org/r/20250507-nova-frts-v3-0-fcb02749754d@nvidia.com
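
As an illustration of the kind of "wait on condition" helper mentioned
above, here is a minimal userspace sketch. It assumes a busy-polling
loop built on `std::time::Instant`; the signature and the `TimedOut`
error type are hypothetical and are not claimed to match the helper in
this series, which runs in kernel context.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error returned when the condition is not met in time.
#[derive(Debug, PartialEq, Eq)]
struct TimedOut;

/// Polls `cond` until it returns `Some(value)` or `timeout` elapses.
fn wait_on<R, F: FnMut() -> Option<R>>(timeout: Duration, mut cond: F) -> Result<R, TimedOut> {
    let start = Instant::now();
    loop {
        // The condition is always checked at least once.
        if let Some(value) = cond() {
            return Ok(value);
        }
        if start.elapsed() > timeout {
            return Err(TimedOut);
        }
        // Hint to the CPU that we are spinning.
        std::hint::spin_loop();
    }
}
```

Having the closure return `Option<R>` lets the caller hand back the
value it read (for instance a register snapshot) instead of reading it
a second time after the wait succeeds.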

Changes in v3:
- Rebased on top of latest nova-next.
- Use the new Devres::access() and remove the now unneeded with_bar!()
  macro.
- Dropped `rust: devres: allow to borrow a reference to the resource's
  Device` as it is not needed anymore.
- Fixed more erroneous uses of `ERANGE` error.
- Optimized alignment computations of the FB layout a bit.
- Link to v2: https://lore.kernel.org/r/20250501-nova-frts-v2-0-b4a137175337@nvidia.com

Changes in v2:
- Rebased on latest nova-next.
- Fixed all clippy warnings.
- Added `count` and `size` methods to `CoherentAllocation`.
- Added method to obtain a reference to the `Device` from a `Devres`
  (this is super convenient).
- Split `DmaObject` into its own patch and added `Deref` implementation.
- Squashed field names from [3] into "extract FWSEC from BIOS".
- Fixed erroneous use of `ERANGE` error.
- Reworked `register!()` macro towards a more intuitive syntax, moved
  its helper macros into internal rules to avoid polluting the macro
  namespace.
- Renamed all registers to capital snake case to better match OpenRM.
- Removed declarations for registers that are not used yet.
- Added more documentation for items not covered by Joel's documentation
  patches.
- Removed timer device and replaced it with a helper function using
  `Ktime`. This also made [4] unneeded so it is dropped.
- Unregister the sysmem flush page upon device destruction.
- ... probably more that I forgot. >_<
- Link to v1: https://lore.kernel.org/r/20250420-nova-frts-v1-0-ecd1cca23963@nvidia.com

[3] https://lore.kernel.org/all/20250423225405.139613-6-joelagnelf@nvidia.com/
[4] https://lore.kernel.org/lkml/20250420-nova-frts-v1-1-ecd1cca23963@nvidia.com/

---
Alexandre Courbot (21):
      rust: dma: fix comment
      rust: dma: expose the count and size of CoherentAllocation
      rust: dma: add dma_handle_with_offset method to CoherentAllocation
      rust: make ETIMEDOUT error available
      rust: sizes: add constants up to SZ_2G
      gpu: nova-core: use absolute paths in register!() macro
      gpu: nova-core: add delimiter for helper rules in register!() macro
      gpu: nova-core: expose the offset of each register as a type constant
      gpu: nova-core: allow register aliases
      gpu: nova-core: increase BAR0 size to 16MB
      gpu: nova-core: add helper function to wait on condition
      gpu: nova-core: wait for GFW_BOOT completion
      gpu: nova-core: add DMA object struct
      gpu: nova-core: register sysmem flush page
      gpu: nova-core: add falcon register definitions and base code
      gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
      gpu: nova-core: compute layout of the FRTS region
      gpu: nova-core: add types for patching firmware binaries
      gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
      gpu: nova-core: load and run FWSEC-FRTS
      gpu: nova-core: update and annotate TODO list

Joel Fernandes (3):
      gpu: nova-core: vbios: Add base support for VBIOS construction and iteration
      gpu: nova-core: vbios: Add support to look up PMU table in FWSEC
      gpu: nova-core: vbios: Add support for FWSEC ucode extraction

 Documentation/gpu/nova/core/todo.rst      |  107 +--
 drivers/gpu/nova-core/dma.rs              |   58 ++
 drivers/gpu/nova-core/driver.rs           |    6 +-
 drivers/gpu/nova-core/falcon.rs           |  554 ++++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |   24 +
 drivers/gpu/nova-core/falcon/hal.rs       |   54 ++
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  119 +++
 drivers/gpu/nova-core/falcon/sec2.rs      |   10 +
 drivers/gpu/nova-core/fb.rs               |  136 ++++
 drivers/gpu/nova-core/fb/hal.rs           |   39 +
 drivers/gpu/nova-core/fb/hal/ga100.rs     |   57 ++
 drivers/gpu/nova-core/fb/hal/ga102.rs     |   36 +
 drivers/gpu/nova-core/fb/hal/tu102.rs     |   58 ++
 drivers/gpu/nova-core/firmware.rs         |  108 +++
 drivers/gpu/nova-core/firmware/fwsec.rs   |  423 +++++++++++
 drivers/gpu/nova-core/gfw.rs              |   41 +
 drivers/gpu/nova-core/gpu.rs              |  132 +++-
 drivers/gpu/nova-core/nova_core.rs        |    5 +
 drivers/gpu/nova-core/regs.rs             |  288 +++++++
 drivers/gpu/nova-core/regs/macros.rs      |   65 +-
 drivers/gpu/nova-core/util.rs             |   28 +
 drivers/gpu/nova-core/vbios.rs            | 1157 +++++++++++++++++++++++++++++
 rust/kernel/dma.rs                        |   48 +-
 rust/kernel/error.rs                      |    1 +
 rust/kernel/sizes.rs                      |   24 +
 25 files changed, 3504 insertions(+), 74 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250417-nova-frts-96ef299abe2c

Best regards,
-- 
Alexandre Courbot <acourbot@...dia.com>

