Message-Id: <20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com>
Date: Tue, 16 Dec 2025 14:13:26 +0900
From: Alexandre Courbot <acourbot@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, 
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, 
 Bjorn Helgaas <bhelgaas@...gle.com>, 
 Krzysztof Wilczyński <kwilczynski@...nel.org>, 
 Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>, 
 Gary Guo <gary@...yguo.net>, 
 Björn Roy Baron <bjorn3_gh@...tonmail.com>, 
 Benno Lossin <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>, 
 Trevor Gross <tmgross@...ch.edu>
Cc: John Hubbard <jhubbard@...dia.com>, 
 Alistair Popple <apopple@...dia.com>, 
 Joel Fernandes <joelagnelf@...dia.com>, Timur Tabi <ttabi@...dia.com>, 
 Edwin Peer <epeer@...dia.com>, Eliot Courtney <ecourtney@...dia.com>, 
 nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org, 
 linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org, 
 rust-for-linux@...r.kernel.org, Alexandre Courbot <acourbot@...dia.com>
Subject: [PATCH 0/7] gpu: nova-core: run unload sequence upon unbinding

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. This is why the GPU currently needs to
  be reset (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions,
  which the kernel might recycle.

This patchset does what is necessary to leave the GPU in a clean state
after unbind.

First are a few preparatory patches:

- Running the unload sequence requires mutable access to the driver
  data, but the current device unbind method only passes a non-mutable
  reference to it. Since the driver data is destroyed after the call to
  `unbind`, we can just give ownership back to the driver at this stage
  to solve this issue.
  The need for mutable access is likely to go away in Nova after we
  support concurrency on the command queue, but for now we need it and
  it looks like a sensible design direction anyway.
- A `warn_on_err` macro is introduced to call `warn_on` if the passed
  `Result` is an error. This simplifies the unbind sequence's code as we
  need to proceed to the next step even if the previous one failed.
- A fix (?) to the automatically-generated pin-projected structures,
  suppressing the warnings when using them partially.
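
To illustrate the first point, here is a minimal userspace sketch of what
passing the driver data by value to `unbind` buys us. It is not the
kernel API: `DriverData`, `send_command` and this simplified `Driver`
trait are hypothetical stand-ins, and `Pin<Box<Self>>` is only assumed
here as the owning pinned type. The point is that owning the pinned data
is enough to obtain mutable access to it.

```rust
use std::pin::Pin;

/// Hypothetical stand-in for the driver's private data (not a nova-core type).
struct DriverData {
    commands_sent: u32,
}

impl DriverData {
    /// Requires `&mut self`, like a command queue without concurrency support.
    fn send_command(&mut self) -> u32 {
        self.commands_sent += 1;
        self.commands_sent
    }
}

/// Simplified model of a driver trait whose `unbind` takes ownership of the
/// pinned driver data instead of a shared reference to it.
trait Driver: Sized {
    fn unbind(this: Pin<Box<Self>>) -> u32;
}

impl Driver for DriverData {
    fn unbind(mut this: Pin<Box<Self>>) -> u32 {
        // Owning the pinned box lets us get `Pin<&mut Self>` with `as_mut()`.
        // `get_mut()` is fine in this sketch because `DriverData` is `Unpin`.
        let data: &mut DriverData = this.as_mut().get_mut();
        data.send_command()
        // `this` is dropped here, i.e. right after the unbind logic has run.
    }
}
```

With a shared `Pin<&Self>` the call to `send_command` above would not
compile without interior mutability, which is the limitation the first
preparatory patch is meant to lift.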

With these in place, the rest of the patchset is relatively trivial. We
change the signatures of methods related to unbinding to work with
mutable pinned driver data, then implement the two steps of the GPU
unbind sequence: asking the GSP to shut down, and removing the WPR2
protected memory area.
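
The shape of the resulting sequence can be sketched in plain Rust as
follows. This is only an illustration under stated assumptions: the
`warn_on_err!` below is a userspace stand-in that prints to stderr
instead of calling the kernel's `warn_on`, and `Gpu`, `UnloadError`,
`shutdown_gsp` and `remove_wpr2` are hypothetical names, not the ones
used in the series. What it shows is that the second step still runs
when the first one fails.

```rust
/// Userspace stand-in for the proposed macro: report the error and carry on.
/// Evaluates to `true` if the `Result` was an error.
macro_rules! warn_on_err {
    ($res:expr) => {
        match $res {
            Ok(_) => false,
            Err(e) => {
                eprintln!("warning: `{}` failed: {:?}", stringify!($res), e);
                true
            }
        }
    };
}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum UnloadError {
    GspTimeout,
}

/// Hypothetical model of the GPU state relevant to unbinding.
struct Gpu {
    gsp_responds: bool,
    gsp_running: bool,
    wpr2_present: bool,
}

impl Gpu {
    /// Step 1: ask the GSP to shut down.
    fn shutdown_gsp(&mut self) -> Result<(), UnloadError> {
        if !self.gsp_responds {
            return Err(UnloadError::GspTimeout);
        }
        self.gsp_running = false;
        Ok(())
    }

    /// Step 2: remove the WPR2 protected memory area.
    fn remove_wpr2(&mut self) -> Result<(), UnloadError> {
        self.wpr2_present = false;
        Ok(())
    }

    /// Runs both steps unconditionally and returns how many of them failed.
    fn unload(&mut self) -> u32 {
        let mut failures = 0;
        if warn_on_err!(self.shutdown_gsp()) {
            failures += 1;
        }
        // Attempted even if the GSP did not shut down cleanly.
        if warn_on_err!(self.remove_wpr2()) {
            failures += 1;
        }
        failures
    }
}
```

Using `?` instead would bail out at the first error and leave the WPR2
region in place, which is exactly what we want to avoid on unbind.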

This series sits on top of the following:

- Nova fixes for this cycle [1].
- Nova misc improvements [2].
- Transmute on ZSTs [3].

A tree with all the required patches is available in [4].

[1] https://lore.kernel.org/all/20251216-nova-fixes-v3-0-c7469a71f7c4@nvidia.com/
[2] https://lore.kernel.org/all/20251216-nova-misc-v2-0-dc7b42586c04@nvidia.com/
[3] https://lore.kernel.org/all/20251215-transmute_unit-v4-0-477d71ec7c23@nvidia.com/
[4] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@...dia.com>
---
Alexandre Courbot (7):
      rust: pci: pass driver data by value to `unbind`
      rust: add warn_on_err macro
      gpu: nova-core: use warn_on_err macro
      [RFC] rust: pin-init: allow `dead_code` on projection structure
      gpu: nova-nova: use pin-init projections
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command GSP upon unloading
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding

 drivers/gpu/nova-core/driver.rs                   |  4 +-
 drivers/gpu/nova-core/firmware/booter.rs          |  1 -
 drivers/gpu/nova-core/firmware/fwsec.rs           |  1 -
 drivers/gpu/nova-core/gpu.rs                      | 25 ++++++--
 drivers/gpu/nova-core/gsp/boot.rs                 | 77 +++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/commands.rs             | 42 +++++++++++++
 drivers/gpu/nova-core/gsp/fw.rs                   |  4 ++
 drivers/gpu/nova-core/gsp/fw/commands.rs          | 27 ++++++++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  8 +++
 drivers/gpu/nova-core/regs.rs                     |  5 ++
 rust/kernel/bug.rs                                | 10 +++
 rust/kernel/pci.rs                                |  4 +-
 rust/pin-init/src/macros.rs                       |  1 +
 samples/rust/rust_driver_pci.rs                   |  2 +-
 14 files changed, 198 insertions(+), 13 deletions(-)
---
base-commit: 8d4031f6a53fe47449b91f30cd7aa5b439558874
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
-- 
Alexandre Courbot <acourbot@...dia.com>

