Message-Id: <20260120204303.3229303-17-joelagnelf@nvidia.com>
Date: Tue, 20 Jan 2026 15:42:53 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Jonathan Corbet <corbet@....net>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Tvrtko Ursulin <tursulin@...ulin.net>,
Huang Rui <ray.huang@....com>,
Matthew Auld <matthew.auld@...el.com>,
Matthew Brost <matthew.brost@...el.com>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
Helge Deller <deller@....de>,
Danilo Krummrich <dakr@...nel.org>,
Alice Ryhl <aliceryhl@...gle.com>,
Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>,
Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>,
Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>,
John Hubbard <jhubbard@...dia.com>,
Alistair Popple <apopple@...dia.com>,
Timur Tabi <ttabi@...dia.com>,
Edwin Peer <epeer@...dia.com>,
Alexandre Courbot <acourbot@...dia.com>,
Andrea Righi <arighi@...dia.com>,
Andy Ritger <aritger@...dia.com>,
Zhi Wang <zhiw@...dia.com>,
Alexey Ivanov <alexeyi@...dia.com>,
Balbir Singh <balbirs@...dia.com>,
Philipp Stanner <phasta@...nel.org>,
Elle Rhumsaa <elle@...thered-steel.dev>,
Daniel Almeida <daniel.almeida@...labora.com>,
joel@...lfernandes.org,
nouveau@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org,
rust-for-linux@...r.kernel.org,
linux-doc@...r.kernel.org,
amd-gfx@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org,
linux-fbdev@...r.kernel.org,
Joel Fernandes <joelagnelf@...dia.com>
Subject: [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2
Add the page table walker implementation that traverses the 5-level
page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4) to resolve virtual
addresses to physical addresses or to find PTE locations.

The walker provides:
- walk_to_pte_lookup(): Walk existing page tables (no allocation)
- walk_to_pte_allocate(): Walk and allocate missing intermediate
  tables through the new PageTableAllocator trait
- Helper functions for reading/writing PDEs and PTEs via PRAMIN

The walker uses the GpuMm API for centralized access to the PRAMIN
window.
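
A PageTableAllocator implementor only has to return the VRAM address
of a zeroed 4KB page table that outlives the address space. As a
rough sketch (VramHeap and its alloc_zeroed_4k() helper are
hypothetical names, not something introduced by this series):

    impl PageTableAllocator for VramHeap {
        fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress> {
            // Hypothetical helper: allocate and zero one 4KB table in
            // VRAM. The allocation must persist for the lifetime of
            // the address space (see the trait documentation).
            self.alloc_zeroed_4k(mm)
        }
    }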
Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
drivers/gpu/nova-core/mm/pagetable/mod.rs | 13 +
drivers/gpu/nova-core/mm/pagetable/walk.rs | 314 +++++++++++++++++++++
2 files changed, 327 insertions(+)
create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs
diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index 72bc7cda8df6..4c77d4953fbd 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -9,12 +9,25 @@
 #![expect(dead_code)]
 pub(crate) mod ver2;
 pub(crate) mod ver3;
+pub(crate) mod walk;
 
 use super::{
+    GpuMm,
     Pfn,
     VramAddress, //
 };
 use crate::gpu::Architecture;
+use kernel::prelude::*;
+
+/// Trait for allocating page tables during page table walks.
+///
+/// Implementors must allocate a zeroed 4KB page table in VRAM and
+/// ensure the allocation persists for the lifetime of the address
+/// space and the lifetime of the implementor.
+pub(crate) trait PageTableAllocator {
+    /// Allocate a zeroed page table and return its VRAM address.
+    fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress>;
+}
 
 /// MMU version enumeration.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
diff --git a/drivers/gpu/nova-core/mm/pagetable/walk.rs b/drivers/gpu/nova-core/mm/pagetable/walk.rs
new file mode 100644
index 000000000000..7a2660a30d80
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/walk.rs
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table walker implementation for NVIDIA GPUs.
+//!
+//! This module provides page table walking functionality for MMU v2 (Turing/Ampere/Ada).
+//! The walker traverses the 5-level page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4)
+//! to resolve virtual addresses to physical addresses or to find PTE locations.
+//!
+//! # Page Table Hierarchy
+//!
+//! ```text
+//! +-------+     +-------+     +-------+     +---------+     +-------+
+//! |  PDB  |---->|  L1   |---->|  L2   |---->| L3 Dual |---->|  L4   |
+//! | (L0)  |     |       |     |       |     |   PDE   |     | (PTE) |
+//! +-------+     +-------+     +-------+     +---------+     +-------+
+//!  64-bit        64-bit        64-bit        128-bit         64-bit
+//!   PDE           PDE           PDE        (big+small)        PTE
+//! ```
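+//!
+//! Each level consumes a slice of the virtual address bits as an index into
+//! its table. A sketch of how the walker derives the per-level index (this
+//! is exactly what the walk functions below do internally):
+//!
+//! ```ignore
+//! let va = VirtualAddress::from(vfn);
+//! for level in PageTableLevel::pde_levels() {
+//!     // Index of the PDE within the current level's table.
+//!     let idx = va.level_index(level.as_index());
+//! }
+//! ```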
+//!
+//! # Result of a page table walk
+//!
+//! The walker returns a [`WalkResult`] indicating the outcome:
+//! - [`WalkResult::PageTableMissing`]: Intermediate page tables don't exist (lookup mode).
+//! - [`WalkResult::Unmapped`]: PTE exists but is invalid (page not mapped).
+//! - [`WalkResult::Mapped`]: PTE exists and is valid (page is mapped).
+//!
+//! # Example
+//!
+//! ```ignore
+//! use kernel::prelude::*;
+//!
+//! use crate::mm::pagetable::walk::{PtWalk, WalkResult};
+//! use crate::mm::pagetable::MmuVersion;
+//! use crate::mm::{GpuMm, Vfn, VramAddress};
+//!
+//! fn walk_example(mm: &mut GpuMm, pdb_addr: VramAddress) -> Result<()> {
+//!     // Create a page table walker.
+//!     let walker = PtWalk::new(pdb_addr, MmuVersion::V2);
+//!
+//!     // Walk to a PTE (lookup mode).
+//!     match walker.walk_to_pte_lookup(mm, Vfn::new(0x1000))? {
+//!         WalkResult::Mapped { pte_addr, pfn } => {
+//!             // Page is mapped to the physical frame number.
+//!         }
+//!         WalkResult::Unmapped { pte_addr } => {
+//!             // PTE exists but the page is not mapped.
+//!         }
+//!         WalkResult::PageTableMissing => {
+//!             // Intermediate page tables are missing.
+//!         }
+//!     }
+//!
+//!     Ok(())
+//! }
+//! ```
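+//!
+//! To create missing intermediate tables during the walk, use
+//! [`PtWalk::walk_to_pte_allocate`] with a [`PageTableAllocator`]. A sketch,
+//! assuming some `allocator` value implementing the trait:
+//!
+//! ```ignore
+//! // Missing intermediate tables are allocated on demand, so
+//! // `WalkResult::PageTableMissing` is never returned here.
+//! let result = walker.walk_to_pte_allocate(mm, &mut allocator, Vfn::new(0x1000))?;
+//! ```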
+
+#![allow(dead_code)]
+
+use kernel::prelude::*;
+
+use super::{
+    DualPde,
+    MmuVersion,
+    PageTableAllocator,
+    PageTableLevel,
+    Pde,
+    Pte, //
+};
+use crate::mm::{
+    pramin,
+    GpuMm,
+    Pfn,
+    Vfn,
+    VirtualAddress,
+    VramAddress, //
+};
+
+/// Dummy allocator for lookup-only walks.
+///
+/// Uninhabited, so its `alloc_page_table()` can never actually be called.
+enum NoAlloc {}
+
+impl PageTableAllocator for NoAlloc {
+    fn alloc_page_table(&mut self, _mm: &mut GpuMm) -> Result<VramAddress> {
+        // `NoAlloc` has no values, so this body is statically unreachable
+        // and no panic path is needed.
+        match *self {}
+    }
+}
+
+/// Result of walking to a PTE.
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum WalkResult {
+    /// Intermediate page tables are missing (only returned in lookup mode).
+    PageTableMissing,
+    /// PTE exists but is invalid (page not mapped).
+    Unmapped { pte_addr: VramAddress },
+    /// PTE exists and is valid (page is mapped).
+    Mapped { pte_addr: VramAddress, pfn: Pfn },
+}
+
+/// Page table walker for NVIDIA GPUs.
+///
+/// Walks the 5-level page table hierarchy to find PTE locations or resolve
+/// virtual addresses.
+pub(crate) struct PtWalk {
+    pdb_addr: VramAddress,
+    mmu_version: MmuVersion,
+}
+
+impl PtWalk {
+    /// Create a new page table walker.
+    ///
+    /// Copies `pdb_addr` and `mmu_version` from the VMM configuration.
+    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Self {
+        Self {
+            pdb_addr,
+            mmu_version,
+        }
+    }
+
+    /// Get the MMU version this walker is configured for.
+    pub(crate) fn mmu_version(&self) -> MmuVersion {
+        self.mmu_version
+    }
+
+    /// Get the Page Directory Base address.
+    pub(crate) fn pdb_addr(&self) -> VramAddress {
+        self.pdb_addr
+    }
+
+    /// Walk to the PTE for lookup only (no allocation).
+    ///
+    /// Returns `PageTableMissing` if intermediate tables don't exist.
+    pub(crate) fn walk_to_pte_lookup(&self, mm: &mut GpuMm, vfn: Vfn) -> Result<WalkResult> {
+        self.walk_to_pte_inner::<NoAlloc>(mm, None, vfn)
+    }
+
+    /// Walk to the PTE, allocating missing tables.
+    ///
+    /// Uses `PageTableAllocator::alloc_page_table()` when tables are missing.
+    pub(crate) fn walk_to_pte_allocate<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        allocator: &mut A,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        self.walk_to_pte_inner(mm, Some(allocator), vfn)
+    }
+
+    /// Internal walk implementation.
+    ///
+    /// If `allocator` is `Some`, allocates missing page tables. Otherwise returns
+    /// `PageTableMissing` when intermediate tables don't exist.
+    fn walk_to_pte_inner<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        mut allocator: Option<&mut A>,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        let va = VirtualAddress::from(vfn);
+        let mut cur_table = self.pdb_addr;
+
+        // Walk through PDE levels (PDB -> L1 -> L2 -> L3).
+        for level in PageTableLevel::pde_levels() {
+            let idx = va.level_index(level.as_index());
+
+            if level.is_dual_pde_level() {
+                // L3: 128-bit dual PDE. This is the final PDE level before PTEs and uses
+                // a special "dual" format that can point to both a Small Page Table (SPT)
+                // for 4KB pages and a Large Page Table (LPT) for 64KB pages, or encode a
+                // 2MB huge page directly via the IS_PTE bit.
+                let dpde_addr = entry_addr(cur_table, level, idx);
+                let dual_pde = read_dual_pde(mm.pramin(), dpde_addr, self.mmu_version)?;
+
+                // Check if the SPT (Small Page Table) pointer is present. We use the
+                // "small" path for 4KB pages (the only page size currently supported).
+                // If it is missing and an allocator is available, create a new page
+                // table; otherwise return `PageTableMissing` for lookup-only walks.
+                if !dual_pde.has_small() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_dual_pde =
+                            DualPde::new_small(self.mmu_version, Pfn::from(new_table));
+                        write_dual_pde(mm.pramin(), dpde_addr, &new_dual_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = dual_pde.small_vram_address();
+                }
+            } else {
+                // Regular 64-bit PDE (levels PDB, L1, L2). Each entry points to the next
+                // level's page table.
+                let pde_addr = entry_addr(cur_table, level, idx);
+                let pde = read_pde(mm.pramin(), pde_addr, self.mmu_version)?;
+
+                // Allocate a new page table if the PDE is invalid and an allocator was
+                // provided; otherwise return `PageTableMissing` for lookup-only walks.
+                if !pde.is_valid() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_pde = Pde::new_vram(self.mmu_version, Pfn::from(new_table));
+                        write_pde(mm.pramin(), pde_addr, new_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = pde.table_vram_address();
+                }
+            }
+        }
+
+        // Now at L4 (the PTE level).
+        let pte_idx = va.level_index(PageTableLevel::L4.as_index());
+        let pte_addr = entry_addr(cur_table, PageTableLevel::L4, pte_idx);
+
+        // Read the PTE to check whether the page is mapped.
+        let pte = read_pte(mm.pramin(), pte_addr, self.mmu_version)?;
+        if pte.is_valid() {
+            Ok(WalkResult::Mapped {
+                pte_addr,
+                pfn: pte.frame_number(),
+            })
+        } else {
+            Ok(WalkResult::Unmapped { pte_addr })
+        }
+    }
+}
+
+// ====================================
+// Helper functions for accessing VRAM
+// ====================================
+
+/// Calculate the address of an entry within a page table.
+fn entry_addr(table: VramAddress, level: PageTableLevel, index: u64) -> VramAddress {
+    let entry_size = level.entry_size() as u64;
+    VramAddress::new(table.raw() as u64 + index * entry_size)
+}
+
+/// Read a PDE from VRAM.
+pub(crate) fn read_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pde> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pde::new(mmu_version, val))
+}
+
+/// Write a PDE to VRAM.
+pub(crate) fn write_pde(pramin: &mut pramin::Window, addr: VramAddress, pde: Pde) -> Result {
+    pramin.try_write64(addr.raw(), pde.raw_u64())
+}
+
+/// Read a dual PDE (128-bit) from VRAM.
+pub(crate) fn read_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<DualPde> {
+    let lo = pramin.try_read64(addr.raw())?;
+    let hi = pramin.try_read64(addr.raw() + 8)?;
+    Ok(DualPde::new(mmu_version, lo, hi))
+}
+
+/// Write a dual PDE (128-bit) to VRAM.
+pub(crate) fn write_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    dual_pde: &DualPde,
+) -> Result {
+    pramin.try_write64(addr.raw(), dual_pde.big_raw_u64())?;
+    pramin.try_write64(addr.raw() + 8, dual_pde.small_raw_u64())
+}
+
+/// Read a PTE from VRAM.
+pub(crate) fn read_pte(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pte> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pte::new(mmu_version, val))
+}
+
+/// Write a PTE to VRAM.
+pub(crate) fn write_pte(pramin: &mut pramin::Window, addr: VramAddress, pte: Pte) -> Result {
+    pramin.try_write64(addr.raw(), pte.raw_u64())
+}
--
2.34.1