Message-Id: <20260120204303.3229303-17-joelagnelf@nvidia.com>
Date: Tue, 20 Jan 2026 15:42:53 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>,
	Simona Vetter <simona@...ll.ch>,
	Jonathan Corbet <corbet@....net>,
	Alex Deucher <alexander.deucher@....com>,
	Christian König <christian.koenig@....com>,
	Jani Nikula <jani.nikula@...ux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@...el.com>,
	Tvrtko Ursulin <tursulin@...ulin.net>,
	Huang Rui <ray.huang@....com>,
	Matthew Auld <matthew.auld@...el.com>,
	Matthew Brost <matthew.brost@...el.com>,
	Lucas De Marchi <lucas.demarchi@...el.com>,
	Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
	Helge Deller <deller@....de>,
	Danilo Krummrich <dakr@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>,
	Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <lossin@...nel.org>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Trevor Gross <tmgross@...ch.edu>,
	John Hubbard <jhubbard@...dia.com>,
	Alistair Popple <apopple@...dia.com>,
	Timur Tabi <ttabi@...dia.com>,
	Edwin Peer <epeer@...dia.com>,
	Alexandre Courbot <acourbot@...dia.com>,
	Andrea Righi <arighi@...dia.com>,
	Andy Ritger <aritger@...dia.com>,
	Zhi Wang <zhiw@...dia.com>,
	Alexey Ivanov <alexeyi@...dia.com>,
	Balbir Singh <balbirs@...dia.com>,
	Philipp Stanner <phasta@...nel.org>,
	Elle Rhumsaa <elle@...thered-steel.dev>,
	Daniel Almeida <daniel.almeida@...labora.com>,
	joel@...lfernandes.org,
	nouveau@...ts.freedesktop.org,
	dri-devel@...ts.freedesktop.org,
	rust-for-linux@...r.kernel.org,
	linux-doc@...r.kernel.org,
	amd-gfx@...ts.freedesktop.org,
	intel-gfx@...ts.freedesktop.org,
	intel-xe@...ts.freedesktop.org,
	linux-fbdev@...r.kernel.org,
	Joel Fernandes <joelagnelf@...dia.com>
Subject: [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2

Add the page table walker implementation that traverses the 5-level
page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4) to resolve virtual
addresses to physical addresses or find PTE locations.

The walker provides:
- walk_to_pte_lookup(): Walk existing page tables (no allocation)
- walk_to_pte_allocate(): Walk page tables, allocating missing levels
  via the PageTableAllocator trait (sketched below)
- Helper functions for reading/writing PDEs and PTEs via PRAMIN

Uses the GpuMm API for centralized access to the PRAMIN window.
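
A sketch of how an implementor could hook into the allocating walk
(VramBumpAllocator below is hypothetical; the trait, type, and method
names are the ones added by this patch):

    /// Hypothetical bump allocator handing out 4KB tables from a
    /// pre-reserved, pre-zeroed VRAM region.
    struct VramBumpAllocator {
        next: VramAddress,
    }

    impl PageTableAllocator for VramBumpAllocator {
        fn alloc_page_table(&mut self, _mm: &mut GpuMm) -> Result<VramAddress> {
            // A real implementor must zero the new table, enforce the
            // region bounds, and keep the backing allocation alive for
            // the lifetime of the address space.
            let table = self.next;
            self.next = VramAddress::new(self.next.raw() as u64 + 4096);
            Ok(table)
        }
    }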

Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
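Note for reviewers: a sketch of how a mapping helper would drive the
allocating walk. Pte::new_vram() is assumed here, mirroring the
Pde::new_vram() constructor the walker uses for directory entries;
everything else is from this patch:

    fn map_one_4k_page<A: PageTableAllocator>(
        walker: &PtWalk,
        mm: &mut GpuMm,
        alloc: &mut A,
        vfn: Vfn,
        pfn: Pfn,
    ) -> Result {
        match walker.walk_to_pte_allocate(mm, alloc, vfn)? {
            // Missing directory levels were allocated on the way down;
            // write the new PTE in place via PRAMIN.
            WalkResult::Unmapped { pte_addr } => {
                let pte = Pte::new_vram(walker.mmu_version(), pfn); // assumed ctor
                write_pte(mm.pramin(), pte_addr, pte)
            }
            WalkResult::Mapped { .. } => Err(EEXIST),
            // Only returned in lookup mode.
            WalkResult::PageTableMissing => Err(EINVAL),
        }
    }
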
 drivers/gpu/nova-core/mm/pagetable/mod.rs  |  13 +
 drivers/gpu/nova-core/mm/pagetable/walk.rs | 285 +++++++++++++++++++++
 2 files changed, 298 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index 72bc7cda8df6..4c77d4953fbd 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -9,12 +9,25 @@
 #![expect(dead_code)]
 pub(crate) mod ver2;
 pub(crate) mod ver3;
+pub(crate) mod walk;
 
 use super::{
+    GpuMm,
     Pfn,
     VramAddress, //
 };
 use crate::gpu::Architecture;
+use kernel::prelude::*;
+
+/// Trait for allocating page tables during page table walks.
+///
+/// Implementors must allocate a zeroed 4KB page table in VRAM and
+/// ensure the allocation persists for the lifetimes of both the
+/// address space and the implementor.
+pub(crate) trait PageTableAllocator {
+    /// Allocate a zeroed page table and return its VRAM address.
+    fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress>;
+}
 
 /// MMU version enumeration.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
diff --git a/drivers/gpu/nova-core/mm/pagetable/walk.rs b/drivers/gpu/nova-core/mm/pagetable/walk.rs
new file mode 100644
index 000000000000..7a2660a30d80
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/walk.rs
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table walker implementation for NVIDIA GPUs.
+//!
+//! This module provides page table walking functionality for MMU v2 (Turing/Ampere/Ada).
+//! The walker traverses the 5-level page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4)
+//! to resolve virtual addresses to physical addresses or to find PTE locations.
+//!
+//! # Page Table Hierarchy
+//!
+//! ```text
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!     | PDB   |---->|  L1   |---->|  L2   |---->| L3 Dual |---->|  L4   |
+//!     | (L0)  |     |       |     |       |     | PDE     |     | (PTE) |
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!       64-bit        64-bit        64-bit        128-bit         64-bit
+//!        PDE           PDE           PDE        (big+small)        PTE
+//! ```
+//!
+//! # Result of a page table walk
+//!
+//! The walker returns a [`WalkResult`] indicating the outcome:
+//! - [`WalkResult::PageTableMissing`]: Intermediate page tables don't exist (lookup mode).
+//! - [`WalkResult::Unmapped`]: PTE exists but is invalid (page not mapped).
+//! - [`WalkResult::Mapped`]: PTE exists and is valid (page is mapped).
+//!
+//! # Example
+//!
+//! ```ignore
+//! use crate::mm::pagetable::walk::{PtWalk, WalkResult};
+//! use crate::mm::GpuMm;
+//!
+//! fn walk_example(mm: &mut GpuMm, pdb_addr: VramAddress) -> Result<()> {
+//!     // Create a page table walker.
+//!     let walker = PtWalk::new(pdb_addr, MmuVersion::V2);
+//!
+//!     // Walk to a PTE (lookup mode).
+//!     match walker.walk_to_pte_lookup(mm, Vfn::new(0x1000))? {
+//!         WalkResult::Mapped { pte_addr, pfn } => {
+//!             // Page is mapped to the physical frame number.
+//!         }
+//!         WalkResult::Unmapped { pte_addr } => {
+//!             // PTE exists but the page is not mapped.
+//!         }
+//!         WalkResult::PageTableMissing => {
+//!             // Intermediate page tables are missing.
+//!         }
+//!     }
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![allow(dead_code)]
+
+use kernel::prelude::*;
+
+use super::{
+    DualPde,
+    MmuVersion,
+    PageTableAllocator,
+    PageTableLevel,
+    Pde,
+    Pte, //
+};
+use crate::mm::{
+    pramin,
+    GpuMm,
+    Pfn,
+    Vfn,
+    VirtualAddress,
+    VramAddress, //
+};
+
+/// Dummy allocator for lookup-only walks.
+enum NoAlloc {}
+
+impl PageTableAllocator for NoAlloc {
+    fn alloc_page_table(&mut self, _mm: &mut GpuMm) -> Result<VramAddress> {
+        unreachable!()
+    }
+}
+
+/// Result of walking to a PTE.
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum WalkResult {
+    /// Intermediate page tables are missing (only returned in lookup mode).
+    PageTableMissing,
+    /// PTE exists but is invalid (page not mapped).
+    Unmapped { pte_addr: VramAddress },
+    /// PTE exists and is valid (page is mapped).
+    Mapped { pte_addr: VramAddress, pfn: Pfn },
+}
+
+/// Page table walker for NVIDIA GPUs.
+///
+/// Walks the 5-level page table hierarchy to find PTE locations or resolve
+/// virtual addresses.
+pub(crate) struct PtWalk {
+    pdb_addr: VramAddress,
+    mmu_version: MmuVersion,
+}
+
+impl PtWalk {
+    /// Create a new page table walker.
+    ///
+    /// Copies `pdb_addr` and `mmu_version` from the VMM configuration.
+    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Self {
+        Self {
+            pdb_addr,
+            mmu_version,
+        }
+    }
+
+    /// Get the MMU version this walker is configured for.
+    pub(crate) fn mmu_version(&self) -> MmuVersion {
+        self.mmu_version
+    }
+
+    /// Get the Page Directory Base address.
+    pub(crate) fn pdb_addr(&self) -> VramAddress {
+        self.pdb_addr
+    }
+
+    /// Walk to PTE for lookup only (no allocation).
+    ///
+    /// Returns `PageTableMissing` if intermediate tables don't exist.
+    pub(crate) fn walk_to_pte_lookup(&self, mm: &mut GpuMm, vfn: Vfn) -> Result<WalkResult> {
+        self.walk_to_pte_inner::<NoAlloc>(mm, None, vfn)
+    }
+
+    /// Walk to PTE with allocation of missing tables.
+    ///
+    /// Uses `PageTableAllocator::alloc_page_table()` when tables are missing.
+    pub(crate) fn walk_to_pte_allocate<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        allocator: &mut A,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        self.walk_to_pte_inner(mm, Some(allocator), vfn)
+    }
+
+    /// Internal walk implementation.
+    ///
+    /// If `allocator` is `Some`, allocates missing page tables. Otherwise returns
+    /// `PageTableMissing` when intermediate tables don't exist.
+    fn walk_to_pte_inner<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        mut allocator: Option<&mut A>,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        let va = VirtualAddress::from(vfn);
+        let mut cur_table = self.pdb_addr;
+
+        // Walk through PDE levels (PDB -> L1 -> L2 -> L3).
+        for level in PageTableLevel::pde_levels() {
+            let idx = va.level_index(level.as_index());
+
+            if level.is_dual_pde_level() {
+                // L3: 128-bit dual PDE. This is the final PDE level before PTEs and uses
+                // a special "dual" format that can point to both a Small Page Table (SPT)
+                // for 4KB pages and a Large Page Table (LPT) for 64KB pages, or encode a
+                // 2MB huge page directly via the IS_PTE bit.
+                let dpde_addr = entry_addr(cur_table, level, idx);
+                let dual_pde = read_dual_pde(mm.pramin(), dpde_addr, self.mmu_version)?;
+
+                // Check whether the SPT (Small Page Table) pointer is present. We use the
+                // "small" path for 4KB pages (the only page size currently supported). If
+                // it is missing and an allocator is available, create a new page table;
+                // otherwise return `PageTableMissing` for lookup-only walks.
+                if !dual_pde.has_small() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_dual_pde =
+                            DualPde::new_small(self.mmu_version, Pfn::from(new_table));
+                        write_dual_pde(mm.pramin(), dpde_addr, &new_dual_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = dual_pde.small_vram_address();
+                }
+            } else {
+                // Regular 64-bit PDE (levels PDB, L1, L2). Each entry points to the next
+                // level page table.
+                let pde_addr = entry_addr(cur_table, level, idx);
+                let pde = read_pde(mm.pramin(), pde_addr, self.mmu_version)?;
+
+                // Allocate a new page table if the PDE is invalid and an allocator was
+                // provided; otherwise return `PageTableMissing` for lookup-only walks.
+                if !pde.is_valid() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_pde = Pde::new_vram(self.mmu_version, Pfn::from(new_table));
+                        write_pde(mm.pramin(), pde_addr, new_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = pde.table_vram_address();
+                }
+            }
+        }
+
+        // Now at L4 (PTE level).
+        let pte_idx = va.level_index(PageTableLevel::L4.as_index());
+        let pte_addr = entry_addr(cur_table, PageTableLevel::L4, pte_idx);
+
+        // Read PTE to check if mapped.
+        let pte = read_pte(mm.pramin(), pte_addr, self.mmu_version)?;
+        if pte.is_valid() {
+            Ok(WalkResult::Mapped {
+                pte_addr,
+                pfn: pte.frame_number(),
+            })
+        } else {
+            Ok(WalkResult::Unmapped { pte_addr })
+        }
+    }
+}
+
+// ====================================
+// Helper functions for accessing VRAM
+// ====================================
+
+/// Calculate the address of an entry within a page table.
+fn entry_addr(table: VramAddress, level: PageTableLevel, index: u64) -> VramAddress {
+    let entry_size = level.entry_size() as u64;
+    VramAddress::new(table.raw() as u64 + index * entry_size)
+}
+
+/// Read a PDE from VRAM.
+pub(crate) fn read_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pde> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pde::new(mmu_version, val))
+}
+
+/// Write a PDE to VRAM.
+pub(crate) fn write_pde(pramin: &mut pramin::Window, addr: VramAddress, pde: Pde) -> Result {
+    pramin.try_write64(addr.raw(), pde.raw_u64())
+}
+
+/// Read a dual PDE (128-bit) from VRAM.
+pub(crate) fn read_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<DualPde> {
+    let lo = pramin.try_read64(addr.raw())?;
+    let hi = pramin.try_read64(addr.raw() + 8)?;
+    Ok(DualPde::new(mmu_version, lo, hi))
+}
+
+/// Write a dual PDE (128-bit) to VRAM.
+pub(crate) fn write_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    dual_pde: &DualPde,
+) -> Result {
+    pramin.try_write64(addr.raw(), dual_pde.big_raw_u64())?;
+    pramin.try_write64(addr.raw() + 8, dual_pde.small_raw_u64())
+}
+
+/// Read a PTE from VRAM.
+pub(crate) fn read_pte(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pte> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pte::new(mmu_version, val))
+}
+
+/// Write a PTE to VRAM.
+pub(crate) fn write_pte(pramin: &mut pramin::Window, addr: VramAddress, pte: Pte) -> Result {
+    pramin.try_write64(addr.raw(), pte.raw_u64())
+}
-- 
2.34.1

