lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DDQIUZBMK5D1.7YYKEIN6HPNU@nvidia.com>
Date: Fri, 24 Oct 2025 20:39:36 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Joel Fernandes" <joelagnelf@...dia.com>, "Alexandre Courbot"
 <acourbot@...dia.com>, <linux-kernel@...r.kernel.org>,
 <rust-for-linux@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
 <dakr@...nel.org>
Cc: "Alistair Popple" <apopple@...dia.com>, "Miguel Ojeda"
 <ojeda@...nel.org>, "Alex Gaynor" <alex.gaynor@...il.com>, "Boqun Feng"
 <boqun.feng@...il.com>, "Gary Guo" <gary@...yguo.net>,
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
 Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
 "Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
 "Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
 <jhubbard@...dia.com>, "Timur Tabi" <ttabi@...dia.com>,
 <joel@...lfernandes.org>, "Elle Rhumsaa" <elle@...thered-steel.dev>,
 "Daniel Almeida" <daniel.almeida@...labora.com>,
 <nouveau@...ts.freedesktop.org>
Subject: Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to
 write to VRAM

On Thu Oct 23, 2025 at 7:04 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
> On 10/22/2025 6:41 AM, Alexandre Courbot wrote:
>> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>>> Required for writing page tables directly to VRAM physical memory,
>>> before page tables and MMU are setup.
>>>
>>> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
>>> ---
>>>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>>>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>>>  4 files changed, 273 insertions(+), 1 deletion(-)
>>>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>>>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>>>
>>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>>> new file mode 100644
>>> index 000000000000..54c7cd9416a9
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>>> @@ -0,0 +1,3 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +pub(crate) mod pramin;
>>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>>> new file mode 100644
>>> index 000000000000..4f4e1b8c0b9b
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>>> @@ -0,0 +1,241 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
>>> +//! PRAMIN can also write to system memory, however for simplicty we only
>> 
>> s/simplicty/simplicity
>
> Ok.
>
>>> +//! support VRAM access.
>>> +//!
>>> +//! # Examples
>>> +//!
>>> +//! ## Writing u32 data to VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>>> +//!     pramin.write::<u32>(0x10000, &data)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +//!
>>> +//! ## Reading bytes from VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +
>>> +#![expect(dead_code)]
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::regs;
>>> +use core::mem;
>>> +use kernel::prelude::*;
>>> +
>>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
>> 
>> This definition looks like it could be an array of registers - that way
>> we could use its `BASE` associated constant and keep the hardware
>> offsets into the `regs` module.
>> 
>> Even if we don't use the array of registers for convenience, it is good
>> to have it defined in `regs` for consistency.
>
> Ok, I wanted to do that, but I thought since these are registers, it is weird to
> move it there.

It's just that it looks like to keep all the layout in the same place.

>
> Also we need byte-level access, register macro is u32. I don't think we should
> overload regs.rs just to store magic numbers, these are not registers right? We
> have PRAM window configuration registers but that's different.

The register macro just gained support for `u8` thanks to your work, I
thought that would have been a good use-case. :)

Also we don't need to use the register accessors for this - the idea was
just to have the definition in `regs.rs`, and use the constant
containing its address instead of the regular register accessors if they
are not fit for this.

I don't feel too strongly about this though.

>
>> 
>>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
>> 
>> You can use `kernel::sizes::SZ_1M` here.
>
> Sure, will do.
>
>>> +
>>> +/// Trait for types that can be read/written through PRAMIN.
>>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>>> +
>>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>>> +
>>> +    fn size_bytes() -> usize {
>>> +        mem::size_of::<Self>()
>>> +    }
>>> +
>>> +    fn alignment() -> usize {
>>> +        Self::size_bytes()
>>> +    }
>>> +}
>> 
>> Since this trait requires `Sized`, you can use `size_of` and `align_of`
>> directly, making the `size_bytes` and `alignment` methods redundant.
>> Only `write_to_bar` should remain.
>
> Sure, slightly poorer caller-side readability though but its fine with me, I'll
> do that.
>
>> I also wonder whether we couldn't get rid of this trait entirely by
>> leveragin `FromBytes` and `AsBytes`. Since the size of the type is
>> known, we could have read/write methods in Pramin that write its content
>> by using Io accessors of decreasing size (first 64-bit, then 32, etc)
>> until all the data is written.
>
> Ah great idea, I like this. Though per the other discussion with John on keeping
> it simple (not doing bulk I/O operations), maybe we wouldn't need a trait at
> all. Let me see.

Even better. :)

>
>> 
>>> +
>>> +/// Macro to implement PraminNum trait for unsigned integer types.
>>> +macro_rules! impl_pramin_unsigned_num {
>>> +    ($bits:literal) => {
>>> +        ::kernel::macros::paste! {
>>> +            impl PraminNum for [<u $bits>] {
>>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>>> +                    bar.[<try_read $bits>](offset)
>>> +                }
>>> +
>>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>>> +                    bar.[<try_write $bits>](self, offset)
>>> +                }
>>> +            }
>>> +        }
>>> +    };
>>> +}
>>> +
>>> +impl_pramin_unsigned_num!(8);
>>> +impl_pramin_unsigned_num!(16);
>>> +impl_pramin_unsigned_num!(32);
>>> +impl_pramin_unsigned_num!(64);
>>> +
>>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>>> +pub(crate) struct PraminVram<'a> {
>> 
>> Let's use the shorter name `Pramin` - the limitation to VRAM is a
>> reasonable one (since the CPU can access its own system memory), it is
>> not necessary to encode it into the name.
> Sure, sounds good.
>
>> 
>>> +    bar: &'a Bar0,
>>> +    saved_window_addr: usize,
>>> +}
>>> +
>>> +impl<'a> PraminVram<'a> {
>>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>>> +    /// the state is restored when the accessor is dropped.
>>> +    ///
>>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>>> +        let saved_window_addr = Self::get_window_addr(bar);
>>> +        Self {
>>> +            bar,
>>> +            saved_window_addr,
>>> +        }
>>> +    }
>>> +
>>> +    /// Set BAR0 window to point to specific FB region.
>>> +    ///
>>> +    /// # Arguments
>>> +    ///
>>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
>> 
>> Let's follow the rust doccomment guidelines for the arguments.
>
> Ok, Sure.
>> 
>>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>>> +        if fb_offset & 0xFFFF != 0 {
>>> +            return Err(EINVAL);
>>> +        }
>> 
>> Since this method is private and called from controlled contexts for
>> which `fb_offset` should always be valid, we can request callers to
>> give us a "window index" (e.g. the `window_base` of the
>> `NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
>> will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
>> 
>
> The tradeoff being it may complicated callers of the function that deal purely
> with addresses instead of window indices.

Would it though? IIUC the two callers of this method are aligning the
address already, so the internal check is superfluous. And this would
also simplify `NV_PBUS_BAR0_WINDOW`.

>
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>> 
>> Inversely, this large method is under-documented. Understanding what
>> `operation` is supposed to do would be helpful.
>
> I will skip these comments for now as we discussed dropping complexity in other
> thread, but thanks for the review on this. This function should be likely
> dropped in the next iteration.
>
>>> +
>>> +    /// Sets the window address from a framebuffer offset.
>>> +    /// The fb_offset must be 64KB aligned (lower bits discared).
>>> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
>>> +        // Calculate window base (bits 39:16 of FB address)
>>> +        // The total FB address is 40 bits, mask anything above. Since we are
>>> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
>>> +        let mask = genmask_u32(0..=23) as usize;
>>> +        let window_base = ((fb_offset >> 16) & mask) as u32;
>>> +        self.set_window_base(window_base)
>>> +    }
>>> +}
>> 
>> If you work directly with `window_base` as suggested above, this impl
>> block can be dropped altogether.
> But it will complicate callers. That's the tradeoff. I prefer to keep caller
> side simple and abstract away complexity. But to your point, this is an internal
> API so I can probably code it both ways and see what it looks like.

I am not convinced that the callers would require extra complexity. If
it turns out they do, let's weight the cost/benefit of both approaches.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ