Message-Id: <DD2GCDGQ7Q4W.1CSFYSETFSN5H@nvidia.com>
Date: Fri, 26 Sep 2025 13:37:17 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Alistair Popple" <apopple@...dia.com>,
 <rust-for-linux@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
 <dakr@...nel.org>, <acourbot@...dia.com>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
 <alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
 <gary@...yguo.net>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
 Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
 "Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
 "Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
 <jhubbard@...dia.com>, "Joel Fernandes" <joelagnelf@...dia.com>, "Timur
 Tabi" <ttabi@...dia.com>, <linux-kernel@...r.kernel.org>,
 <nouveau@...ts.freedesktop.org>
Subject: Re: [PATCH v2 05/10] gpu: nova-core: gsp: Add GSP command queue
 handling

Hi Alistair,

On Mon Sep 22, 2025 at 8:30 PM JST, Alistair Popple wrote:
<snip>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> new file mode 100644
> index 000000000000..a9ba1a4c73d8
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -0,0 +1,423 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use core::mem::offset_of;
> +use core::sync::atomic::fence;
> +use core::sync::atomic::Ordering;
> +
> +use kernel::alloc::flags::GFP_KERNEL;
> +use kernel::device;
> +use kernel::dma::{CoherentAllocation, DmaAddress};
> +use kernel::prelude::*;
> +use kernel::sync::aref::ARef;
> +use kernel::time::Delta;
> +use kernel::transmute::{AsBytes, FromBytes};
> +use kernel::{dma_read, dma_write};
> +
> +use super::fw::{
> +    NV_VGPU_MSG_EVENT_GSP_INIT_DONE, NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
> +    NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER,
> +    NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
> +    NV_VGPU_MSG_EVENT_POST_EVENT, NV_VGPU_MSG_EVENT_RC_TRIGGERED,
> +    NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, NV_VGPU_MSG_FUNCTION_FREE,
> +    NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
> +    NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
> +    NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_LOG,
> +    NV_VGPU_MSG_FUNCTION_MAP_MEMORY, NV_VGPU_MSG_FUNCTION_NOP,
> +    NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
> +};

As I mentioned in v1, let's turn these into two enums to avoid this big
import and make sure we can never mix up the values.

This can be something like this in `fw.rs`:

#[repr(u32)]
pub(crate) enum VgpuMsgEvent {
    GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
    GspLockDownNotice = bindings::NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
    ...
}

Then you just need to import `VgpuMsgEvent`, and can use that type where
appropriate, e.g. for the `FUNCTION` associated constant of
`GspMessageFromGsp`.
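
For example (a rough sketch, assuming an analogous `VgpuMsgFunction` enum is
added for the function codes), the trait constants could then be typed as:

    pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes {
        const FUNCTION: VgpuMsgFunction;
    }

    pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes {
        const FUNCTION: VgpuMsgEvent;
    }

so passing a function constant where an event is expected becomes a type
error instead of a silent mix-up.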

> +use crate::driver::Bar0;
> +use crate::gsp::create_pte_array;
> +use crate::gsp::fw::{GspMsgElement, MsgqRxHeader, MsgqTxHeader};
> +use crate::gsp::{GSP_PAGE_SHIFT, GSP_PAGE_SIZE};
> +use crate::regs::NV_PGSP_QUEUE_HEAD;
> +use crate::sbuffer::SBuffer;
> +use crate::util::wait_on;
> +
> +pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes {
> +    const FUNCTION: u32;
> +}
> +
> +pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes {
> +    const FUNCTION: u32;
> +}

Do we need to repeat `Gsp` in these trait names? `CommandToGsp` and
`MessageFromGsp` should be fine.

(I am also guilty of prefixing type names to avoid name collisions - a
habit inherited from years of C programming - but since we are already in
the `gsp` module we can forgo it: users can just import the module and
refer to the type as `gsp::MessageFromGsp` if there is any ambiguity.)

> +
> +/// Number of GSP pages making the Msgq.
> +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f;
> +
> +#[repr(C, align(0x1000))]
> +#[derive(Debug)]
> +struct MsgqData {
> +    data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize],
> +}
> +
> +// Annoyingly there is no real equivalent of #define so we're forced to use a
> +// literal to specify the alignment above. So check that against the actual GSP
> +// page size here.
> +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE);
> +
> +// There is no struct defined for this in the open-gpu-kernel-source headers.
> +// Instead it is defined by code in GspMsgQueuesInit().
> +#[repr(C)]
> +struct Msgq {
> +    tx: MsgqTxHeader,
> +    rx: MsgqRxHeader,
> +    msgq: MsgqData,
> +}
> +
> +#[repr(C)]
> +struct GspMem {
> +    ptes: [u8; GSP_PAGE_SIZE],
> +    cpuq: Msgq,
> +    gspq: Msgq,
> +}
> +
> +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but
> +// that is not a problem because they are not used outside the kernel.
> +unsafe impl AsBytes for GspMem {}
> +
> +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but
> +// that is not a problem because they are not used outside the kernel.
> +unsafe impl FromBytes for GspMem {}

These ARE used outside the kernel, since they are shared with the GSP.
:) I'd say the reason these are safe is just because we satisfy the
requirements of AsBytes and FromBytes:

- No uninitialized bytes,
- No interior mutability,
- All byte patterns are valid (for some generous definition of
  "valid" limited to not triggering UB).

> +
> +/// `GspMem` struct that is shared with the GSP.
> +struct DmaGspMem(CoherentAllocation<GspMem>);
> +
> +impl DmaGspMem {
> +    fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> +        const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32;
> +        const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32;
> +
> +        let mut gsp_mem =
> +            CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> +        create_pte_array(&mut gsp_mem, 0);
> +        dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF))?;
> +        dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
> +
> +        Ok(Self(gsp_mem))
> +    }
> +
> +    #[expect(unused)]
> +    fn dma_handle(&self) -> DmaAddress {
> +        self.0.dma_handle()
> +    }
> +
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
> +    /// with.
> +    unsafe fn access_mut(&mut self) -> &mut GspMem {
> +        // SAFETY:
> +        // - The [`CoherentAllocation`] contains exactly one object.
> +        // - Per the safety statement of the function, no concurrent access wil be performed.
> +        &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0]
> +    }
> +
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
> +    /// with.
> +    unsafe fn access(&self) -> &GspMem {
> +        // SAFETY:
> +        // - The [`CoherentAllocation`] contains exactly one object.
> +        // - Per the safety statement of the function, no concurrent access wil be performed.
> +        &unsafe { self.0.as_slice(0, 1) }.unwrap()[0]
> +    }

Since these two methods are only called once each from
`driver_write/read_area`, let's inline them there and reduce our
`unsafe` keyword counter.
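
Roughly, for the write side (a sketch only, the read side would be the
same idea):

    fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
        let tx = self.cpu_write_ptr() as usize;
        let rx = self.gsp_read_ptr() as usize;

        // SAFETY:
        // - The `CoherentAllocation` contains exactly one object.
        // - Only the driver-owned part of the shared memory is accessed below.
        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
        ...
    }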

> +
> +    fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> +        let tx = self.cpu_write_ptr() as usize;
> +        let rx = self.gsp_read_ptr() as usize;
> +
> +        // SAFETY: we will only access the driver-owned part of the shared memory.
> +        let gsp_mem = unsafe { self.access_mut() };
> +        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +
> +        if rx <= tx {
> +            // The area from `tx` up to the end of the ring, and from the beginning of the ring up
> +            // to `rx`, minus one unit, belongs to the driver.
> +            if rx == 0 {
> +                let last = after_tx.len() - 1;
> +                (&mut after_tx[..last], &mut before_tx[0..0])
> +            } else {
> +                (after_tx, &mut before_tx[..rx])
> +            }
> +        } else {
> +            // The area from `tx` to `rx`, minus one unit, belongs to the driver.
> +            (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
> +        }
> +    }
> +
> +    fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) {
> +        let tx = self.gsp_write_ptr() as usize;
> +        let rx = self.cpu_read_ptr() as usize;
> +
> +        // SAFETY: we will only access the driver-owned part of the shared memory.
> +        let gsp_mem = unsafe { self.access() };
> +        let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
> +
> +        if tx <= rx {
> +            // The area from `rx` up to the end of the ring, and from the beginning of the ring up
> +            // to `tx`, minus one unit, belongs to the driver.
> +            if tx == 0 {
> +                let last = after_rx.len() - 1;
> +                (&after_rx[..last], &before_rx[0..0])
> +            } else {
> +                (after_rx, &before_rx[..tx])
> +            }
> +        } else {
> +            // The area from `rx` to `tx`, minus one unit, belongs to the driver.
> +            (after_rx.split_at(tx - rx).0, &before_rx[0..0])
> +        }
> +    }

As we discussed offline, this method is incorrect (amongst other things,
it returns the whole ring buffer when it should be empty). Posting my
suggested diff for the record:

--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -152,17 +152,12 @@ unsafe fn access(&self) -> &GspMem {
         let gsp_mem = unsafe { self.access() };
         let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);

-        if tx <= rx {
+        if tx < rx {
             // The area from `rx` up to the end of the ring, and from the beginning of the ring up
-            // to `tx`, minus one unit, belongs to the driver.
-            if tx == 0 {
-                let last = after_rx.len() - 1;
-                (&after_rx[..last], &before_rx[0..0])
-            } else {
-                (after_rx, &before_rx[..tx])
-            }
+            // to `tx` belongs to the driver.
+            (after_rx, &before_rx[0..tx])
         } else {
-            // The area from `rx` to `tx`, minus one unit, belongs to the driver.
+            // The area from `rx` to `tx` belongs to the driver.
             (after_rx.split_at(tx - rx).0, &before_rx[0..0])
         }
     }

> +
> +    fn gsp_write_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].gspq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    fn gsp_read_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].gspq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    fn cpu_read_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].cpuq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
> +    }

Maybe add one line of documentation for these. Generally I think we want
a bit more high-level documentation explaining how the ring buffers
operate.
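
Even something as small as this would already help (hypothetical wording):

    /// Returns the GSP's current write pointer into the GSP->CPU message
    /// queue, in units of GSP pages, wrapped to the size of the ring.
    fn gsp_write_ptr(&self) -> u32 {
        ...
    }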

> +
> +    /// Inform the GSP that it can send `elem_count` new pages into the message queue.
> +    fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
> +        let gsp_mem = &self.0;
> +        let rptr = self.cpu_read_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
> +
> +        // Ensure read pointer is properly ordered
> +        fence(Ordering::SeqCst);
> +
> +        dma_write!(gsp_mem[0].cpuq.rx.readPtr = rptr).unwrap();
> +    }
> +
> +    fn cpu_write_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].cpuq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    /// Inform the GSP that it can process `elem_count` new pages from the command queue.
> +    fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
> +        let gsp_mem = &self.0;
> +        let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES;
> +        dma_write!(gsp_mem[0].cpuq.tx.writePtr = wptr).unwrap();
> +
> +        // Ensure all command data is visible before triggering the GSP read
> +        fence(Ordering::SeqCst);
> +    }
> +}
> +
> +pub(crate) struct GspCmdq {

Similar to my previous comment, we can just name this `Cmdq` since we
are already in the `gsp` module.

As a general comment, let's also document our types/methods a bit more,
explaining at least what they are for.

> +    dev: ARef<device::Device>,
> +    seq: u32,
> +    gsp_mem: DmaGspMem,
> +    pub _nr_ptes: u32,
> +}
> +
> +impl GspCmdq {
> +    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> {
> +        let gsp_mem = DmaGspMem::new(dev)?;
> +        let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
> +        build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE);
> +
> +        Ok(GspCmdq {
> +            dev: dev.into(),
> +            seq: 0,
> +            gsp_mem,
> +            _nr_ptes: nr_ptes as u32,
> +        })
> +    }
> +
> +    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
> +        let sum64 = it
> +            .enumerate()
> +            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
> +            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
> +
> +        ((sum64 >> 32) as u32) ^ (sum64 as u32)
> +    }
> +
> +    #[expect(unused)]
> +    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
> +        &mut self,
> +        bar: &Bar0,
> +        payload_size: usize,
> +        init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,

This works pretty well for in-place initialization.

There are a couple of small drawbacks though: `M` must implement `FromBytes`
even though we only send it, and (more serious) there is no guarantee
that `init` will fully initialize the command - a forgetful caller can
omit some of its fields, or even the whole structure, and in that case
we will send a command with what happened to be at that position of the
queue at that time.

I think this is a good case for using the `Init` trait: it's like
`PinInit`, but without the `Pin`, and it ensures that the whole argument
is initialized. So this method would change into something like:

    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
        &mut self,
        bar: &Bar0,
        payload_size: usize,
        init: impl Init<M, Error>,
        init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,

This also allows us to drop the `FromBytes` requirement on
`GspCommandToGsp`! But also requires us to use `unsafe` to call
`Init::__init` on the pointer to the command. I think it's worth it
though, as this removes the risk of sending partially-uninitialized
commands.
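
On the implementation side this would look roughly like the following
(a sketch only; `cmd_slice` is the sub-slice we currently pass to
`M::from_bytes_mut`, and the alignment requirement stated in the SAFETY
comment is something we would have to guarantee):

    // SAFETY:
    // - `cmd_slice` is `size_of::<M>()` bytes long and suitably aligned for `M`.
    // - On success, `__init` guarantees the pointee is fully initialized.
    unsafe { init.__init(cmd_slice.as_mut_ptr().cast::<M>()) }?;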

Then there is what to do with the `SBuffer`. I'd like to think a bit
more about this, as not all commands require it, and those that do
typically send arrays of particular types. I think we should be able to
use the type system to have more control over this, but let's keep that
for the next revision.

> +    ) -> Result {
> +        // TODO: a method that extracts the regions for a given command?
> +        // ... and another that reduces the region to a given number of bytes!
> +        let driver_area = self.gsp_mem.driver_write_area();
> +        let free_tx_pages = driver_area.0.len() + driver_area.1.len();
> +
> +        // Total size of the message, including the headers, command, and optional payload.
> +        let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size;
> +        if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) {
> +            return Err(EAGAIN);
> +        }
> +
> +        let (msg_header, cmd, payload_1, payload_2) = {
> +            // TODO: find an alternative to as_flattened_mut()

I think we can use it if we enable the "slice_flatten" feature (stable
since 1.80, introduced in 1.67).

> +            #[allow(clippy::incompatible_msrv)]
> +            let (msg_header_slice, slice_1) = driver_area
> +                .0
> +                .as_flattened_mut()
> +                .split_at_mut(size_of::<GspMsgElement>());
> +            let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?;
> +            let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>());
> +            let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?;
> +            // TODO: find an alternative to as_flattened_mut()
> +            #[allow(clippy::incompatible_msrv)]
> +            let payload_2 = driver_area.1.as_flattened_mut();
> +            // TODO: Replace this workaround to cut the payload size.
> +            let (payload_1, payload_2) = match payload_size.checked_sub(payload_1.len()) {
> +                // The payload is longer than `payload_1`, set `payload_2` size to the difference.
> +                Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]),
> +                // `payload_1` is longer than the payload, we need to reduce its size.
> +                None => (&mut payload_1[..payload_size], payload_2),
> +            };

We will want (either you or I) to address these TODOs for the next
revision.

> +
> +            (msg_header, cmd, payload_1, payload_2)
> +        };
> +
> +        let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]);
> +        init(cmd, sbuffer)?;
> +
> +        *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION);
> +        msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([
> +            msg_header.as_bytes(),
> +            cmd.as_bytes(),
> +            payload_1,
> +            payload_2,
> +        ]));
> +
> +        let rpc_header = &msg_header.rpc;
> +        dev_info!(
> +            &self.dev,
> +            "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> +            self.seq,
> +            rpc_header.function,
> +            decode_gsp_function(rpc_header.function),
> +            rpc_header.length,
> +        );
> +
> +        let elem_count = msg_header.elemCount;
> +        self.seq += 1;
> +        self.gsp_mem.advance_cpu_write_ptr(elem_count);
> +        NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar);

I'm a bit surprised that we always write `0` here - can we document this
behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`?

> +
> +        Ok(())
> +    }
> +
> +    #[expect(unused)]
> +    pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>(
> +        &mut self,
> +        timeout: Delta,
> +        init: impl FnOnce(&M, SBuffer<core::array::IntoIter<&[u8], 2>>) -> Result<R>,
> +    ) -> Result<R> {
> +        let (driver_area, msg_header, slice_1) = wait_on(timeout, || {
> +            let driver_area = self.gsp_mem.driver_read_area();
> +            // TODO: find an alternative to as_flattened()
> +            #[allow(clippy::incompatible_msrv)]
> +            let (msg_header_slice, slice_1) = driver_area
> +                .0
> +                .as_flattened()
> +                .split_at(size_of::<GspMsgElement>());

Beware that `split_at` will panic if the slice is shorter than the
passed index. So we must check here that the driver area is at least as
large as a `GspMsgElement`.
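
Something along these lines (a sketch; the early `None` keeps `wait_on`
polling until a full header is available):

    let flattened = driver_area.0.as_flattened();
    if flattened.len() < size_of::<GspMsgElement>() {
        return None;
    }
    let (msg_header_slice, slice_1) = flattened.split_at(size_of::<GspMsgElement>());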

> +
> +            // Can't fail because msg_slice will always be
> +            // size_of::<GspMsgElement>() bytes long by the above split.
> +            let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
> +            if msg_header.rpc.length < size_of::<M>() as u32 {
> +                return None;
> +            }

If we have a message in the queue and this condition doesn't hold, I
don't think we can hope that it will in a further iteration - this
should be an error.

> +
> +            Some((driver_area, msg_header, slice_1))
> +        })?;
> +
> +        let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>());
> +        let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?;
> +        // TODO: find an alternative to as_flattened()
> +        #[allow(clippy::incompatible_msrv)]
> +        let payload_2 = driver_area.1.as_flattened();

There is an issue here - payload_1 and payload_2 cover the *whole* area
that is readable by the driver, not just the first message in the queue.

If there is more than one message pending when this method is called, we
will get a wrong checksum and skip all the following messages. We need
to truncate payload_1/payload_2 to cover the exact area of the first
message only.
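
Roughly (a sketch; `msg_len` stands for the payload length of the first
message, however we end up computing it):

    let (payload_1, payload_2) = match msg_len.checked_sub(payload_1.len()) {
        // The message spills into the second slice: truncate it.
        Some(rest) => (payload_1, &payload_2[..rest]),
        // The message fits entirely in the first slice: truncate that one.
        None => (&payload_1[..msg_len], &payload_2[..0]),
    };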

> +
> +        // Log RPC receive with message type decoding
> +        dev_info!(
> +            self.dev,
> +            "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> +            msg_header.rpc.sequence,
> +            msg_header.rpc.function,
> +            decode_gsp_function(msg_header.rpc.function),
> +            msg_header.rpc.length,
> +        );
> +
> +        if GspCmdq::calculate_checksum(SBuffer::new_reader([
> +            msg_header.as_bytes(),
> +            cmd.as_bytes(),
> +            payload_1,
> +            payload_2,
> +        ])) != 0
> +        {
> +            dev_err!(
> +                self.dev,
> +                "GSP RPC: receive: Call {} - bad checksum",
> +                msg_header.rpc.sequence
> +            );
> +            return Err(EIO);
> +        }
> +
> +        let result = if msg_header.rpc.function == M::FUNCTION {

This should be done way earlier. At this point we have already converted
the command byte slice into `M`, which is invalid if this condition
doesn't hold.

(on a related note, the checksum verification should also be done before
we interpret the message, as a corrupted message could make us cast
`cmd` into something that it is not)

> +            let sbuffer = SBuffer::new_reader([payload_1, payload_2]);
> +            init(cmd, sbuffer)
> +        } else {
> +            Err(ERANGE)
> +        };
> +
> +        self.gsp_mem
> +            .advance_cpu_read_ptr(msg_header.rpc.length.div_ceil(GSP_PAGE_SIZE as u32));

There is a landmine here. `msg_header.rpc.length` contains the length of
the payload, INCLUDING the RPC header itself, but NOT INCLUDING the
remainder of `msg_header`. Therefore the correct statement should be:

    self.gsp_mem.advance_cpu_read_ptr(
        (size_of_val(msg_header) as u32 - size_of_val(&msg_header.rpc) as u32
            + msg_header.rpc.length)
            .div_ceil(GSP_PAGE_SIZE as u32),
    );

Of course, it looks horrible, so we want to hide this member altogether
and provide a nice, well-documented method that returns something that
is immediately useful for us. More on that in `fw.rs`.

(the previous use of `length` in this method is also incorrect).

> +        result
> +    }
> +}
> +
> +fn decode_gsp_function(function: u32) -> &'static str {

Once we have proper enums for the function and events, this can be an
`as_str` method.
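
Something like (a sketch, reusing the enum from my earlier comment; the
variant names are only suggestions):

    impl VgpuMsgFunction {
        fn as_str(self) -> &'static str {
            match self {
                Self::Nop => "NOP",
                Self::SetGuestSystemInfo => "SET_GUEST_SYSTEM_INFO",
                ...
            }
        }
    }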

> +    match function {
> +        // Common function codes
> +        NV_VGPU_MSG_FUNCTION_NOP => "NOP",
> +        NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO => "SET_GUEST_SYSTEM_INFO",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_ROOT => "ALLOC_ROOT",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE => "ALLOC_DEVICE",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY => "ALLOC_MEMORY",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA => "ALLOC_CTX_DMA",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA => "ALLOC_CHANNEL_DMA",
> +        NV_VGPU_MSG_FUNCTION_MAP_MEMORY => "MAP_MEMORY",
> +        NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA => "BIND_CTX_DMA",
> +        NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT => "ALLOC_OBJECT",
> +        NV_VGPU_MSG_FUNCTION_FREE => "FREE",
> +        NV_VGPU_MSG_FUNCTION_LOG => "LOG",
> +        NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO => "GET_GSP_STATIC_INFO",
> +        NV_VGPU_MSG_FUNCTION_SET_REGISTRY => "SET_REGISTRY",
> +        NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO => "GSP_SET_SYSTEM_INFO",
> +        NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU => "GSP_INIT_POST_OBJGPU",
> +        NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => "GSP_RM_CONTROL",
> +        NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => "GET_STATIC_INFO",
> +
> +        // Event codes
> +        NV_VGPU_MSG_EVENT_GSP_INIT_DONE => "INIT_DONE",
> +        NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => "RUN_CPU_SEQUENCER",
> +        NV_VGPU_MSG_EVENT_POST_EVENT => "POST_EVENT",
> +        NV_VGPU_MSG_EVENT_RC_TRIGGERED => "RC_TRIGGERED",
> +        NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED => "MMU_FAULT_QUEUED",
> +        NV_VGPU_MSG_EVENT_OS_ERROR_LOG => "OS_ERROR_LOG",
> +        NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD => "NOCAT",
> +        NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE => "LOCKDOWN_NOTICE",
> +        NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => "LIBOS_PRINT",
> +
> +        // Default for unknown codes
> +        _ => "UNKNOWN",
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 7f4fe684ddaf..2e4255301e58 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -15,7 +15,9 @@
>  use crate::firmware::gsp::GspFirmware;
>  use crate::gpu::Chipset;
>  use crate::gsp;
> +use crate::gsp::cmdq::MSGQ_NUM_PAGES;

I guess the number of pages in the message queue is firmware-dependent,
so would it make sense to move its declaration to this module?

>  use crate::gsp::FbLayout;
> +use crate::gsp::GSP_PAGE_SIZE;
>  
>  /// Dummy type to group methods related to heap parameters for running the GSP firmware.
>  pub(crate) struct GspFwHeapParams(());
> @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self {
>      // GSP firmware constants
>      GSP_FW_WPR_META_MAGIC,
>      GSP_FW_WPR_META_REVISION,
> +
> +    // GSP events
> +    NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
> +    NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
> +    NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD,
> +    NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER,
> +    NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
> +    NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
> +    NV_VGPU_MSG_EVENT_POST_EVENT,
> +    NV_VGPU_MSG_EVENT_RC_TRIGGERED,
> +    NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
> +
> +    // GSP function calls
> +    NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_ROOT,
> +    NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA,
> +    NV_VGPU_MSG_FUNCTION_FREE,
> +    NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO,
> +    NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
> +    NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
> +    NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
> +    NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO,
> +    NV_VGPU_MSG_FUNCTION_LOG,
> +    NV_VGPU_MSG_FUNCTION_MAP_MEMORY,
> +    NV_VGPU_MSG_FUNCTION_NOP,
> +    NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
> +    NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
>  };
>  
>  #[repr(transparent)]
> @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 {
>          })
>      }
>  }
> +
> +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader;

This should be wrapped into a newtype that provides the exact set of
features required by the gsp module, like what is done for
`LibosMemoryRegionInitArgument`. For this type we just need two things: a
way to read the `writePtr`, and a way to advance it by a given value.
That's all the parent module needs to see.

By just aliasing this type, we make all its members visible to the `gsp`
module. This increases its dependency on a given firmware version,
carries a risk that the GSP module will mess with what it is not
supposed to, and introduces inconsistency in how we abstract the
firmware types - some are wrapped, others are not. Let's be consistent
and make all bindings completely opaque outside of `fw.rs`.
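
Concretely, something like this (a rough sketch, names are only
suggestions, and how it interacts with `dma_read!`/`dma_write!` is left
open):

    #[repr(transparent)]
    pub(crate) struct MsgqTxHeader(bindings::msgqTxHeader);

    impl MsgqTxHeader {
        /// Returns the current write pointer, in units of queue entries.
        pub(crate) fn write_ptr(&self) -> u32 {
            self.0.writePtr
        }

        /// Advances the write pointer by `count` entries, wrapping around
        /// the end of the queue.
        pub(crate) fn advance_write_ptr(&mut self, count: u32) {
            self.0.writePtr = self.0.writePtr.wrapping_add(count) % MSGQ_NUM_PAGES;
        }
    }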

> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for MsgqTxHeader {}
> +
> +impl MsgqTxHeader {
> +    pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self {
> +        Self {
> +            version: 0,
> +            size: msgq_size,
> +            msgSize: GSP_PAGE_SIZE as u32,
> +            msgCount: MSGQ_NUM_PAGES,
> +            writePtr: 0,
> +            flags: 1,
> +            rxHdrOff: rx_hdr_offset,
> +            entryOff: GSP_PAGE_SIZE as u32,
> +        }
> +    }
> +}
> +
> +/// RX header for setting up a message queue with the GSP.
> +///
> +/// # Invariants
> +///
> +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`.
> +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader;
> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for MsgqRxHeader {}
> +
> +impl MsgqRxHeader {
> +    pub(crate) fn new() -> Self {
> +        Default::default()
> +    }
> +}
> +
> +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v;

This type is another good illustration of why we should make our
bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple
times, ignoring the fact that it also includes the size of the RPC
header itself - not just what comes after it! Since it doesn't come with
any documentation, we can be forgiven for assuming the obvious - that it
is just the size of what follows, but it is not.

What we actually want is a method on `GspMsgElement` that returns what
we actually want (and is documented as such): the actual size of the
payload following the whole header. That way there can be no room for
mistake.
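
For instance (a rough sketch with a made-up method name, shown on the
current alias but equally applicable to a newtype wrapper), building on the
correction from `cmdq.rs` above:

    impl GspMsgElement {
        /// Returns the number of bytes this element occupies in the queue:
        /// the full message element header plus the payload that follows it.
        pub(crate) fn total_size(&self) -> usize {
            // `rpc.length` covers the RPC header and the payload, but not
            // the rest of the message element header.
            size_of::<Self>() - size_of::<GspRpcHeader>() + self.rpc.length as usize
        }
    }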

> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for GspRpcHeader {}
> +
> +// SAFETY: This struct only contains integer types for which all bit patterns
> +// are valid.
> +unsafe impl FromBytes for GspRpcHeader {}
> +
> +impl GspRpcHeader {
> +    pub(crate) fn new(cmd_size: u32, function: u32) -> Self {
> +        Self {
> +            // TODO: magic number
> +            header_version: 0x03000000,
> +            signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID,
> +            function,
> +            // TODO: overflow check?
> +            length: size_of::<Self>() as u32 + cmd_size,
> +            rpc_result: 0xffffffff,
> +            rpc_result_private: 0xffffffff,
> +            ..Default::default()
> +        }
> +    }
> +}
> +
> +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT;

Hammering my previous argument a bit more: in `cmdq.rs`, we do the
following:

    let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();

Even though we explicitly used `GspMsgElement`, `msg_header` appears as
being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and
can be avoided by using the newtype pattern.

Lastly, the bindings generated from C headers are supposed to be temporary, and
we eventually want to replace them with direct IDL-to-Rust bindings. Not
leaking the C types lets us design `fw.rs` as a blueprint for the ideal
interface we would want to generate - so the limited extra labor that
comes with wrapping these types would also pay off in that respect.
