linux-kernel - Re: [PATCH v5 07/14] gpu: nova-core: gsp: Add GSP command queue handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <uzdfugjo3m5j73wczfbauddatvgzx5ivhimlt5yxgd6vs5ex54@abm4asouc2pb>
Date: Fri, 17 Oct 2025 11:36:57 +1100
From: Alistair Popple <apopple@...dia.com>
To: Alexandre Courbot <acourbot@...dia.com>
Cc: rust-for-linux@...r.kernel.org, dri-devel@...ts.freedesktop.org, 
	dakr@...nel.org, Miguel Ojeda <ojeda@...nel.org>, 
	Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
	Björn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin <lossin@...nel.org>, 
	Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, 
	Trevor Gross <tmgross@...ch.edu>, David Airlie <airlied@...il.com>, 
	Simona Vetter <simona@...ll.ch>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, 
	Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, 
	John Hubbard <jhubbard@...dia.com>, Joel Fernandes <joelagnelf@...dia.com>, 
	Timur Tabi <ttabi@...dia.com>, linux-kernel@...r.kernel.org, nouveau@...ts.freedesktop.org
Subject: Re: [PATCH v5 07/14] gpu: nova-core: gsp: Add GSP command queue
 handling

On 2025-10-16 at 17:24 +1100, Alexandre Courbot <acourbot@...dia.com> wrote...
> On Mon Oct 13, 2025 at 3:20 PM JST, Alistair Popple wrote:
> > This commit introduces core infrastructure for handling GSP command and
> > message queues in the nova-core driver. The command queue system enables
> > bidirectional communication between the host driver and GSP firmware
> > through a remote message passing interface.
> >
> > The interface is based on passing serialised data structures over a ring
> > buffer with separate transmit and receive queues. Commands are sent by
> > writing to the CPU transmit queue and waiting for completion via the
> > receive queue.
> >
> > To ensure safety mutable or immutable (depending on whether it is a send
> > or receive operation) references are taken on the command queue when
> > allocating the message to write/read to. This ensures message memory
> > remains valid and the command queue can't be mutated whilst an operation
> > is in progress.
> >
> > Currently this is only used by the probe() routine and therefore can
> > only used by a single thread of execution. Locking to enable safe access
> > from multiple threads will be introduced in a future series when that
> > becomes necessary.
> >
> > Signed-off-by: Alistair Popple <apopple@...dia.com>
> >
> > ---
> >
> > Changes for v4:
> >  - Use read_poll_timeout() instead of wait_on()
> >  - Switch to using `init!` (Thanks Alex for figuring out the
> >    required workarounds)
> >  - Pass the enum type into the RPC bindings instead of a raw u32
> >  - Fixup the TODOs for extracting/allocating the send command regions
> >  - Split the sending functions into one taking just a command struct and
> >    another taking a command struct with extra payload
> >
> > Changes for v3:
> >  - Reduce the receive payloads to the correct size
> >  - Use opaque bindings
> >  - Clean up of the command queue PTE creation
> >  - Add an enum for the GSP functions
> >  - Rename GspCommandToGsp and GspMessageFromGsp
> >  - Rename GspCmdq
> >  - Add function to notify GSP of updated queue pointers
> >  - Inline driver area access functions
> >  - Fixup receive area calculations
> >
> > Changes for v2:
> >  - Rebased on Alex's latest series
> > ---
> >  drivers/gpu/nova-core/gsp.rs      |   9 +
> >  drivers/gpu/nova-core/gsp/cmdq.rs | 493 ++++++++++++++++++++++++++++++
> >  drivers/gpu/nova-core/regs.rs     |   4 +
> >  drivers/gpu/nova-core/sbuffer.rs  |   2 -
> >  scripts/Makefile.build            |   2 +-
> >  5 files changed, 507 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/gpu/nova-core/gsp/cmdq.rs
> >

[...]

> > +use crate::sbuffer::SBufferIter;
> > +
> > +pub(crate) trait CommandToGsp: Sized + FromBytes + AsBytes {
> > +    const FUNCTION: MsgFunction;
> > +}
> > +
> > +pub(crate) trait CommandToGspWithPayload: CommandToGsp {}
> 
> Unfortunately this hierarchy won't work, because this way a command that
> is expected to take a payload can be sent without any.
> 
> I'd suggest instead having a `CommandToGspBase` trait that sets the
> FUNCTION, and have `CommandToGsp` and `CommandToGspWithPayload` require
> it.

Ok.

> > +
> > +pub(crate) trait MessageFromGsp: Sized + FromBytes + AsBytes {
> > +    const FUNCTION: MsgFunction;
> > +}
> 
> Just a bit of documentation for these new traits please. :)
> 
> (also applies to all undocumented types/methods in this file)

Sure.

> <snip>
> > +impl DmaGspMem {
> > +    fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> > +        const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32;
> > +        const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32;
> > +
> > +        let gsp_mem =
> > +            CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> > +        dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
> > +        dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
> > +        dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
> > +
> > +        Ok(Self(gsp_mem))
> > +    }
> > +
> > +    // Allocates the various regions for the command and reduces the payload size
> > +    // to match what is needed for the command.
> 
> Let's also explain what the elements in the returned tuple are.
> 
> > +    //
> > +    // # Errors
> > +    //
> > +    // Returns `Err(EAGAIN)` if the driver area is too small to hold the
> > +    // requested command.
> > +    fn allocate_command_regions<'a, M: CommandToGsp>(
> > +        &'a mut self,
> > +        payload_size: usize,
> > +    ) -> Result<(&'a mut GspMsgElement, &'a mut M, &'a mut [u8], &'a mut [u8])> {
> > +        let driver_area = self.driver_write_area();
> > +        let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size;
> > +        let driver_area_size = (driver_area.0.len() + driver_area.1.len()) << GSP_PAGE_SHIFT;
> > +
> > +        if msg_size > driver_area_size {
> > +            return Err(EAGAIN);
> > +        }
> > +
> > +        #[allow(clippy::incompatible_msrv)]
> > +        let (msg_header_slice, slice_1) = driver_area
> > +            .0
> > +            .as_flattened_mut()
> 
> We are going to have a slight problem here. `as_flattened_mut` is only
> available since Rust 1.80... and we need to support 1.78.
> 
> I suppose we could work this around by defining a custom function tht
> just calls `as_flattened_mut` in Rust versions that support it and
> provides an alternative implementation for those that don't, but I
> wonder whether there is a better way.

Oh I didn't realise the function name had changed, I thought we just needed to
allow the experimental feature in older versions. I probably wont get to this
this week so have left it as a TODO for now.

> > +            .split_at_mut(size_of::<GspMsgElement>());
> > +        let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?;
> > +        let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>());
> > +        let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?;
> > +
> > +        #[allow(clippy::incompatible_msrv)]
> > +        let payload_2 = driver_area.1.as_flattened_mut();
> > +
> > +        let (payload_1, payload_2) = if payload_1.len() > payload_size {
> > +            // Payload fits entirely in payload_1
> > +            (&mut payload_1[..payload_size], &mut payload_2[0..0])
> > +        } else {
> > +            // Need all of payload_1 and some of payload_2
> > +            let payload_2_len = payload_size - payload_1.len();
> > +            (payload_1, &mut payload_2[..payload_2_len])
> > +        };
> > +
> > +        Ok((msg_header, cmd, payload_1, payload_2))
> > +    }
> 
> We should probably add a few comments in this function to explain the
> steps, otherwise it can be a bit difficult to parse.

Ok.

> > +    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
> > +        let sum64 = it
> > +            .enumerate()
> > +            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
> > +            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
> > +
> > +        ((sum64 >> 32) as u32) ^ (sum64 as u32)
> > +    }
> > +
> > +    // Notify GSP that we have updated the command queue pointers.
> > +    fn notify_gsp(bar: &Bar0) {
> > +        NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar);
> > +    }
> 
> Can we move this method to `regs.rs`, as an impl block of
> `NV_PGSP_QUEUE_HEAD`? It makes more sense to have it there imho.

We could, but I think it belongs here. If I'm understanding things correctly the
value of this register is FW/SW defined. For now GSP doesn't look at the value
so it doesn't really matter what we write, but it's possible that could change
which means we want it here in the code that talks to GSP.

> > +
> > +    #[expect(unused)]
> > +    pub(crate) fn send_gsp_command<M, E>(&mut self, bar: &Bar0, init: impl Init<M, E>) -> Result
> > +    where
> > +        M: CommandToGsp,
> > +        // This allows all error types, including `Infallible`, to be used with `init`. Without
> > +        // this we cannot use regular stack objects as `init` since their `Init` implementation
> > +        // does not return any error.
> > +        Error: From<E>,
> > +    {
> > +        #[repr(C)]
> > +        struct FullCommand<M> {
> > +            hdr: GspMsgElement,
> > +            cmd: M,
> > +        }
> > +        let (msg_header, cmd, _, _) = self.gsp_mem.allocate_command_regions::<M>(0)?;
> > +
> > +        let seq = self.seq;
> > +        let initializer = try_init!(FullCommand {
> > +            hdr <- GspMsgElement::init(seq, size_of::<M>(), M::FUNCTION),
> > +            cmd <- init,
> > +        });
> > +
> > +        // Fill the header and command in-place.
> > +        // SAFETY:
> > +        //  - allocate_command_regions guarantees msg_header points to enough
> > +        //    space in the command queue to contain FullCommand
> > +        //  - __init ensures the command header and struct a fully initialized
> > +        unsafe {
> > +            initializer.__init(msg_header.as_bytes_mut().as_mut_ptr().cast())?;
> > +        }
> > +
> > +        msg_header.set_checksum(Cmdq::calculate_checksum(SBufferIter::new_reader([
> > +            msg_header.as_bytes(),
> > +            cmd.as_bytes(),
> > +        ])));
> > +
> > +        dev_info!(
> > +            &self.dev,
> > +            "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> > +            self.seq,
> > +            msg_header.function_number(),
> > +            msg_header.function()?,
> > +            msg_header.length(),
> > +        );
> > +
> > +        let elem_count = msg_header.element_count();
> > +        self.seq += 1;
> > +        self.gsp_mem.advance_cpu_write_ptr(elem_count);
> > +        Cmdq::notify_gsp(bar);
> > +
> > +        Ok(())
> > +    }
> > +
> > +    #[expect(unused)]
> > +    pub(crate) fn send_gsp_command_with_payload<M, E>(
> > +        &mut self,
> > +        bar: &Bar0,
> > +        payload_size: usize,
> > +        init: impl Init<M, E>,
> > +        init_payload: impl FnOnce(SBufferIter<core::array::IntoIter<&mut [u8], 2>>) -> Result,
> > +    ) -> Result
> > +    where
> > +        M: CommandToGspWithPayload,
> > +        // This allows all error types, including `Infallible`, to be used with `init`. Without
> > +        // this we cannot use regular stack objects as `init` since their `Init` implementation
> > +        // does not return any error.
> > +        Error: From<E>,
> > +    {
> > +        #[repr(C)]
> > +        struct FullCommand<M> {
> > +            hdr: GspMsgElement,
> > +            cmd: M,
> > +        }
> > +        let (msg_header, cmd, payload_1, payload_2) =
> > +            self.gsp_mem.allocate_command_regions::<M>(payload_size)?;
> > +
> > +        let seq = self.seq;
> > +        let initializer = try_init!(FullCommand {
> > +            hdr <- GspMsgElement::init(seq, size_of::<M>() + payload_size, M::FUNCTION),
> > +            cmd <- init,
> > +        });
> > +
> > +        // Fill the header and command in-place.
> > +        // SAFETY:
> > +        //  - allocate_command_regions guarantees msg_header points to enough
> > +        //    space in the command queue to contain FullCommand
> > +        //  - __init ensures the command header and struct a fully initialized
> > +        unsafe {
> > +            initializer.__init(msg_header.as_bytes_mut().as_mut_ptr().cast())?;
> > +        }
> > +
> > +        // Fill the payload
> > +        let sbuffer = SBufferIter::new_writer([&mut payload_1[..], &mut payload_2[..]]);
> > +        init_payload(sbuffer)?;
> > +
> > +        msg_header.set_checksum(Cmdq::calculate_checksum(SBufferIter::new_reader([
> > +            msg_header.as_bytes(),
> > +            cmd.as_bytes(),
> > +            payload_1,
> > +            payload_2,
> > +        ])));
> > +
> > +        dev_info!(
> > +            &self.dev,
> > +            "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> > +            self.seq,
> > +            msg_header.function_number(),
> > +            msg_header.function()?,
> > +            msg_header.length(),
> > +        );
> > +
> > +        let elem_count = msg_header.element_count();
> > +        self.seq += 1;
> > +        self.gsp_mem.advance_cpu_write_ptr(elem_count);
> > +        Cmdq::notify_gsp(bar);
> > +
> > +        Ok(())
> > +    }
> 
> I think we can factorize most of `send_gsp_command` and
> `send_gsp_command_with_payload` into a private method that requires a
> `CommandToGspBase` and takes the same parameters as
> `send_gsp_command_with_payload`. All it would need to do when called
> from `send_gsp_command` is take a closure that doesn't write any
> payload.

You're right, that is nicer. I was thinking it would be good to have a simple
function that ignored the payloads, but it didn't really end up any simpler.

> > +
> > +    #[expect(unused)]
> > +    pub(crate) fn receive_msg_from_gsp<M: MessageFromGsp, R>(
> > +        &mut self,
> > +        timeout: Delta,
> > +        init: impl FnOnce(&M, SBufferIter<core::array::IntoIter<&[u8], 2>>) -> Result<R>,
> > +    ) -> Result<R> {
> > +        let driver_area = read_poll_timeout(
> > +            || Ok(self.gsp_mem.driver_read_area()),
> > +            |driver_area: &(&[[u8; 4096]], &[[u8; 4096]])| !driver_area.0.is_empty(),
> > +            Delta::from_millis(10),
> > +            timeout,
> > +        )?;
> > +
> > +        #[allow(clippy::incompatible_msrv)]
> > +        let (msg_header_slice, slice_1) = driver_area
> > +            .0
> > +            .as_flattened()
> > +            .split_at(size_of::<GspMsgElement>());
> > +        let msg_header = GspMsgElement::from_bytes(msg_header_slice).ok_or(EIO)?;
> > +        if msg_header.length() < size_of::<M>() as u32 {
> > +            return Err(EIO);
> > +        }
> > +
> > +        let function: MsgFunction = msg_header.function().map_err(|_| {
> > +            dev_info!(
> > +                self.dev,
> > +                "GSP RPC: receive: seq# {}, bad function=0x{:x}, length=0x{:x}\n",
> > +                msg_header.sequence(),
> > +                msg_header.function_number(),
> > +                msg_header.length(),
> > +            );
> > +            EIO
> > +        })?;
> > +
> > +        // Log RPC receive with message type decoding
> > +        dev_info!(
> > +            self.dev,
> > +            "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> > +            msg_header.sequence(),
> > +            msg_header.function_number(),
> > +            function,
> > +            msg_header.length(),
> > +        );
> > +
> > +        let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>());
> > +        #[allow(clippy::incompatible_msrv)]
> > +        let payload_2 = driver_area.1.as_flattened();
> > +
> > +        // Cut the payload slice(s) down to the actual length of the payload.
> > +        let (cmd_payload_1, cmd_payload_2) =
> > +            if payload_1.len() > msg_header.length() as usize - size_of::<M>() {
> > +                (
> > +                    payload_1
> > +                        .split_at(msg_header.length() as usize - size_of::<M>())
> > +                        .0,
> > +                    &payload_2[0..0],
> > +                )
> > +            } else {
> > +                (
> > +                    payload_1,
> > +                    payload_2
> > +                        .split_at(msg_header.length() as usize - size_of::<M>() - payload_1.len())
> > +                        .0,
> > +                )
> > +            };
> > +
> > +        if Cmdq::calculate_checksum(SBufferIter::new_reader([
> > +            msg_header.as_bytes(),
> > +            cmd_slice,
> > +            cmd_payload_1,
> > +            cmd_payload_2,
> > +        ])) != 0
> > +        {
> > +            dev_err!(
> > +                self.dev,
> > +                "GSP RPC: receive: Call {} - bad checksum",
> > +                msg_header.sequence()
> > +            );
> > +            return Err(EIO);
> > +        }
> > +
> > +        let result = if function == M::FUNCTION {
> > +            let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?;
> > +            let sbuffer = SBufferIter::new_reader([cmd_payload_1, cmd_payload_2]);
> > +            init(cmd, sbuffer)
> > +        } else {
> > +            Err(ERANGE)
> > +        };
> > +
> > +        self.gsp_mem
> > +            .advance_cpu_read_ptr(msg_header.length().div_ceil(GSP_PAGE_SIZE as u32));
> > +        result
> > +    }
> 
> Here we probably also want to comment the different steps a bit.

Ok.