linux-kernel - Re: [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture

Message-ID: <c62474ef-ace4-4aa0-8dec-53cc52b7344c@nvidia.com>
Date: Mon, 20 Oct 2025 14:49:22 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Joel Fernandes <joelagnelf@...dia.com>, linux-kernel@...r.kernel.org,
 rust-for-linux@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 dakr@...nel.org, acourbot@...dia.com
Cc: Alistair Popple <apopple@...dia.com>, Miguel Ojeda <ojeda@...nel.org>,
 Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>,
 Gary Guo <gary@...yguo.net>, bjorn3_gh@...tonmail.com,
 Benno Lossin <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
 Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 Timur Tabi <ttabi@...dia.com>, joel@...lfernandes.org,
 Elle Rhumsaa <elle@...thered-steel.dev>,
 Daniel Almeida <daniel.almeida@...labora.com>, nouveau@...ts.freedesktop.org
Subject: Re: [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue
 architecture

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Document the GSP RPC message queue architecture in detail.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
> ---

Hi Joel,


>  Documentation/gpu/nova/core/msgq.rst | 159 +++++++++++++++++++++++++++


Can we please change the file name to something like message_queue.rst?
I'll buy you a few extra characters. :)


>  Documentation/gpu/nova/index.rst     |   1 +
>  2 files changed, 160 insertions(+)
>  create mode 100644 Documentation/gpu/nova/core/msgq.rst
> 
> diff --git a/Documentation/gpu/nova/core/msgq.rst b/Documentation/gpu/nova/core/msgq.rst
> new file mode 100644
> index 000000000000..84e25be69cd6
> --- /dev/null
> +++ b/Documentation/gpu/nova/core/msgq.rst
> @@ -0,0 +1,159 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================================
> +Nova GPU RPC Message Passing Architecture
> +=========================================
> +
> +.. note::
> +   The following description is approximate and current as of the Ampere family.
> +   It may change for future generations and is intended to assist in understanding
> +   the driver code.
> +
> +Overview
> +========
> +
> +The Nova GPU driver communicates with the GSP (GPU System Processor) firmware
> +using an RPC (Remote Procedure Call) mechanism built on top of circular message
> +queues in shared memory. This document describes the structure of RPC messages
> +and the mechanics of the message passing system.
> +
> +Message Queue Architecture
> +==========================
> +
> +The communication between CPU and GSP uses two unidirectional circular queues:
> +
> +1. **CPU Queue (cpuq)**: CPU writes, GSP reads
> +2. **GSP Queue (gspq)**: GSP writes, CPU reads
> +
> +The advantage of this approach is no synchronization is required to access the
> +queues, if one entity wants to communicate with the other (CPU or GSP), they
> +simply write into their own queue.

How about this:

The advantage of this approach is that no synchronization is required to access the
queues. If one entity wants to communicate with the other (CPU or GSP), they
simply write into their own queue.


> +
> +Memory Layout
> +-------------
> +
> +The shared memory region (GspMem) where the queues reside has the following
> +layout::
> +
> +    +------------------------+ GspMem DMA Handle (base address)
> +    |    PTE Array (4KB)     |  <- Self-mapping page table
> +    | PTE[0] = base + 0x0000 |     Points to this page
> +    | PTE[1] = base + 0x1000 |     Points to CPU queue Header page

s/Header/header/

> +    | PTE[2] = base + 0x2000 |     Points to first page of CPU queue data
> +    | ...                    |     ...
> +    | ...                    |     ...
> +    +------------------------+ base + 0x1000
> +    |    CPU Queue Header    |  MsgqTxHeader + MsgqRxHeader
> +    |    - TX Header (32B)   |
> +    |    - RX Header (4B)    | (1 page)
> +    |    - Padding           |
> +    +------------------------+ base + 0x2000
> +    |    CPU Queue Data      | (63 pages)
> +    |    (63 x 4KB pages)    |  Circular buffer for messages
> +    | ...                    |     ...
> +    +------------------------+ base + 0x41000
> +    |    GSP Queue Header    |  MsgqTxHeader + MsgqRxHeader
> +    |    - TX Header (32B)   |
> +    |    - RX Header (4B)    | (1 page)
> +    |    - Padding           |
> +    +------------------------+ base + 0x42000
> +    |    GSP Queue Data      | (63 pages)
> +    |    (63 x 4KB pages)    |  Circular buffer for messages
> +    | ...                    |     ...
> +    +------------------------+ base + 0x81000
> +
> +
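
It might also help readers to see how the offsets in the layout follow from
the page counts. Here is a minimal sketch that derives them, assuming 4KB
pages and 63 data pages per queue as in the diagram. The constant and struct
names are made up for illustration and are not taken from the driver:

```rust
// Hypothetical names; the values come from the diagram above.
const PAGE_SIZE: u64 = 0x1000; // 4KB pages
const QUEUE_DATA_PAGES: u64 = 63; // data pages per queue

/// Byte offsets of each region, relative to the GspMem base address.
#[derive(Debug, PartialEq)]
struct GspMemLayout {
    pte_array: u64,
    cpuq_header: u64,
    cpuq_data: u64,
    gspq_header: u64,
    gspq_data: u64,
    end: u64,
}

fn gsp_mem_layout() -> GspMemLayout {
    // Each region starts where the previous one ends.
    let pte_array = 0;
    let cpuq_header = pte_array + PAGE_SIZE;
    let cpuq_data = cpuq_header + PAGE_SIZE;
    let gspq_header = cpuq_data + QUEUE_DATA_PAGES * PAGE_SIZE;
    let gspq_data = gspq_header + PAGE_SIZE;
    let end = gspq_data + QUEUE_DATA_PAGES * PAGE_SIZE;
    GspMemLayout {
        pte_array,
        cpuq_header,
        cpuq_data,
        gspq_header,
        gspq_data,
        end,
    }
}
```

With those inputs the offsets come out as 0x1000, 0x2000, 0x41000, 0x42000 and
0x81000, which matches the diagram (129 pages in total).
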
> +Message Passing Mechanics
> +-------------------------
> +The split read/write pointer design allows bidirectional communication between the
> +CPU and GSP without synchronization (if it were a shared queue), for example, the
> +following diagram illustrates pointer updates, when CPU sends message to GSP::
> +
> +    +--------------------------------------------------------------------------+
> +    |                     DMA coherent Shared Memory (GspMem)                  |

I think it would help to do this:

s/DMA coherent/DMA-coherent/

> +    +--------------------------------------------------------------------------+
> +    |                          (CPU sending message to GSP)                    |
> +    |  +-------------------+                      +-------------------+        |
> +    |  |   GSP Queue       |                      |   CPU Queue       |        |
> +    |  |                   |                      |                   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |  TX Header  |   |                      | |  TX Header  |   |        |
> +    |  | | write_ptr   |   |                      | | write_ptr   |---+----,   |
> +    |  | |             |   |                      | |             |   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  |                   |                      |                   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  | |  RX Header  |   |                      | |  RX Header  |   |    |   |
> +    |  | |  read_ptr ------+-------,              | |  read_ptr   |   |    |   |
> +    |  | |             |   |       |              | |             |   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  |                   |       |              |                   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  | |   Page 0    |   |       |              | |   Page 0    |   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  | |   Page 1    |   |       `--------------> |   Page 1    |   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  | |   Page 2    |   |                      | |   Page 2    |<--+----'   |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |     ...     |   |                      | |     ...     |   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |   Page 62   |   |                      | |   Page 62   |   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  |   (63 pages)      |                      |   (63 pages)      |        |
> +    |  +-------------------+                      +-------------------+        |
> +    |                                                                          |
> +    +--------------------------------------------------------------------------+
> +
> +When the CPU sends a message to the GSP, it writes the message to its own
> +queue (CPU queue) and updates the write pointer in its queue's TX header. The GSP
> +then reads the read pointer in its own queue's RX header and knows that there are
> +pending messages from the CPU because its RX header's read pointer is behind the
> +CPU's TX header's write pointer. After reading the message, the GSP updates its RX
> +header's read pointer to catch up. The same happens in reverse.

What do you think of this alternative wording:

When the CPU sends a message to the GSP, it writes the message to its own queue
(CPU queue) and updates the write pointer in its queue's TX header. The GSP
checks for pending messages by reading its RX header's read pointer and
comparing it to the CPU's TX header's write pointer. If the GSP's read pointer
lags behind, messages are waiting. After processing each message, the GSP
advances its read pointer to acknowledge receipt. 

For GSP-to-CPU communication, the roles reverse: the GSP writes to its queue and
updates its TX write pointer, while the CPU monitors its RX read pointer and
advances it after consuming messages.
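
To make the pointer comparison concrete, here is a rough sketch. It assumes
that both pointers are page indices in the range 0..63 that wrap at the end of
the ring; the document does not state the units, so please treat that, and the
function names, as assumptions:

```rust
/// Number of data pages in one queue (from the diagram).
const NUM_PAGES: u32 = 63;

/// Pages written by the sender that the receiver has not consumed yet.
/// Both arguments are assumed to be page indices in 0..NUM_PAGES.
fn pending_pages(write_ptr: u32, read_ptr: u32) -> u32 {
    // Adding NUM_PAGES before subtracting keeps the result non-negative
    // when the write pointer has already wrapped around.
    (write_ptr + NUM_PAGES - read_ptr) % NUM_PAGES
}

/// Advance a page index by `count` pages, wrapping at the end of the ring.
fn advance(ptr: u32, count: u32) -> u32 {
    (ptr + count) % NUM_PAGES
}
```

In the diagram, the CPU's write pointer is at Page 2 and the GSP's read pointer
is at Page 1, so pending_pages(2, 1) is 1: one page is waiting for the GSP.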


> +
> +Page-based message passing
> +--------------------------
> +The message queue is page-based, which means that the message is stored in a
> +page-aligned buffer. The page size is 4KB. Each message starts at the beginning of
> +a page. If the message is shorter than a page, the remaining space in the page is
> +wasted. The next message starts at the beginning of the next page no matter how
> +small the previous message was.
> +

Error Handling: The document doesn't mention:

a) What happens when queues are full
b) How message corruption is detected and handled
c) Recovery mechanisms for communication failures
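
For (a), even a short sketch of the "queue full" check would help. The one
below uses a common ring-buffer convention of keeping one page unused so that
"full" and "empty" can be told apart. Whether the GSP protocol actually does
this is an assumption on my part, and all names are hypothetical:

```rust
/// Number of data pages in one queue (from the diagram).
const NUM_PAGES: u32 = 63;

#[derive(Debug, PartialEq)]
enum SendError {
    QueueFull,
}

/// Pages the sender may still write, keeping one page unused so that
/// write_ptr == read_ptr always means "empty" (an assumed convention).
fn free_pages(write_ptr: u32, read_ptr: u32) -> u32 {
    let used = (write_ptr + NUM_PAGES - read_ptr) % NUM_PAGES;
    NUM_PAGES - 1 - used
}

/// Returns the new write pointer if `pages_needed` pages fit, or an error
/// that the caller can use to retry later or time out.
fn try_reserve(write_ptr: u32, read_ptr: u32, pages_needed: u32) -> Result<u32, SendError> {
    if pages_needed > free_pages(write_ptr, read_ptr) {
        return Err(SendError::QueueFull);
    }
    Ok((write_ptr + pages_needed) % NUM_PAGES)
}
```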

Performance Considerations: It would be helpful to add:

a) Why 63 pages were chosen for each queue
b) Typical message sizes and throughput expectations

> +Note that messages larger than a page will span multiple pages. This means that
> +it is possible that the first part of the message lands on the last page, and the
> +second part of the message lands on the first page, thus requiring out-of-order
> +memory access. The SBuffer data structure in Nova tackles this use case.

I don't think SBuffer has landed in the kernel yet, nor in the prerequisite
bitfield patchset, right? We could replace that last sentence with something like
"TODO: show how the upcoming SBuffer data structure helps with this use case".
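
In the meantime, the wrap-around case itself is easy to illustrate without
SBuffer. This is only a sketch of the general technique (split the copy into
two slices at the end of the ring). It is not how SBuffer works, and the
function names are invented for the example:

```rust
const PAGE_SIZE: usize = 4096;

/// Pages occupied by a message of `len` bytes, given that every message
/// starts on a page boundary.
fn pages_for(len: usize) -> usize {
    (len + PAGE_SIZE - 1) / PAGE_SIZE
}

/// Copy `msg` into `ring` starting at page `start_page`, continuing at
/// page 0 if the message runs past the end of the ring.
/// Returns the page index where the next message would start.
/// Assumes `start_page` is in range and `msg` is no larger than `ring`.
fn write_wrapping(ring: &mut [u8], start_page: usize, msg: &[u8]) -> usize {
    let num_pages = ring.len() / PAGE_SIZE;
    let start = start_page * PAGE_SIZE;
    // Bytes that fit between the start page and the end of the ring.
    let first_len = msg.len().min(ring.len() - start);
    let (head, tail) = msg.split_at(first_len);
    ring[start..start + first_len].copy_from_slice(head);
    // Whatever is left goes to the beginning of the ring.
    ring[..tail.len()].copy_from_slice(tail);
    (start_page + pages_for(msg.len())) % num_pages
}
```

For example, with a 4-page ring, a message of one page plus 100 bytes written
at page 3 fills page 3, puts the remaining 100 bytes at the start of page 0,
and leaves the next message starting at page 1.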


> +
> +RPC Message Structure:

Let's remove the trailing colon.

> +======================
> +
> +An RPC message is also called a "Message Element". The entire message has
> +multiple headers. There is a "message element" header which handles message
> +queue specific details and integrity, followed by a "RPC" header which handles

s/a "RPC"/an "RPC"/

> +the RPC protocol details::
> +
> +    +----------------------------------+
> +    |        GspMsgHeader (64B)        | (aka, Message Element Header)
> +    +----------------------------------+
> +    | auth_tag_buffer[16]              | --+
> +    | aad_buffer[16]                   |   |
> +    | checksum        (u32)            |   +-- Security & Integrity

Can we say anything useful here about:

a) What authentication mechanism is used
b) How message integrity is verified
c) Whether encryption is employed

?

> +    | sequence        (u32)            |   |
> +    | elem_count      (u32)            |   |
> +    | pad             (u32)            | --+
> +    +----------------------------------+
> +    |        GspRpcHeader (32B)        |
> +    +----------------------------------+
> +    | header_version  (0x03000000)     | --+
> +    | signature       (0x43505256)     |   |
> +    | length          (u32)            |   +-- RPC Protocol
> +    | function        (u32)            |   |
> +    | rpc_result      (u32)            |   |
> +    | rpc_result_private (u32)         |   |
> +    | sequence        (u32)            |   |
> +    | cpu_rm_gfid     (u32)            | --+

This shows field values but doesn't explain:

a) What "signature (0x43505256)" represents (appears to be "CPRV" in ASCII)
b) The purpose of cpu_rm_gfid field
c) Valid ranges for the function field
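
On (a), the byte order matters for how the signature reads. A quick sketch
that decodes the value both ways (the constant name is hypothetical; only the
value comes from the diagram):

```rust
/// Signature value from the diagram; the constant name is made up.
const RPC_SIGNATURE: u32 = 0x4350_5256;

/// Interpret four bytes as ASCII characters.
fn as_ascii(bytes: [u8; 4]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Most significant byte first, i.e. how the hex literal reads.
fn signature_big_endian() -> String {
    as_ascii(RPC_SIGNATURE.to_be_bytes())
}

/// Least significant byte first, i.e. how the value sits in memory on a
/// little-endian machine.
fn signature_little_endian() -> String {
    as_ascii(RPC_SIGNATURE.to_le_bytes())
}
```

Read most-significant-byte first this gives "CPRV", but as stored in
little-endian memory the bytes spell "VRPC". It would be good for the document
to say which reading is intended.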

> +    +----------------------------------+
> +    |                                  |
> +    |        Payload (Variable)        | --- Function-specific data
> +    |                                  |
> +    +----------------------------------+
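
One more thing worth double-checking: the header sizes in the diagram. Below
is a sketch of the two headers as #[repr(C)] structs, assuming that
auth_tag_buffer and aad_buffer are byte arrays (the diagram does not give the
element type). The struct definitions are my reconstruction from the diagram,
not copied from the driver:

```rust
use std::mem::size_of;

/// Message element header, reconstructed from the diagram.
/// The `u8` element type of the two buffers is an assumption.
#[repr(C)]
struct GspMsgHeader {
    auth_tag_buffer: [u8; 16],
    aad_buffer: [u8; 16],
    checksum: u32,
    sequence: u32,
    elem_count: u32,
    pad: u32,
}

/// RPC header, reconstructed from the diagram (eight u32 fields).
#[repr(C)]
struct GspRpcHeader {
    header_version: u32,
    signature: u32,
    length: u32,
    function: u32,
    rpc_result: u32,
    rpc_result_private: u32,
    sequence: u32,
    cpu_rm_gfid: u32,
}

const MSG_HEADER_SIZE: usize = size_of::<GspMsgHeader>();
const RPC_HEADER_SIZE: usize = size_of::<GspRpcHeader>();
```

Under that assumption the RPC header is 32 bytes, matching the diagram, but
the message element header comes out to 48 bytes rather than the 64B shown in
its title. Either the label or the field list may need adjusting.
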
> diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
> index e39cb3163581..46302daace34 100644
> --- a/Documentation/gpu/nova/index.rst
> +++ b/Documentation/gpu/nova/index.rst
> @@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
>     core/devinit
>     core/fwsec
>     core/falcon
> +   core/msgq

thanks,
-- 
John Hubbard