Message-ID: <fbe34de16f5c0bf25a16f9819a57fdd81e5bb08c.camel@linux.ibm.com>
Date: Fri, 24 Oct 2025 19:18:36 +0200
From: Gerd Bayer <gbayer@...ux.ibm.com>
To: Bjorn Helgaas <bhelgaas@...gle.com>, Jay Cornwall <Jay.Cornwall@....com>,
    Felix Kuehling <Felix.Kuehling@....com>
Cc: Niklas Schnelle <schnelle@...ux.ibm.com>,
    Alexander Schmidt <alexs@...ux.ibm.com>,
    netdev <netdev@...r.kernel.org>,
    linux-rdma <linux-rdma@...r.kernel.org>,
    linux-pci <linux-pci@...r.kernel.org>
Subject: Q: Usage of pci_enable_atomic_ops_to_root()
Hi all,
I stumbled over mlx5's usage of pci_enable_atomic_ops_to_root() at
https://elixir.bootlin.com/linux/v6.18-rc2/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L937
and was wondering whether its repeated calls with the three available
sizes actually give it the intended result.
I assume the intent was to enable requesting AtomicOps only if all
three sizes (32/64/128-bit) are supported at the root complex. However,
pci_enable_atomic_ops_to_root() enables the request at the PCIe level
as soon as one of these calls succeeds, i.e. even if only 32-bit Ops
are supported at the root complex.
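For illustration, the pattern in question looks roughly like this (a
paraphrased sketch with a made-up function name, not the verbatim mlx5
code): each call checks only its own size bit at the Root Port and, on
success, sets PCI_EXP_DEVCTL2_ATOMIC_REQ on the endpoint.

#include <linux/pci.h>

/*
 * Paraphrased sketch, not the verbatim mlx5 code: three independent
 * calls, one per AtomicOp size.  Each call that finds its size bit set
 * in the Root Port's DEVCAP2 sets PCI_EXP_DEVCTL2_ATOMIC_REQ on the
 * endpoint, so the requester gets enabled even when only the 32-bit
 * Completer capability is present.
 */
static void enable_atomic_ops_per_size(struct pci_dev *pdev)
{
	pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP32);
	pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64);
	pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128);
}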
So I checked other users in the kernel and found an inconclusive
picture:
The AMD GPU driver that this was originally introduced for [0] checks
for a combination of two sizes, while a few infiniband/ethernet drivers
and the vfio-pci driver do variations of sequential checks (potentially
enabling requests they did not mean to enable).
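If I read pci_enable_atomic_ops_to_root() in drivers/pci/pci.c right,
the Root Port check is "(cap & cap_mask) == cap_mask", so the
all-or-nothing semantics I assume was intended could be had with a
single call on the combined mask. A minimal sketch (hypothetical helper
name, not taken from any driver):

#include <linux/pci.h>

/*
 * Hypothetical helper, not from an existing driver: request AtomicOps
 * only when the Root Port advertises Completer support for all three
 * sizes.  Since pci_enable_atomic_ops_to_root() requires every bit in
 * the mask to be set in the Root Port's DEVCAP2, one call with the
 * combined mask fails unless 32-, 64- and 128-bit are all supported.
 */
static int enable_atomic_ops_all_sizes(struct pci_dev *pdev)
{
	return pci_enable_atomic_ops_to_root(pdev,
					     PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
					     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
					     PCI_EXP_DEVCAP2_ATOMIC_COMP128);
}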
Now the PCIe Spec Rev. 7.0 is also a mixed bag. Section 6.15.3.1
mandates for Root Ports:
> If a Root Port implements any AtomicOp Completer capability for host
> memory access, it must implement all 32-bit and 64-bit AtomicOp
> Completer capabilities. Implementing 128-bit CAS Completer capability
> is optional.
While this is specific in marking only the 128-bit variant of the CAS
Completer capability as optional, the Capability bits themselves just
announce 128-bit AtomicOps as a whole (i.e. all AtomicOps: FetchAdd,
Swap, CAS). Strictly interpreted, this would require Root Port
implementors to announce all-or-nothing support for 32/64/128-bit
AtomicOps - which kind of defeats the size granularity of the
capability bits - and leave an endpoint device (and its driver) that
attempts to use 128-bit CAS in the dark...
[0]: https://lore.kernel.org/linux-pci/1515113100-4718-1-git-send-email-Felix.Kuehling@amd.com/
Can anybody shed some light on this?
Thank you,
Gerd