Message-ID: <20250213180134.323929-1-tariqt@nvidia.com>
Date: Thu, 13 Feb 2025 20:01:24 +0200
From: Tariq Toukan <tariqt@...dia.com>
To: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, "Andrew
Lunn" <andrew+netdev@...n.ch>, Jiri Pirko <jiri@...dia.com>
CC: Cosmin Ratiu <cratiu@...dia.com>, Carolina Jubran <cjubran@...dia.com>,
Gal Pressman <gal@...dia.com>, Mark Bloch <mbloch@...dia.com>, Donald Hunter
<donald.hunter@...il.com>, Jiri Pirko <jiri@...nulli.us>, Jonathan Corbet
<corbet@....net>, Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky
<leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <linux-rdma@...r.kernel.org>
Subject: [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains
Hi,
This series introduces rate domains in devlink and the mlx5
driver. A detailed description by Cosmin follows.
Regards,
Tariq
devlink objects support rate management for tx scheduling, which
involves maintaining a tree of rate nodes that corresponds to tx
schedulers in hardware. 'man devlink-rate' has the full details.
The tree of rate nodes is maintained per devlink object, protected by
the devlink lock.
There exists hardware capable of instantiating a tx scheduling tree
which spans multiple functions of the same physical device (and thus
multiple devlink objects), and therefore the current API and locking
scheme are insufficient.
This patch series changes the devlink rate implementation and API to
allow supporting such hardware and managing tx scheduling trees across
multiple functions of a physical device.
Modeling this requires having devlink rate nodes with parents in other
devlink objects. A naive approach that relies on the current
one-lock-per-devlink model is impossible, as it would in some cases
require acquiring multiple devlink locks in the correct order.
The proposed solution is to move rates into a separate object named
'rate domain'. Devlink objects create a private rate domain on init, and
hardware that supports cross-function tx scheduling can switch to using
a shared rate domain for a set of devlink objects. Shared rate domains
have an additional lock serializing access to rate nodes.
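The private/shared domain idea above can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not the code in this series: every name below (struct rate_domain,
struct device, device_share_domain and so on) is hypothetical, a pthread
mutex stands in for the kernel lock, and a node counter stands in for
the rate node tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model: none of these names are taken from the series. */
struct rate_domain {
	pthread_mutex_t lock;	/* serializes access to the rate nodes */
	int refcount;		/* devices currently using this domain */
	int num_nodes;		/* stand-in for the tree of rate nodes */
};

struct device {
	struct rate_domain *domain;
};

static struct rate_domain *rate_domain_alloc(void)
{
	struct rate_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	pthread_mutex_init(&d->lock, NULL);
	d->refcount = 1;
	return d;
}

/* Assumption: refcount changes happen under a registry lock not modeled here. */
static void rate_domain_put(struct rate_domain *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		pthread_mutex_destroy(&d->lock);
		free(d);
	}
}

/* Every device starts out with its own private rate domain. */
static int device_init(struct device *dev)
{
	dev->domain = rate_domain_alloc();
	return dev->domain ? 0 : -1;
}

/* Switch dev to the domain used by peer; refused once dev owns rate nodes. */
static int device_share_domain(struct device *dev, struct device *peer)
{
	if (dev->domain == peer->domain)
		return 0;
	if (dev->domain->num_nodes != 0)
		return -1;
	rate_domain_put(dev->domain);
	peer->domain->refcount++;
	dev->domain = peer->domain;
	return 0;
}

/* All rate node changes take the domain lock, private or shared. */
static void device_add_rate_node(struct device *dev)
{
	pthread_mutex_lock(&dev->domain->lock);
	dev->domain->num_nodes++;
	pthread_mutex_unlock(&dev->domain->lock);
}

static void device_fini(struct device *dev)
{
	rate_domain_put(dev->domain);
	dev->domain = NULL;
}
```

In this model two devices that share a domain contend on one lock
instead of each taking the other's per-device lock, which is what avoids
the lock ordering problem described earlier.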
A new pair of devlink attributes is introduced for specifying a foreign
parent device, and the rate management devlink calls are changed to
allow setting a rate node parent to the requested foreign parent device.
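As a rough illustration of what accepting a foreign parent could involve,
the sketch below only allows a parent owned by another device when both
devices use the same rate domain. That check is an assumption made for
this example, as are all names in it (struct rate_node,
rate_node_set_parent); the real validation lives in the devlink patches
and may differ. Cycle detection beyond the trivial self-parent case is
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: none of these names are taken from the series. */
struct rate_domain {
	int id;
};

struct device {
	struct rate_domain *domain;
};

struct rate_node {
	struct device *owner;		/* device that created the node */
	struct rate_node *parent;	/* NULL means no parent */
};

/*
 * Set or clear the parent of a rate node.
 * Returns 0 on success, -1 if the request is rejected.
 */
static int rate_node_set_parent(struct rate_node *node,
				struct rate_node *parent)
{
	assert(node != NULL);

	if (parent == NULL) {
		node->parent = NULL;	/* detach from the current parent */
		return 0;
	}
	if (parent == node)
		return -1;		/* a node cannot be its own parent */

	/* Assumed rule: a foreign parent must live in the same rate domain. */
	if (node->owner->domain != parent->owner->domain)
		return -1;

	node->parent = parent;
	return 0;
}
```

With such a rule, a node owned by one function can be attached under a
node of another function only after both devices have joined the same
shared rate domain.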
Finally, this API is used from mlx5 for NICs with the correct capability
bit to allow cross-function tx scheduling.
A note about net-shapers:
The net-shapers framework is completely orthogonal to this patch series.
net-shapers does shaping for tx queues, groups of queues and up to the
netdevice level. This patch series is for shaping across functions, so
it is strictly above the netdevice level in the shaping hierarchy.
This patch series was previously sent as an RFC ([1]).
Patches:
Small cleanup:
devlink: Remove unused param of devlink_rate_nodes_check
Introduce private rate domains:
devlink: Store devlink rates in a rate domain
Introduce rate domain locking (noop now as rate domains are private):
devlink: Serialize access to rate domains
Introduce shared rate domains and a global registry for them:
devlink: Introduce shared rate domains
Extend the devlink rate API with foreign parent devices:
devlink: Allow specifying parent device for rate commands
devlink: Allow rate node parents from other devlinks
Extend the mlx5 implementation with the ability to share qos domains:
net/mlx5: qos: Introduce shared esw qos domains
Use the newly introduced stuff to support cross-function tx scheduling:
net/mlx5: qos: Support cross-esw tx scheduling
net/mlx5: qos: Init shared devlink rate domain
Finally, update documentation:
net/mlx5: Document devlink rates and cross-esw scheduling
[1] https://lore.kernel.org/netdev/20241113203317.2507537-1-cratiu@nvidia.com/
Cosmin Ratiu (10):
devlink: Remove unused param of devlink_rate_nodes_check
devlink: Store devlink rates in a rate domain
devlink: Serialize access to rate domains
devlink: Introduce shared rate domains
devlink: Allow specifying parent device for rate commands
devlink: Allow rate node parents from other devlinks
net/mlx5: qos: Introduce shared esw qos domains
net/mlx5: qos: Support cross-esw tx scheduling
net/mlx5: qos: Init shared devlink rate domain
net/mlx5: Document devlink rates and cross-esw scheduling
Documentation/netlink/specs/devlink.yaml | 18 +-
.../networking/devlink/devlink-port.rst | 2 +
Documentation/networking/devlink/mlx5.rst | 33 +++
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 144 ++++++++++--
include/net/devlink.h | 8 +
include/uapi/linux/devlink.h | 3 +
net/devlink/core.c | 86 ++++++-
net/devlink/dev.c | 6 +-
net/devlink/devl_internal.h | 34 ++-
net/devlink/netlink.c | 74 ++++--
net/devlink/netlink_gen.c | 20 +-
net/devlink/netlink_gen.h | 7 +
net/devlink/rate.c | 217 +++++++++++++-----
13 files changed, 548 insertions(+), 104 deletions(-)
base-commit: 8dbf0c7556454b52af91bae305ca71500c31495c
--
2.45.0