Message-ID: <20240814085320.134075-2-cratiu@nvidia.com>
Date: Wed, 14 Aug 2024 11:51:46 +0300
From: Cosmin Ratiu <cratiu@...dia.com>
To: <netdev@...r.kernel.org>
CC: <cjubran@...dia.com>, <cratiu@...dia.com>, <tariqt@...dia.com>,
<saeedm@...dia.com>, <yossiku@...dia.com>, <gal@...dia.com>,
<jiri@...dia.com>
Subject: [PATCH] devlink: Extend the devlink rate API to support rate management on traffic classes
From: Carolina Jubran <cjubran@...dia.com>
Introduce traffic classes (TCs) as a third rate object type in the
devlink-rate API, alongside the existing nodes and leafs. Rate objects
for traffic classes provide more granular control, especially in
scenarios where customers need to implement Enhanced Transmission
Selection (ETS) for specific groups of Virtual Functions (VFs).
For instance, users can assign traffic classes such as TC0 and TC5 to
handle TCP/UDP and RoCE traffic respectively, with defined bandwidth
shares (e.g., 20% for TC0 and 80% for TC5).
Example (the per-VF traffic class objects, such as $DEV/1/tc0, are created by the driver):
DEV=pci/0000:08:00.0
devlink port function rate add $DEV/vfs_group tx_share 10Gbit tx_max 50Gbit
devlink port function rate add $DEV/group_tc0 tx_weight 20 parent vfs_group
devlink port function rate add $DEV/group_tc5 tx_weight 80 parent vfs_group
devlink port function rate set $DEV/1/tc0 parent group_tc0
devlink port function rate set $DEV/2/tc0 parent group_tc0
devlink port function rate set $DEV/1/tc5 parent group_tc5
devlink port function rate set $DEV/2/tc5 parent group_tc5
Signed-off-by: Carolina Jubran <cjubran@...dia.com>
Change-Id: If14e37966db416e1ff715c19439d071814800efe
---
Documentation/netlink/specs/devlink.yaml | 8 ++
.../networking/devlink/devlink-port.rst | 18 ++-
include/net/devlink.h | 16 +++
include/uapi/linux/devlink.h | 2 +
net/devlink/netlink_gen.c | 3 +
net/devlink/rate.c | 128 ++++++++++++++++++
6 files changed, 170 insertions(+), 5 deletions(-)
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 09fbb4c03fc8..14e702c17387 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -83,6 +83,8 @@ definitions:
name: leaf
-
name: node
+ -
+ name: traffic-class
-
type: enum
name: sb-threshold-type
@@ -781,6 +783,9 @@ attribute-sets:
-
name: rate-tx-max
type: u64
+ -
+ name: rate-traffic-class-index
+ type: u16
-
name: rate-node-name
type: string
@@ -2121,6 +2126,7 @@ operations:
- bus-name
- dev-name
- port-index
+ - rate-traffic-class-index
- rate-node-name
reply: &rate-get-reply
value: 76
@@ -2143,6 +2149,7 @@ operations:
attributes:
- bus-name
- dev-name
+ - rate-traffic-class-index
- rate-node-name
- rate-tx-share
- rate-tx-max
@@ -2163,6 +2170,7 @@ operations:
attributes:
- bus-name
- dev-name
+ - rate-traffic-class-index
- rate-node-name
- rate-tx-share
- rate-tx-max
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 9d22d41a7cd1..6ea50b7cf769 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -374,14 +374,21 @@ At this point a matching subfunction driver binds to the subfunction's auxiliary
Rate object management
======================
-Devlink provides API to manage tx rates of single devlink port or a group.
-This is done through rate objects, which can be one of the two types:
+Devlink provides an API to manage TX rates of a single devlink port, its traffic classes, or a group.
+This is done through rate objects, which can be one of three types:
``leaf``
Represents a single devlink port; created/destroyed by the driver. Since leaf
have 1to1 mapping to its devlink port, in user space it is referred as
``pci/<bus_addr>/<port_index>``;
+``traffic class (tc)``
+ Represents a traffic class on a devlink port; created/destroyed by the
+ driver. In userspace, a traffic class object is referred to as
+ ``pci/<bus_addr>/<port_index>/tc<traffic_class_index>``, so it is
+ identified by both its port index and its traffic class index. It allows
+ TX rates to be managed per traffic class on a specific devlink port.
+
``node``
Represents a group of rate objects (leafs and/or nodes); created/deleted by
request from the userspace; initially empty (no rate objects added). In
@@ -437,9 +444,10 @@ Arbitration flow from the high level:
#. If all the nodes from the highest priority sub-group are satisfied, or
overused their assigned BW, move to the lower priority nodes.
-Driver implementations are allowed to support both or either rate object types
-and setting methods of their parameters. Additionally driver implementation
-may export nodes/leafs and their child-parent relationships.
+Driver implementations are allowed to support any combination of the rate
+object types and setting methods of their parameters. Additionally, driver
+implementations may export nodes, leafs, traffic classes, and their
+child-parent relationships.
Terms and Definitions
=====================
diff --git a/include/net/devlink.h b/include/net/devlink.h
index db5eff6cb60f..a485c489acd6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -117,6 +117,7 @@ struct devlink_rate {
u32 tx_priority;
u32 tx_weight;
+ u16 tc_id;
};
struct devlink_port {
@@ -1477,6 +1478,14 @@ struct devlink_ops {
u32 tx_priority, struct netlink_ext_ack *extack);
int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
u32 tx_weight, struct netlink_ext_ack *extack);
+ int (*rate_traffic_class_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
+ u64 tx_share, struct netlink_ext_ack *extack);
+ int (*rate_traffic_class_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
+ u64 tx_max, struct netlink_ext_ack *extack);
+ int (*rate_traffic_class_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
+ u32 tx_priority, struct netlink_ext_ack *extack);
+ int (*rate_traffic_class_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
+ u32 tx_weight, struct netlink_ext_ack *extack);
int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
struct netlink_ext_ack *extack);
int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
@@ -1489,6 +1498,10 @@ struct devlink_ops {
struct devlink_rate *parent,
void *priv_child, void *priv_parent,
struct netlink_ext_ack *extack);
+ int (*rate_traffic_class_parent_set)(struct devlink_rate *child,
+ struct devlink_rate *parent,
+ void *priv_child, void *priv_parent,
+ struct netlink_ext_ack *extack);
/**
* selftests_check() - queries if selftest is supported
* @devlink: devlink instance
@@ -1723,6 +1736,9 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
int
devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv,
struct devlink_rate *parent);
+int
+devl_rate_traffic_class_create(struct devlink_port *devlink_port, void *priv, u16 tc_id,
+ struct devlink_rate *parent);
void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
void devl_rate_nodes_destroy(struct devlink *devlink);
void devlink_port_linecard_set(struct devlink_port *devlink_port,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9401aa343673..94f6e3ca5f8d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -224,6 +224,7 @@ enum devlink_port_flavour {
enum devlink_rate_type {
DEVLINK_RATE_TYPE_LEAF,
DEVLINK_RATE_TYPE_NODE,
+ DEVLINK_RATE_TYPE_TRAFFIC_CLASS,
};
enum devlink_param_cmode {
@@ -595,6 +596,7 @@ enum devlink_attr {
DEVLINK_ATTR_RATE_TYPE, /* u16 */
DEVLINK_ATTR_RATE_TX_SHARE, /* u64 */
DEVLINK_ATTR_RATE_TX_MAX, /* u64 */
+ DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX, /* u16 */
DEVLINK_ATTR_RATE_NODE_NAME, /* string */
DEVLINK_ATTR_RATE_PARENT_NODE_NAME, /* string */
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index f9786d51f68f..d62772a02930 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -486,6 +486,7 @@ static const struct nla_policy devlink_rate_get_do_nl_policy[DEVLINK_ATTR_RATE_N
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
+ [DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX] = { .type = NLA_U16, },
[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
};
@@ -499,6 +500,7 @@ static const struct nla_policy devlink_rate_get_dump_nl_policy[DEVLINK_ATTR_DEV_
static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
+ [DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX] = { .type = NLA_U16, },
[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64, },
[DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64, },
@@ -511,6 +513,7 @@ static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_W
static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
+ [DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX] = { .type = NLA_U16, },
[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64, },
[DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64, },
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 7139e67e93ae..6812690883df 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -18,6 +18,12 @@ devlink_rate_is_node(struct devlink_rate *devlink_rate)
return devlink_rate->type == DEVLINK_RATE_TYPE_NODE;
}
+static inline bool
+devlink_rate_is_traffic_class(struct devlink_rate *devlink_rate)
+{
+ return devlink_rate->type == DEVLINK_RATE_TYPE_TRAFFIC_CLASS;
+}
+
static struct devlink_rate *
devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
{
@@ -31,6 +37,43 @@ devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
return devlink_rate ?: ERR_PTR(-ENODEV);
}
+static struct devlink_rate *
+devlink_rate_traffic_class_get_by_id(struct devlink_port *devlink_port, u16 tc_id)
+{
+ struct devlink_rate *devlink_rate;
+
+ list_for_each_entry(devlink_rate, &devlink_port->devlink->rate_list, list) {
+ if (devlink_rate_is_traffic_class(devlink_rate) &&
+ devlink_rate->devlink_port == devlink_port && devlink_rate->tc_id == tc_id)
+ return devlink_rate;
+ }
+
+ return ERR_PTR(-ENODEV);
+}
+
+static struct devlink_rate *
+devlink_rate_traffic_class_get_from_attrs(struct devlink *devlink, struct nlattr **attrs)
+{
+ struct devlink_port *devlink_port;
+ u16 tc_id;
+
+ if (!attrs[DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX])
+ return ERR_PTR(-EINVAL);
+
+ devlink_port = devlink_port_get_from_attrs(devlink, attrs);
+ if (IS_ERR(devlink_port))
+ return ERR_CAST(devlink_port);
+
+ tc_id = nla_get_u16(attrs[DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX]);
+ return devlink_rate_traffic_class_get_by_id(devlink_port, tc_id);
+}
+
+static struct devlink_rate *
+devlink_rate_traffic_class_get_from_info(struct devlink *devlink, struct genl_info *info)
+{
+ return devlink_rate_traffic_class_get_from_attrs(devlink, info->attrs);
+}
+
static struct devlink_rate *
devlink_rate_node_get_by_name(struct devlink *devlink, const char *node_name)
{
@@ -75,7 +118,9 @@ devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info)
- if (attrs[DEVLINK_ATTR_PORT_INDEX])
+ if (attrs[DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX])
+ return devlink_rate_traffic_class_get_from_info(devlink, info);
+ else if (attrs[DEVLINK_ATTR_PORT_INDEX])
return devlink_rate_leaf_get_from_info(devlink, info);
else if (attrs[DEVLINK_ATTR_RATE_NODE_NAME])
return devlink_rate_node_get_from_info(devlink, info);
else
return ERR_PTR(-EINVAL);
}
@@ -106,6 +151,10 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
if (nla_put_string(msg, DEVLINK_ATTR_RATE_NODE_NAME,
devlink_rate->name))
goto nla_put_failure;
+ } else if (devlink_rate_is_traffic_class(devlink_rate)) {
+ if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TRAFFIC_CLASS_INDEX, devlink_rate->tc_id) ||
+ nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_rate->devlink_port->index))
+ goto nla_put_failure;
}
if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE,
@@ -273,6 +322,10 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
err = ops->rate_node_parent_set(devlink_rate, NULL,
devlink_rate->priv, NULL,
info->extack);
+ else if (devlink_rate_is_traffic_class(devlink_rate))
+ err = ops->rate_traffic_class_parent_set(devlink_rate, NULL,
+ devlink_rate->priv, NULL,
+ info->extack);
if (err)
return err;
@@ -302,6 +355,10 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
err = ops->rate_node_parent_set(devlink_rate, parent,
devlink_rate->priv, parent->priv,
info->extack);
+ else if (devlink_rate_is_traffic_class(devlink_rate))
+ err = ops->rate_traffic_class_parent_set(devlink_rate, parent,
+ devlink_rate->priv, parent->priv,
+ info->extack);
if (err)
return err;
@@ -449,6 +506,32 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
"TX weight set isn't supported for the nodes");
return false;
}
+ } else if (type == DEVLINK_RATE_TYPE_TRAFFIC_CLASS) {
+ if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_traffic_class_tx_share_set) {
+ NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the traffic classes");
+ return false;
+ }
+ if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_traffic_class_tx_max_set) {
+ NL_SET_ERR_MSG(info->extack, "TX max set isn't supported for the traffic classes");
+ return false;
+ }
+ if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] &&
+ !ops->rate_traffic_class_parent_set) {
+ NL_SET_ERR_MSG(info->extack, "Parent set isn't supported for the traffic classes");
+ return false;
+ }
+ if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_traffic_class_tx_priority_set) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ attrs[DEVLINK_ATTR_RATE_TX_PRIORITY],
+ "TX priority set isn't supported for the traffic classes");
+ return false;
+ }
+ if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_traffic_class_tx_weight_set) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ attrs[DEVLINK_ATTR_RATE_TX_WEIGHT],
+ "TX weight set isn't supported for the traffic classes");
+ return false;
+ }
} else {
WARN(1, "Unknown type of rate object");
return false;
@@ -659,6 +742,48 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv,
}
EXPORT_SYMBOL_GPL(devl_rate_leaf_create);
+/**
+ * devl_rate_traffic_class_create - create devlink rate traffic class
+ * @devlink_port: devlink port the traffic class belongs to
+ * @priv: driver private data
+ * @tc_id: identifier of the new traffic class
+ * @parent: parent rate object, or NULL for none
+ *
+ * Create devlink rate object of type traffic class on @devlink_port.
+ */
+int devl_rate_traffic_class_create(struct devlink_port *devlink_port, void *priv, u16 tc_id,
+ struct devlink_rate *parent)
+{
+ struct devlink *devlink = devlink_port->devlink;
+ struct devlink_rate *devlink_rate;
+
+ devl_assert_locked(devlink);
+
+ devlink_rate = devlink_rate_traffic_class_get_by_id(devlink_port, tc_id);
+ if (!IS_ERR(devlink_rate))
+ return -EEXIST;
+
+ devlink_rate = kzalloc(sizeof(*devlink_rate), GFP_KERNEL);
+ if (!devlink_rate)
+ return -ENOMEM;
+
+ if (parent) {
+ devlink_rate->parent = parent;
+ refcount_inc(&devlink_rate->parent->refcnt);
+ }
+
+ devlink_rate->type = DEVLINK_RATE_TYPE_TRAFFIC_CLASS;
+ devlink_rate->devlink = devlink;
+ devlink_rate->devlink_port = devlink_port;
+ devlink_rate->tc_id = tc_id;
+ devlink_rate->priv = priv;
+ list_add_tail(&devlink_rate->list, &devlink->rate_list);
+ devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devl_rate_traffic_class_create);
+
/**
* devl_rate_leaf_destroy - destroy devlink rate leaf
*
@@ -708,6 +833,9 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
else if (devlink_rate_is_node(devlink_rate))
ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv,
NULL, NULL);
+ else if (devlink_rate_is_traffic_class(devlink_rate))
+ ops->rate_traffic_class_parent_set(devlink_rate, NULL, devlink_rate->priv,
+ NULL, NULL);
}
list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_list, list) {
if (devlink_rate_is_node(devlink_rate)) {
--
2.43.2