Message-ID: <20240814085320.134075-1-cratiu@nvidia.com>
Date: Wed, 14 Aug 2024 11:51:45 +0300
From: Cosmin Ratiu <cratiu@...dia.com>
To: <netdev@...r.kernel.org>
CC: <cjubran@...dia.com>, <cratiu@...dia.com>, <tariqt@...dia.com>,
<saeedm@...dia.com>, <yossiku@...dia.com>, <gal@...dia.com>,
<jiri@...dia.com>
Subject: [RFC PATCH] devlink: Extend the devlink rate API to support DCB ETS
Hi,
We have had support for DCB ETS for quite a while: mapping priorities to TCs
(traffic classes) and setting a minimum bandwidth share per TC. The
configuration is set on the PF/uplink and affects all traffic/queues, including
any VFs/SFs instantiated from that PF.
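For reference, this is roughly how the existing per-PF configuration can be
expressed with the iproute2 dcb tool (a sketch only; the device name, priority
and TC numbers and weights are illustrative, see dcb-ets(8) for exact syntax):
# Priorities 0 and 5 map to TC0 and TC5; ETS splits bandwidth 20/80 between
# them. This applies to the whole PF/uplink, hence the limitation above.
dcb ets set dev eth0 prio-tc 0:0 5:5 tc-tsa 0:ets 5:ets tc-bw 0:20 5:80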
We have a customer requirement to apply ETS to a group of VFs. For example,
TC0 for TCP/UDP and TC5 for RoCE, with a 20/80 bandwidth split respectively.
Two options we considered that didn’t meet the bar:
1. MQPRIO: the DSCP or VLAN-PCP values are set by the VM, so one could use the
MQPRIO qdisc from inside the VM to implement ETS. However, it is not possible
to share a qdisc across multiple net-devices.
2. TC police action: use TC filters (on the VF representors) to classify
packets based on DSCP/PCP and apply a policer (a policer action can be shared
across multiple filters). However, this is policing, not traffic shaping (there
is no backpressure to the VM). Rough sketches of both options follow below.
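For illustration only, here is roughly what the two rejected approaches would
look like. The device/representor names, DSCP value and rates are assumptions
made up for the sketch, not part of any proposal:
# 1. MQPRIO inside the VM: map priorities to two TCs (the per-TC weights would
#    still come from DCB ETS on the device). A qdisc only shapes a single
#    netdev, so this cannot cover a group of VFs.
tc qdisc add dev eth0 root mqprio num_tc 2 \
        map 0 0 0 0 0 1 0 0 queues 4@0 4@4 hw 1
# 2. A shared policer on the VF representors, keyed on DSCP (masked ip_tos).
#    Excess packets are dropped rather than backpressured to the VM.
tc actions add action police rate 40gbit burst 1m conform-exceed drop index 1
tc filter add dev pf0vf0_rep ingress protocol ip flower ip_tos 0x68/0xfc \
        action police index 1
tc filter add dev pf0vf1_rep ingress protocol ip flower ip_tos 0x68/0xfc \
        action police index 1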
To this end, devlink-rate seems to be the most suitable interface, as it
already supports groups of VFs. Following are two options to extend it:
1. In addition to leaf and node, expose a new type: TC.
Each VF will expose TC0 to TC7:
pci/0000:04:00.0/1/tc0
…
pci/0000:04:00.0/1/tc7
Example:
DEV=pci/0000:04:00.0
# Creating a group:
devlink port function rate add $DEV/vfs_group tx_share 10Gbit tx_max 50Gbit
# Creating two TC groups, TC0 and TC5:
devlink port function rate add $DEV/group_tc0 tx_weight 20 parent vfs_group
devlink port function rate add $DEV/group_tc5 tx_weight 80 parent vfs_group
# Adding TCs
devlink port function rate set $DEV/1/tc0 parent group_tc0
devlink port function rate set $DEV/2/tc0 parent group_tc0
devlink port function rate set $DEV/1/tc5 parent group_tc5
devlink port function rate set $DEV/2/tc5 parent group_tc5
2. New option to specify the bandwidth proportions between the TCs:
devlink port function rate add $DEV/vfs_group \
tx_share 10Gbit tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0
All traffic from the group of VFs will be subject to the tx_share (10Gbit) and
tx_max (50Gbit) limits and adhere to the specified TC weights. A sketch of how
VFs would join such a group follows below.
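Under option 2, the VFs would presumably be attached to the group with the
existing parent-assignment command (a sketch; keeping the current leaf syntax
here is an assumption):
# Attach the VF leaves; the per-TC split comes from the group's tc-bw.
devlink port function rate set $DEV/1 parent vfs_group
devlink port function rate set $DEV/2 parent vfs_group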
Moreover, we need a mechanism to configure the priority-to-TC mapping and to
select DSCP or PCP as the trust state; this could be done either with the
existing tools (e.g., dcb-ets/lldp) or by other means.
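If existing tools are used, one possible sketch with iproute2 dcb (the DSCP
value is illustrative, and whether adding a dscp-prio entry also switches the
trust state to DSCP is device-specific, an assumption here):
# Map DSCP 26 to priority 5 on the uplink; with priority 5 mapped to TC5,
# traffic marked DSCP 26 would land in TC5.
dcb app add dev eth0 dscp-prio 26:5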
This proposal follows a previous attempt at extending devlink rate to support
per-queue rate limiting ([1]); in that case, a different approach was chosen.
Here, we propose extending devlink rate with a new node type to represent
traffic classes, which is necessary because of the interaction with VF groups.
We'd appreciate any feedback.
Cosmin.
[1] https://lore.kernel.org/netdev/20220915134239.1935604-3-michal.wilczynski@intel.com/