Message-ID: <20240814085320.134075-1-cratiu@nvidia.com>
Date: Wed, 14 Aug 2024 11:51:45 +0300
From: Cosmin Ratiu <cratiu@...dia.com>
To: <netdev@...r.kernel.org>
CC: <cjubran@...dia.com>, <cratiu@...dia.com>, <tariqt@...dia.com>,
	<saeedm@...dia.com>, <yossiku@...dia.com>, <gal@...dia.com>,
	<jiri@...dia.com>
Subject: [RFC PATCH] devlink: Extend the devlink rate API to support DCB ETS

Hi,

We have had support for DCB-ETS for quite a while: mapping priorities to TCs
(traffic classes) and setting a minimum bandwidth share per TC. The
configuration is applied on the PF/uplink and affects all traffic/queues,
including those of any VFs/SFs instantiated from that PF.

We have a customer requirement to apply ETS to a group of VFs. For example,
TC0 for TCP/UDP traffic and TC5 for RoCE, with a 20/80 bandwidth split.

Two options we considered that didn’t meet the bar:

1. MQPRIO: since the DSCP or VLAN-PCP values are set by the VM, one could use
an MQPRIO qdisc from inside the VM to implement ETS. However, a qdisc cannot
be shared across multiple net-devices, so this cannot cover a group of VFs.
2. TC police action: use TC filters (on the VF representors) to classify
packets based on DSCP/PCP and apply a policer (a policer action can be shared
across multiple filters). However, policing is not traffic shaping: it drops
excess traffic instead of applying backpressure to the VM.
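For reference, the per-VM configuration from option 1 would look roughly like
the following (a sketch; eth0 and the priority mapping are illustrative). It
also shows why the option falls short: the qdisc is bound to a single netdev
inside one VM.

```shell
# Inside each VM separately (hypothetical device name eth0):
# two traffic classes, priority 5 mapped to TC1, all others to TC0,
# offloaded in DCB mode so the NIC applies the TC configuration.
tc qdisc add dev eth0 root mqprio \
    num_tc 2 \
    map 0 0 0 0 0 1 0 0 \
    queues 4@0 4@4 \
    hw 1 mode dcb
```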

To this end, devlink-rate seems to be the most suitable interface, as it
already supports grouping VFs. Below are two options for extending it:

1. In addition to leaf and node, expose a new object type: TC.
Each VF would expose TC0 through TC7:
pci/0000:04:00.0/1/tc0
…
pci/0000:04:00.0/1/tc7

Example:
DEV=pci/0000:04:00.0
# Creating a group:
devlink port function rate add $DEV/vfs_group tx_share 10Gbit tx_max 50Gbit

# Creating two groups TC0 and TC5:
devlink port function rate add $DEV/group_tc0 tx_weight 20 parent vfs_group
devlink port function rate add $DEV/group_tc5 tx_weight 80 parent vfs_group

# Adding TCs
devlink port function rate set $DEV/1/tc0 parent group_tc0
devlink port function rate set $DEV/2/tc0 parent group_tc0

devlink port function rate set $DEV/1/tc5 parent group_tc5
devlink port function rate set $DEV/2/tc5 parent group_tc5
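With option 1, the resulting hierarchy could be inspected with the existing
show command (a sketch; the tc0 object only exists under this proposal):

```shell
# Inspect the configured rate objects (names as in the example above):
devlink port function rate show pci/0000:04:00.0/vfs_group
devlink port function rate show pci/0000:04:00.0/1/tc0
```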

2. A new option to specify the bandwidth proportions between the TCs:
devlink port function rate add $DEV/vfs_group \
  tx_share 10Gbit tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

All traffic of the VF group would be subject to the share (10 Gbit), the max
(50 Gbit), and the specified TC weights.

In addition, we need a mechanism to configure the priority-to-TC mapping and
to select DSCP or PCP as the trust state; this could be done either with the
existing tools (e.g., dcb-ets/lldp) or by other means.
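For example, with the existing iproute2 dcb tool the mapping could be set on
the uplink like this (a sketch; eth0 stands for the uplink netdev, and the
priority/DSCP values are illustrative):

```shell
# Map priority 0 to TC0 and priority 5 to TC5, mark both TCs as ETS
# and assign the 20/80 weights (other priorities default to TC0):
dcb ets set dev eth0 \
    prio-tc 0:0 5:5 \
    tc-tsa 0:ets 5:ets \
    tc-bw 0:20 5:80

# Trust DSCP: map DSCP 26 (illustrative RoCE marking) to priority 5:
dcb app add dev eth0 dscp-prio 26:5
```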

This patch is based on a previous attempt at extending devlink rate to support
per-queue rate limiting ([1]). There, a different approach was chosen. Here,
we propose extending devlink rate with a new node type representing traffic
classes, which is necessary because of the interaction with VF groups.

We'd appreciate any feedback.

Cosmin.

[1] https://lore.kernel.org/netdev/20220915134239.1935604-3-michal.wilczynski@intel.com/
