Message-ID: <de1cb0ab-163c-02e8-86b0-fc865796a40a@intel.com>
Date: Wed, 9 Nov 2022 19:54:52 +0100
From: "Wilczynski, Michal" <michal.wilczynski@...el.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: <netdev@...r.kernel.org>, <alexandr.lobakin@...el.com>,
<jacob.e.keller@...el.com>, <jesse.brandeburg@...el.com>,
<przemyslaw.kitszel@...el.com>, <anthony.l.nguyen@...el.com>,
<ecree.xilinx@...il.com>, <jiri@...nulli.us>
Subject: Re: [PATCH net-next v10 10/10] ice: add documentation for
devlink-rate implementation
On 11/8/2022 11:39 PM, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 19:13:26 +0100 Michal Wilczynski wrote:
>> Add documentation to a newly added devlink-rate feature. Provide some
>> examples on how to use the features, which netlink attributes are
>> supported and descriptions of the attributes.
>> +Devlink Rate
>> +============
>> +
>> +The ``ice`` driver implements the devlink-rate API. It allows for offload
>> +of the Hierarchical QoS to the hardware. It enables the user to group
>> +Virtual Functions in a tree structure and assign supported parameters:
>> +tx_share, tx_max, tx_priority and tx_weight to each node in the tree.
>> +Effectively, the user gains the ability to control how much bandwidth is
>> +allocated for each VF group. This is later enforced by the HW.
>> +
>> +It is assumed that this feature is mutually exclusive with DCB and ADQ, or
>> +any driver feature that would trigger changes in QoS, for example the
>> +creation of a new traffic class.
> Meaning? Will the devlink API no longer reflect reality once one of
> the VFs enables DCB for example?
By DCB I mean the DCB that's implemented in the FW, and I'm not aware
of any flow that would enable the VF to tweak FW DCB on/off. Additionally
there is a commit in this patch series that should prevent any devlink-rate
changes if the FW DCB is enabled, and should prevent enabling FW DCB
if any changes were made with devlink-rate.
I don't think there is a way to detect that the SW DCB is enabled though.
In that case the software would try to enforce its own settings in the SW
stack and the HW would try to enforce the devlink-rate settings.
>
>> This feature is also dependent on switchdev
>> +being enabled in the system. It's required because devlink-rate requires
>> +devlink-port objects to be present, and those objects are only created
>> +in switchdev mode.
>> +
>> +If the driver is set to the switchdev mode, it will export the
>> +internal hierarchy the moment the VFs are created. The root of the tree
>> +is always represented by node_0. This node can't be deleted by the user.
>> +Leaf nodes and nodes with children also can't be deleted.
>> +
>> +.. list-table:: Attributes supported
>> + :widths: 15 85
>> +
>> + * - Name
>> + - Description
>> + * - ``tx_max``
>> + - This attribute allows for specifying a maximum bandwidth to be
> Drop the "This attribute allows for specifying a" from all attrs.
Sure.
>
>> + consumed by the tree Node. Rate Limit is an absolute number
>> + specifying a maximum number of bytes a Node may consume during
>> + the course of one second. Rate limit guarantees that a link will
>> + not oversaturate the receiver on the remote end and also enforces
>> + an SLA between the subscriber and the network provider.
>> + * - ``tx_share``
> Wouldn't it be more common to call this tx_min, like in the old VF API
> and the cgroup APIs?
I agree on this one, I'm not really sure why this attribute is called
tx_share. In its iproute2 documentation tx_share is described as:
"specifies minimal tx rate value shared among all rate objects. If rate
object is a part of some rate group, then this value shared with rate
objects of this rate group.".
So tx_min is more intuitive, but I suspect that the original author
wanted to emphasize that this BW is shared among all the child
nodes.
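For the documentation I could add an example flow like this (the PCI
address, port index, node name and rate values below are made up for
the example):

  $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev
  # create a node group with a committed and a peak rate
  $ devlink port function rate add pci/0000:4b:00.0/node_custom \
        tx_share 10Gbps tx_max 50Gbps
  # attach a VF port to that group
  $ devlink port function rate set pci/0000:4b:00.0/1 parent node_custom

With such a setup the VFs grouped under node_custom would be
guaranteed 10Gbps in aggregate but capped at 50Gbps.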
>
>> + - This attribute allows for specifying a minimum bandwidth allocated
>> + to a tree node when it is not blocked. It specifies an absolute
>> + BW. While tx_max defines the maximum bandwidth the node may consume,
>> + the tx_share marks committed BW for the Node.
>> + * - ``tx_priority``
>> + - This attribute allows for usage of a strict priority arbiter among
>> + siblings. This arbitration scheme attempts to schedule nodes based
>> + on their priority as long as the nodes remain within their
>> + bandwidth limit. Range 0-7.
> Nodes meaning it will (W)RR across all nodes of highest prio?
Yes, nodes with the same priority will be treated equally.
>
> Is prio 0 or 7 highest?
7 is the highest; nodes with priority 7 are selected first.
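E.g. with made-up node names:

  # node_a is scheduled before node_b as long as it stays within its limits
  $ devlink port function rate set pci/0000:4b:00.0/node_a tx_priority 7
  $ devlink port function rate set pci/0000:4b:00.0/node_b tx_priority 0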
>
>> + * - ``tx_weight``
>> + - This attribute allows for usage of the Weighted Fair Queuing
>> + arbitration scheme among siblings. This arbitration scheme can be
>> + used simultaneously with the strict priority. Range 1-200.
> Would be good to specify how the interaction with SP looks.
In each arbitration flow, the winning node will be selected from
the group’s non-blocked siblings with the highest priority.
The basic rule is that each sibling group consists of a sub-group
with high priority nodes, a WFQ sub-group with intermediate
priority nodes, and a sub-group with low priority nodes.
So basically if several sibling nodes have the same priority configured,
they are treated as a sub-group and arbitration among them is
performed using weights.
Will add this info in the documentation.
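Something along these lines (made-up node names):

  # node_a is served first (strict priority)
  $ devlink port function rate set pci/0000:4b:00.0/node_a tx_priority 7
  # node_b and node_c form a WFQ sub-group at the same priority,
  # splitting the remaining bandwidth 2:1 by weight
  $ devlink port function rate set pci/0000:4b:00.0/node_b tx_priority 3 tx_weight 20
  $ devlink port function rate set pci/0000:4b:00.0/node_c tx_priority 3 tx_weight 10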
> Does the absolute value of the weight matter or only the relative
> values? (IOW is 1 vs 10 the same as 10 vs 100)
Only the relative values matter. Will also add this to the
documentation.
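E.g. these two configurations should behave the same (made-up node names):

  $ devlink port function rate set pci/0000:4b:00.0/node_a tx_weight 1
  $ devlink port function rate set pci/0000:4b:00.0/node_b tx_weight 10

  $ devlink port function rate set pci/0000:4b:00.0/node_a tx_weight 10
  $ devlink port function rate set pci/0000:4b:00.0/node_b tx_weight 100

In both cases node_b gets 10x the bandwidth of node_a within their
WFQ sub-group.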