netdev - Re: net-shapers plan

Message-ID: <a6beaa28-cd5d-4a8b-9df5-9f09b2632849@nvidia.com>
Date: Wed, 23 Apr 2025 09:50:34 +0300
From: Carolina Jubran <cjubran@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Cosmin Ratiu <cratiu@...dia.com>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 "horms@...nel.org" <horms@...nel.org>,
 "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
 "davem@...emloft.net" <davem@...emloft.net>, Tariq Toukan
 <tariqt@...dia.com>, Gal Pressman <gal@...dia.com>,
 "jiri@...nulli.us" <jiri@...nulli.us>,
 "edumazet@...gle.com" <edumazet@...gle.com>,
 Saeed Mahameed <saeedm@...dia.com>, "pabeni@...hat.com" <pabeni@...hat.com>
Subject: Re: net-shapers plan

On 14/04/2025 19:27, Jakub Kicinski wrote:
> On Mon, 14 Apr 2025 11:27:00 +0300 Carolina Jubran wrote:
>>> I hope you understand my concern, tho. Since you're providing the first
>>> implementation, if the users can grow dependent on such behavior we'd
>>> be in no position to explain later that it's just a quirk of mlx5 and
>>> not how the API is intended to operate.
>>
>> Thanks for bringing this up. I want to make it clear that traffic
>> classes must be properly matched to queues. We don’t rely on the
>> hardware fallback behavior in mlx5. If the driver or firmware isn’t
>> configured correctly, traffic class bandwidth control won’t work as
>> expected — the user will suffer from constant switching of the TX queue
>> between scheduling queues and head-of-line blocking. As a result, users
>> shouldn’t expect reliable performance or correct bandwidth allocation.
>> We don’t encourage configuring this without proper TX queue mapping, so
>> users won’t grow dependent on behavior that only happens to work without it.
>> We tried to highlight this in the plan section discussing queue
>> selection and head-of-line blocking: To make traffic class shaping work,
>> we must keep traffic classes separate for each transmit queue.
> 
> Right, my concern is more that there is no requirement for explicit
> configuration of the queues, as long as traffic arrives silo'ed WRT
> DSCP markings. As long as a VF sorts the traffic it does not have
> to explicitly say (or even know) that queue A will land in TC N.
> 

Even if the VF sends DSCP-marked traffic, the packet's classification
into a traffic class still depends on the prio-to-TC mapping set by the
hypervisor. Without that mapping, the hardware can't reliably classify
packets, and traffic may not land in the intended TC.
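To make the dependency concrete, here is a minimal Python model of that
classification path. It assumes the hypervisor installs a DSCP-to-prio
table and a prio-to-TC table. The function names, the table layouts, and
the fallback of unmapped DSCP values to prio 0 are hypothetical and are
not mlx5's actual implementation. The only fixed fact it relies on is
that DSCP occupies the upper six bits of the IPv4 TOS / IPv6 Traffic
Class byte.

```python
from typing import Dict, List, Optional

def dscp_from_tos(tos: int) -> int:
    """DSCP is the upper six bits of the IPv4 TOS / IPv6 Traffic Class byte."""
    return (tos >> 2) & 0x3F

def classify(tos: int,
             dscp_to_prio: Dict[int, int],
             prio_to_tc: Optional[List[int]]) -> Optional[int]:
    """Return the TC a packet lands in, or None when no prio-to-TC map exists.

    Both tables are owned by the hypervisor (hypothetical layout); the VF only
    influences the DSCP marking carried in `tos`.
    """
    if prio_to_tc is None:
        # No prio-to-TC mapping installed: the TC is undefined in this model.
        return None
    # Unmapped DSCP values fall back to prio 0 (an assumption of this sketch).
    prio = dscp_to_prio.get(dscp_from_tos(tos), 0)
    return prio_to_tc[prio]
```

For example, with dscp_to_prio = {46: 5, 26: 3} and prio_to_tc =
[0, 0, 0, 1, 1, 2, 2, 2], a packet marked EF (DSCP 46, TOS byte 0xB8)
lands in TC 2. The same packet with no prio-to-TC table (None) has no
defined TC, no matter how carefully the VF marked it.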

Overall, for traffic class separation and scheduling to work as 
intended, the VF and hypervisor need to be in sync. The VF provides the 
markings, but the hypervisor owns the classification logic.

The hypervisor sets up the classification mechanism; it’s up to the VFs
to use it correctly. Otherwise, packets will be misclassified. In a
virtualized setup, VFs are untrusted and don’t control classification or
shaping; they just select which queue to transmit on.
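Since the VF only picks the TX queue while the hypervisor decides the TC,
one way to check that the two sides are in sync is to verify that every
TX queue carries a single TC, which is the per-queue separation the plan
calls for. The sketch below is a hypothetical Python diagnostic; it
assumes per-packet (txq, tc) observations are available and is not an
existing driver or firmware interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def queues_mixing_tcs(packets: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Return {txq: tcs} for every TX queue that carries more than one TC.

    Each element of `packets` is a hypothetical (txq, tc) observation: `txq`
    is the queue the VF chose, `tc` is the hypervisor's classification result.
    """
    tcs_per_queue: Dict[int, Set[int]] = defaultdict(set)
    for txq, tc in packets:
        tcs_per_queue[txq].add(tc)
    # Queues seen with more than one TC violate per-queue TC separation.
    return {q: tcs for q, tcs in tcs_per_queue.items() if len(tcs) > 1}
```

For instance, queues_mixing_tcs([(0, 0), (1, 2), (1, 1)]) returns
{1: {1, 2}}: queue 1 carries two TCs, which is the case where the TX
queue keeps switching between scheduling queues and suffers head-of-line
blocking.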

> BTW the classification is before all rewrites? IOW flower or any other
> forwarding rules cannot affect scheduling?

Classification happens after forwarding actions, so forwarding rules can
affect scheduling: if the user modifies DSCP or VLAN priority as part of
a TC rule, the rewritten value is what we use for classification and
scheduling. The classification reflects how the packet will look on the
wire.
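The ordering can be sketched as a small Python pipeline in which rewrite
actions run before classification, so the TC is derived from the on-wire
DSCP. Packet, set_dscp and the single DSCP-to-TC table are hypothetical
simplifications (a real setup goes through prio and may also trust VLAN
PCP); set_dscp merely stands in for a TC rule that rewrites DSCP.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Packet:
    dscp: int  # DSCP value as it would appear on the wire

# A forwarding/rewrite action maps a packet to a (possibly modified) packet.
Action = Callable[[Packet], Packet]

def set_dscp(value: int) -> Action:
    """Hypothetical stand-in for a TC rule that rewrites the DSCP field."""
    return lambda pkt: replace(pkt, dscp=value)

def transmit_tc(pkt: Packet, actions: List[Action],
                dscp_to_tc: Dict[int, int]) -> int:
    """Apply all rewrite actions first, then classify the rewritten packet."""
    for act in actions:
        pkt = act(pkt)
    # Classification sees the post-rewrite value; unmapped DSCP -> TC 0 (assumption).
    return dscp_to_tc.get(pkt.dscp, 0)
```

With dscp_to_tc = {46: 2}, transmit_tc(Packet(dscp=0), [], dscp_to_tc)
returns 0, but adding set_dscp(46) to the actions makes the same packet
land in TC 2.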