Message-ID: <874jy8mo0n.fsf@nvidia.com>
Date: Fri, 19 Aug 2022 12:50:46 +0200
From: Petr Machata <petrm@...dia.com>
To: <Daniel.Machon@...rochip.com>
CC: <netdev@...r.kernel.org>, <kuba@...nel.org>,
<vinicius.gomes@...el.com>, <vladimir.oltean@....com>,
<thomas.petazzoni@...tlin.com>, <Allan.Nielsen@...rochip.com>,
<maxime.chevallier@...tlin.com>, <nikolay@...dia.com>,
<roopa@...dia.com>
Subject: Re: Basic PCP/DEI-based queue classification

<Daniel.Machon@...rochip.com> writes:
> Hi netdev,
>
> I am posting this thread in continuation of:
>
> https://lore.kernel.org/netdev/20220415173718.494f5fdb@fedora/
>
> and as a new starting point for further discussion of offloading PCP-based
> queue classification into the classification tables of a switch.
>
> Today, we use a proprietary tool to configure the internal switch tables for
> PCP/DEI and DSCP based queue classification [1]. We are, however, looking for
> an upstream solution.
>
> More specifically, we want an upstream solution which allows projects like DENT
> and others with a similar purpose to implement the ieee802-dot1q-bridge.yang [2].
> As a first step we would like to focus on the priority maps of the "Priority
> Code Point Decoding Table" and "Priority Code Point Encoding Table" of the
> 802.1Q-2018 standard. These tables are well defined and map well to the
> hardware.
>
> The purpose is not to create a new kernel interface which looks like what IEEE
> defines - but rather to do the needed plumbing to allow user-space tools to
> implement an interface like this.
>
> In essence we need an upstream solution that initially supports:
>
> - Per-port mapping of PCP/DEI to QoS class. For both ingress and egress.
>
> - Per-port default priority for frames which are not VLAN tagged.

This exists in DCB APP. Rules with selector 1 (EtherType) and PID 0
assign a default priority. iproute2's dcb tool supports this.

> - Per-port configuration of "trust" to signal if the VLAN-prio shall be used,
> or if port default priority shall be used.

This would be nice. Currently mlxsw ports are in trust PCP mode until
the user configures any DSCP rules. Then they switch to trust DSCP.
There's no way to express "trust both", or to configure the particular
PCP mapping for trust PCP (it's just hardcoded as 1:1).

Re this "VLAN or default", note it's not (always) either-or. In Spectrum
switches, the default priority is always applicable. E.g. for a port in
trust PCP mode, if a packet has no 802.1Q header, it gets the port-default
priority. 802.1Q describes the default priority as "for use when
application priority is not otherwise specified", so I think this
behavior actually matches the standard.

> In the old thread, Maxime has compiled a list of ways we can possibly offload
> the queue classification. However, none of them is a good match for our purpose,
> for the following reasons:
>
> - tc-flower / tc-skbedit: The filter and action scheme maps poorly to hardware
> and would require one hardware-table entry per rule. Even less of a match
> when DEI is also considered. These tools are well suited for advanced
> classification, and not so much for basic per-port classification.

Yeah.

Offloading this is a pain. You need to parse out the particular shape of
the rules (which is not a big deal, honestly), and make sure the ordering of
the rules is correct and matches what the HW is doing. And tolerate any
ACL-/TCAM-like rules as well. And there's a mismatch between how a
missing rule behaves in SW (fall-through) and in HW (likely priority 0 gets
assigned).

And configuration is a pain as well, because a) it's a whole bunch of
rules to configure, and b) you need to be aware of all the limitations
from the previous paragraph and manage the coexistence with ACL/TCAM
rules.

It's just not a great story for this functionality.

I wonder if a specialized filter or action would make things easier to
work with. Something like "matchall action dcb dscp foo bar priority 7".

> - ip-link: The ingress and egress maps of ip-link are per-linux-vlan interface;
> we need per-port mapping. It is not possible to map both PCP and DEI.
>
> - dcb-app: Not possible to map PCP/DEI (only DSCP).
>
> We have been looking around the kernel to snoop what other switch driver
> developers do to configure basic per-port PCP/DEI based queue classification,
> and have not been able to find anything useful in the standard kernel
> interfaces. It seems like people use their own out-of-tree tools to configure
> this (like mlnx_qos from Mellanox [3]).
>
> Finally, we would appreciate any input on this, as we are looking for an
> upstream solution that can be accepted by the community. Hopefully we can
> arrive at some consensus on whether this is a feature that can be of general
> use by developers, and furthermore, in which part of the kernel it should
> reside:
>
> - ethtool: add new setting to configure the pcp tables (seems like a good
> candidate to us).
>
> - ip-link: add support for per-port-interface ingress and egress mapping of
> pcp/dei
>
> - dcb-*: as an extension or new command to the dcb utilities. The pcp tables
> seem to be in line with what dcb-app does with the application priority
> table.

I'm not a fan of DCB, but the TC story is so unconvincing that this
looks good in comparison.

But note that DCB as such is standardized. I think the dcb-maxrate
interfaces are not, and the DCB subsystem has a whole bunch of weird
pre-standard stuff that's not exposed. But what's in iproute2 dcb is
largely standard. So maybe this should be hidden under some extension
attribute.

> - somewhere else
>
> In summary:
>
> - We would like feedback from the community on the suggested implementation of
> the IEEE 802.1Q Priority Code Point encoding and decoding tables.
>
> - And if we can agree that such a solution could and should be implemented,
> where should the implementation go?
>
> - Also, should the solution be supported in the sw-bridge as well?

That would be ideal, yeah.