netdev - Re: [RFC, net-next] net: qos: introduce a frer action to implement 802.1CB

Message-ID: <20220506193103.hla2jlpawn6te5cl@skbuf>
Date:   Fri, 6 May 2022 19:31:03 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Ferenc Fejes <ferenc.fejes@...csson.com>
CC:     Xiaoliang Yang <xiaoliang.yang_1@....com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "vinicius.gomes@...el.com" <vinicius.gomes@...el.com>,
        "michael.chan@...adcom.com" <michael.chan@...adcom.com>,
        "saeedm@...dia.com" <saeedm@...dia.com>,
        "jiri@...dia.com" <jiri@...dia.com>,
        "idosch@...dia.com" <idosch@...dia.com>,
        "alexandre.belloni@...tlin.com" <alexandre.belloni@...tlin.com>,
        "allan.nielsen@...rochip.com" <allan.nielsen@...rochip.com>,
        "joergen.andreasen@...rochip.com" <joergen.andreasen@...rochip.com>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        Balázs Varga A <balazs.a.varga@...csson.com>,
        Janos Farkas <Janos.Farkas@...csson.com>,
        "moldovan@...t.bme.hu" <moldovan@...t.bme.hu>,
        Miklós Máté <mate@...t.bme.hu>
Subject: Re: [RFC, net-next] net: qos: introduce a frer action to implement
 802.1CB

On Fri, May 06, 2022 at 02:44:17PM +0000, Ferenc Fejes wrote:
> > Glad to see someone familiar with 802.1CB. I have a few questions and
> > concerns if you don't mind.
> 
> I CCd Balazs Varga and Janos Farkas, experts of the TSN topics
> including 802.1CB as well. Istvan Moldovan can also give valuable
> feedback as the author of our in-house userspace FRER. I'll also try my
> best to answer but I'm the least competent in the topic.
> 

Nope, that would probably be me ;)
I am commenting on Xiaoliang's patch without having even run it, and I
have only looked through the code diagonally, and I'm not exactly an
expert on the use cases that drove the standard either. So plenty of
chances to make mistakes. But nonetheless I hope that by explaining to
me where I'm wrong we'll be able to make progress with this.

> >
> > I think we are seeing a bit of a stall on the topic of FRER modeling in
> > the Linux networking stack, in no small part due to the fact that we are
> > working with pre-standard hardware.
> >
> > The limitation with Xiaoliang's proposal here (to model FRER stream
> > replication and recovery as a tc action) is that I don't think it works
> > well for traffic termination - it only covers properly the use case of a
> > switch. More precisely, there isn't a single convergent termination
> > point for either locally originating traffic, or locally received
> > traffic (i.e. you, as user, don't know on which interface of several
> > available to open a socket).
> >
> > In our hardware, this limitation isn't really visible because of the way
> > in which the Ethernet switch is connected inside the NXP LS1028A.
> 
> We have some NXP LS1028As as well so at least I'm familiar with the box :-)

Cool, this means we'll eventually reach a common understanding of the
topic.

> > It is something like this:
> >
> >    +---------------------------------------+
> >    |                                       |
> >    |           +------+ +------+           |
> >    |           | eno2 | | eno3 |           |
> >    |           +------+ +------+           |
> >    |              |         |              |
> >    |           +------+ +------+           |
> >    |           | swp4 | | swp5 |           |
> >    |           +------+ +------+           |
> >    |  +------+ +------+ +------+ +------+  |
> >    |  | swp0 | | swp1 | | swp2 | | swp3 |  |
> >    +--+------+-+------+-+------+-+------+--+
> >
> > In the above picture, the switch ports swp0-swp3 have eno3 as a DSA
> > master (connected to the internal swp5, a CPU port). The other internal
> > port, swp4, is configured as a DSA user port, so it has a net device.
> > Analogously, while eno3 is a DSA master and receives DSA-tagged traffic
> > (so it is useless for direct IP termination), eno2 receives DSA untagged
> > traffic and is therefore an IP termination endpoint into a switched
> > network.
>
> Unfortunately I'm not familiar with the distributed switch architecture
> (I only read a netdev paper about it and that's all) but I'll try to get
> a grasp of the problem.
> In my understanding, the main issue is the distinction between the
> locally terminated and forwarded TSN streams, because currently the DSA
> metadata tags are required to do that? Can you explain the problem for
> someone who is not familiar with DSA?

Forget about DSA, what I'm trying to get at is that you might one day
read the release notes of the Linux kernel and see that it gained
support for FRER using tc, and get all excited, download and compile it,
set up 2 machines connected through 2 port pairs, and try to configure
the systems to ping each other redundantly, to become familiar with how
it works. Start with something simple, what can be so hard about a ping ;)

You'll say something along the lines of

1. ok, I have 2 IP addresses, so I need 2 streams, one A -> B and one B -> A

2. I want to use the null stream identification function (MAC DA, VLAN ID
   for those following along) so I have to resolve each IP address to a
   MAC address to use as a stream identifier, but how? since the 2
   Ethernet cards on each system have different MAC addresses. Anyway,
   pick one and put the other card in promisc for now.

3. I have the MACs now, I want to configure the streams. The stream "A -> B"
   needs to be configured for splitting on the first system, and for
   sequence recovery on the second system. The stream "B -> A" needs to
   be configured for recovery on the first system and for splitting on
   the second.

4. Let's start with splitting, this is just the "mirred egress mirror"
   action, nothing FRER specific about it. There's also the "frer rtag
   tag-action tag-push" action which adds the redundancy tag. Good thing
   these actions can be chained. So let's put a filter on the egress
   qdisc of eth0, that matches on the MAC address of B, and has a mirred
   mirror action to eth1, and a "rtag tag-push" action. Notice how by
   this time, eth0 becomes sort of a "primary" interface and eth1 sort
   of a "secondary" interface. So if you ping, you need to use eth0.
   What if the link goes down on eth0 you ask, how does the "redundancy"
   in "frer" come into play, with the traffic still going through eth1?
   No time to ask questions like that, let's move on.

5. Let's say that both links are up, and system B is receiving a
   replicated stream with FRER tags on both eth0 and eth1. It wants to
   eliminate the duplicates and see a continuous flow of ICMP requests
   without the extra FRER tag. Back to the documentation. We see 2 kinds
   of stream recovery, one is "individual" recovery which is a
   "frer rtag recover" action put on the ingress qdisc of an interface,
   and the other is just "recovery", which is the same action but put on
   the egress qdisc. We don't want individual sequence recovery processes
   on eth0 and eth1 of station B, since those won't consider the packets
   as being members of the same stream, and they'll still be duplicated.
   So we want the normal recovery. But on whose netdev's egress qdisc do
   we put the "rtag recover" action? Both eth0 and eth1 are receiving.
   There is no central convergence point.
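
To make step 5 concrete, here is a toy Python model of the split and
recovery functions. It is only a sketch under stated assumptions: the
class names are made up for illustration, the duplicate check is a
simplified history window rather than the vector recovery algorithm from
802.1CB, and none of it reflects how the tc-frer patch is implemented.
It shows why two independent recovery instances on eth0 and eth1 still
deliver every frame twice, while one instance shared by both ports
does not.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    dmac: str                  # null stream identification: MAC DA ...
    vid: int                   # ... plus VLAN ID
    payload: str
    seq: Optional[int] = None  # R-TAG sequence number, None when untagged


class Splitter:
    """Hypothetical talker side: push an R-TAG, then replicate per port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.next_seq = 0

    def send(self, frame):
        tagged = replace(frame, seq=self.next_seq)
        self.next_seq = (self.next_seq + 1) % 65536  # 16-bit sequence space
        return [(port, tagged) for port in self.ports]


class SequenceRecovery:
    """Simplified recovery: drop any sequence number seen recently."""

    def __init__(self, history=32):
        self.seen = deque(maxlen=history)

    def accept(self, frame):
        if frame.seq in self.seen:
            return None                  # duplicate, eliminate it
        self.seen.append(frame.seq)
        return replace(frame, seq=None)  # pop the R-TAG before delivery


def receive(wire, recovery_for_port):
    """Run each received copy through the recovery instance of its port."""
    delivered = []
    for port, frame in wire:
        out = recovery_for_port[port].accept(frame)
        if out is not None:
            delivered.append(out.payload)
    return delivered


# System A replicates three pings over both links.
splitter = Splitter(["eth0", "eth1"])
wire = []
for i in range(3):
    wire.extend(splitter.send(Frame("00:00:00:00:00:0b", 1, f"ping {i}")))

# "Individual" recovery: one instance per receiving port on system B.
individual = {"eth0": SequenceRecovery(), "eth1": SequenceRecovery()}
per_port = receive(wire, individual)

# Recovery at a convergence point: one instance shared by both ports.
shared_instance = SequenceRecovery()
shared = {"eth0": shared_instance, "eth1": shared_instance}
converged = receive(wire, shared)

print(len(per_port), per_port)
print(len(converged), converged)
```

The shared instance is exactly the thing that has no obvious home when
the only places to attach it are the egress qdiscs of eth0 and eth1.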

Now you're stumped and thinking, how is this supposed to be used?
What can you do with it? I mean, I can probably create a veth pair as
that aforementioned missing convergence point, and guide packets from
{eth0, eth1} towards the lefthand side of the veth pair, using mirred
redirect.
Then I can put the frer rules on the egress qdisc of the lefthand side
of the veth pair, and recover the plaintext traffic (no duplicates, no
RTAG) on the righthand side of the veth pair. But... seriously?
And there is not even one mention of this in the documentation?
And even so. You need to send the request through eth0 and expect to
receive the reply through a veth interface? How is any user space
application ever going to work?


Now comes the connection with DSA. Xiaoliang made tc-frer with LS1028A
offloading in mind. No criticism there, after all it is the hardware we
are working with.

The intended usage pattern is to put the FRER rules on the switch port
netdevices, and to do the termination on the switch-unaware netdevices.
In other words, it's as if eno2 is connected to a completely external
RedBox, and tc-frer only serves externally received traffic. Except that
those 2 isolated parts of the system are physically embedded in one.

So at step (1) you put the IP on eno2, at step (2) you choose the MAC
address for the stream to be that of eno2, at step (4) you configure the
split action (mirred towards the external ports, plus FRER tag push) on
the _ingress_ of swp4 (traffic sent by eno2 is received by swp4).
At step (5) you put the sequence recovery on the _egress_ of swp4
(traffic that egresses swp4 ingresses eno2).

So then you might ask, what would we do if we didn't have that eno2 <->
swp4 port pair? Is tc-frer useful for someone who doesn't, but is maybe
even able to offload 802.1CB streams, including termination, through
some other paradigm? The thing is that, as far as I can tell, Linux does
not really like to set up a network for the exclusive use of others
(pure forwarding), to which it has no local access. This is essentially
the design of tc-frer, and my issue with it.

> >
> > What we do in this case is put tc-frer rules for stream replication and
> > recovery on swp4 itself, and we use eno2 as the convergence point for
> > locally terminated streams.
> >
> > However, naturally, a hardware design that does not look like this can't
> > terminate traffic like this.
> 
> Yes, this is my concern too. It would be nice if the user could
> configure the SW implementation and the HW offload with the same
> commands, and the original tc-frer approach fits this concept well.
> Anything in that direction is the way forward IMO, even if
> the underlying implementation will change.
> >
> > My idea was that it might be better if FRER was its own virtual network
> > interface (like a bridge), with multiple slave interfaces. The FRER net
> > device could keep its own database of streams and actions (completely
> > outside of tc) which would be managed similar to "bridge fdb add ...".
> > This way, the frer0 netdevice would be the local termination endpoint,
> > logically speaking.
> 
> Interesting approach. To be honest I don't see the long term implications
> of this solution, others might have ideas about the pros and cons, but
> that looks like a solution where local stream termination is trivial.

The implication is that you can easily do stuff with FRER. Maybe I'm
relying too much on ping as an example, but I am really lacking real
life use cases. Feedback here would be extremely appreciated.

> > What I don't know for sure is if a FRER netdevice is supposed to forward
> > frames which aren't in its list of streams (and if so, by which rules).
> 
> Yes this sounds correct, somehow non-local packets should be forwarded
> too with a bridge. Is it possible for the Linux bridge to recognize if
> one port is a frer0 port (or on the frer0 if that is enslaved) and do the
> forwarding of the streams? Re-implementing bridge functions just for the
> frer device would be redundant. Unfortunately I never dug deep
> enough into the Linux bridge code, except when I debugged VXLAN ARP
> suppression for EVPN, but I think it would be possible to exchange some
> metadata between the bridge and the frer device to make the
> forwarding/terminating decision, something like here [0]

The other question, if you're in favor of "FRER as net device", is whether
we should have a FRER interface per TSN stream (or per stream pair, RX
and TX, since streams are unidirectional), or a FRER interface for all
TSN streams. If the latter, we're moving more towards "FRER integrated
in bridge" territory. Or... maybe even resolve local termination through
some other mechanism, and still build on top of a tc-frer action.

The thing with "FRER as net device" on the other hand is that we've
already started modeling PSFP through tc. So if the FRER device has its
own rules, then "these" streams are not the same as "those" streams, and
a user would have to duplicate parts of the configuration. Whereas I
think the PSFP standard refers to stream identifiers directly from 802.1CB.
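
To illustrate the duplication concern, here is a small Python sketch of
what a single stream identification table could look like if both PSFP
and FRER referred to it by handle. Everything in it is an assumption
made for illustration: the StreamTable class, the integer handles and
the "gate 0" / "split" / "recover" strings are invented names, not an
existing kernel or tc interface. Only the null stream identification key
(MAC DA, VLAN ID) comes from the discussion above.

```python
from typing import Dict, NamedTuple, Optional


class StreamKey(NamedTuple):
    dmac: str  # destination MAC address
    vid: int   # VLAN ID


class StreamTable:
    """Hypothetical shared table: (MAC DA, VLAN ID) -> stream handle."""

    def __init__(self) -> None:
        self._handles: Dict[StreamKey, int] = {}

    def add(self, dmac: str, vid: int) -> int:
        key = StreamKey(dmac.lower(), vid)
        # Reuse the existing handle if the stream is already known.
        return self._handles.setdefault(key, len(self._handles))

    def lookup(self, dmac: str, vid: int) -> Optional[int]:
        return self._handles.get(StreamKey(dmac.lower(), vid))


streams = StreamTable()
a_to_b = streams.add("00:00:00:00:00:0B", 10)
b_to_a = streams.add("00:00:00:00:00:0A", 10)

# Both features are configured per handle, so a stream is described once.
psfp_gate = {a_to_b: "gate 0"}
frer_function = {a_to_b: "split", b_to_a: "recover"}

handle = streams.lookup("00:00:00:00:00:0b", 10)
print(handle, psfp_gate.get(handle), frer_function.get(handle))
```

If the FRER device kept its own private table instead, the two add()
calls would have to be repeated there with the same keys, which is the
duplication I would like to avoid.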

> > Because if a FRER netdevice is supposed to behave like a regular bridge
> > for non-streams, the implication is that the FRER logic should then be
> > integrated into the Linux bridge.
>
> This is (for me) more appealing. Also we can keep in mind that when
> Linux supports deterministic layer 3 networking (IETF DetNet WG RFCs)
> it would be nice to have a mapping between TSN and DetNet streams, then
> forward the packets on DetNet tunnels as well (with different
> endpoints). This is something our team is researching so Balazs and Istvan
> might give you some info about that. I admit that thinking about
> playing nicely with DetNet in regard to the current Linux FRER
> implementation is more than overwhelming, but the Linux bridge would be
> a nice place to map TSN flows to DetNet flows like EVPN currently maps
> VLANs to VXLANs.

So what would be the use case for bridging packets belonging to
unrecognized TSN streams? In my toy setups I almost ran out of ideas how
to drop unwanted traffic and prevent it from being looped forever.
STP, MSTP, MRP are all out the window, this is active redundancy, you
need to embrace the loops, so it isn't as if you can pretend that
something sane is going to happen with a packet if it isn't part of a
stream that gets special handling from 802.1CB. No broadcast, no
multicast, and self address filtering on all switch ports.

> > Also, this new FRER software model complicates the offloading on NXP
> > LS1028A, but let's leave that aside, since it shouldn't really be the
> > decisive factor in what the software model should look like.
> >
> > Do you have any comments on this topic?
> I would like to see if others can join the discussion as well, I will
> try to think about this problem more too.
>
> [0] https://lore.kernel.org/netdev/20220301050439.31785-10-roopa@nvidia.com/
>
> Best,
> Ferenc