linux-kernel - Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Message-ID: <20210629093055.x5pvcebk4y4f6nem@skbuf>
Date:   Tue, 29 Jun 2021 12:30:55 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Lukasz Majewski <lukma@...x.de>
Cc:     Andrew Lunn <andrew@...n.ch>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Madalin Bucur <madalin.bucur@....nxp.com>,
        Nicolas Ferre <nicolas.ferre@...rochip.com>,
        Joakim Zhang <qiangqing.zhang@....com>,
        Florian Fainelli <f.fainelli@...il.com>,
        netdev@...r.kernel.org, Arnd Bergmann <arnd@...db.de>,
        Mark Einon <mark.einon@...il.com>,
        NXP Linux Team <linux-imx@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

On Tue, Jun 29, 2021 at 10:09:37AM +0200, Lukasz Majewski wrote:
> Hi Vladimir,
>
> > On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > > > So before considering merging your changes, I would like to
> > > > > > see a usable binding.
> > > > > >
> > > > > > I also don't remember seeing support for STP. Without that,
> > > > > > your network has broadcast storm problems when there are
> > > > > > loops. So i would like to see the code needed to put ports
> > > > > > into blocking, listening, learning, and forwarding states.
> > > > > >
> > > > > > 	  Andrew
> > > >
> > > > I cannot stress enough how important it is for us to see STP
> > > > support and consequently the ndo_start_xmit procedure for switch
> > > > ports.
> > >
> > > Ok.
> > >
> > > > Let me see if I understand correctly. When the switch is enabled,
> > > > eth0 sends packets towards both physical switch ports, and eth1
> > > > sends packets towards none, but eth0 handles the link state of
> > > > switch port 0, and eth1 handles the link state of switch port 1?
> > >
> > > Exactly, this is how the FEC driver is used with this switch.
> >
> > This is a much bigger problem than anything which has to do with code
> > organization. Linux does not have any sort of support for unmanaged
> > switches.
>
> My impression is similar. This switch cannot easily fit into DSA (it
> does not append tags)

No, this is not why the switch does not fit the DSA model.
DSA assumes that the master interface and the switch are two completely
separate devices which manage themselves independently. Their boundary
is typically at the level of a MAC-to-MAC connection, although vendors
have sometimes blurred this line a bit in the case of integrated
switches. But the key point is that if there are 2 external ports going
to the switch, these should be managed by the switch driver. But when
the switch is sandwiched between the Ethernet controller of the "DSA
master" (the DMA engine of fec0) and the DSA master's MAC (still owned
by fec), the separation isn't quite what DSA expects, is it? Remember
that in the case of the MTIP switch, the fec driver needs to put the
MACs going to the switch in promiscuous mode such that the switch
behaves as a switch and actually forwards packets by MAC DA instead of
dropping them. So the system is much more tightly coupled.

 +---------------------------------------------------------------------------+
 |                                                                           |
 | +--------------+        +--------------------+--------+      +------------+
 | |              |        | MTIP switch        |        |      |            |
 | |   fec 1 DMA  |---x    |                    | Port 2 |------| fec 1 MAC  |
 | |              |        |            \  /    |        |      |            |
 | +--------------+        |             \/     +--------+      +------------+
 |                         |             /\              |                   |
 | +--------------+        +--------+   /  \    +--------+      +------------+
 | |              |        |        |           |        |      |            |
 | |   fec 0 DMA  |--------| Port 0 |           | Port 1 |------| fec 0 MAC  |
 | |              |        |        |           |        |      |            |
 | +--------------+        +--------+-----------+--------+      +------------+
 |                                                                           |
 +---------------------------------------------------------------------------+

Is this DSA? I don't really think so, but you could still try to argue
otherwise.

The opposite is also true. DSA supports switches that don't append tags
to packets (see sja1105). This doesn't make them "less DSA", just more
of a pain to work with.
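The promiscuous-mode coupling described earlier in this reply can be illustrated with a small userspace model of a MAC receive address filter. This is a hypothetical simplification, not the FEC register interface: `struct mac_model` and `mac_accepts()` are invented for illustration, and multicast hash filtering is reduced to "accept all group addresses".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a MAC receive address filter (not the real FEC
 * registers). A frame coming from the wire is passed on towards the
 * switch port only if mac_accepts() returns true. */
struct mac_model {
	uint8_t own_addr[6];
	bool promisc;
};

/* I/G bit set in the first octet: group address (multicast/broadcast). */
static bool is_group_addr(const uint8_t da[6])
{
	return (da[0] & 0x01) != 0;
}

static bool mac_accepts(const struct mac_model *mac, const uint8_t da[6])
{
	assert(mac != NULL && da != NULL);
	if (mac->promisc)
		return true;    /* everything passes, as switching requires */
	if (is_group_addr(da))
		return true;    /* simplification: no multicast hash filter */
	/* Unicast: only frames addressed to this MAC pass. A frame the
	 * switch should forward to another host is dropped right here. */
	return memcmp(da, mac->own_addr, sizeof(mac->own_addr)) == 0;
}
```

With `promisc` false, a unicast frame for a host behind the other port never reaches the switch fabric, which is why the MACs and the switch cannot be managed as independent devices.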

> nor into switchdev.
>
> The latter is caused by two modes of operation:
>
> - Bypass mode (no switch) -> DMA1 and DMA0 are used
> - Switch mode -> only DMA0 is used
>
>
> Moreover, from my understanding of the CPSW, it looks like it always uses
> just a single DMA, and switching seems to be the default operation
> for its two Ethernet ports.
>
> The "bypass mode" of NXP's L2 switch seems to be achieved inside the
> CPSW by configuring it not to pass packets between those ports.

I don't exactly see the point you're trying to make here. At the end of
the day, the only thing that matters is what you expose to the user.
With no way (when the switch is enabled) for a socket opened on eth0 to
send/receive packets coming only from the first port, and a socket
opened on eth1 to send/receive packets coming only from the second port,
I think this driver attempt is a pretty far cry from what a switch
driver in Linux is expected to offer, be it modeled as switchdev or DSA.
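The user-visible contract described here can be stated as a tiny model. The names (`enum mtip_mode`, `expected_rx_netdev()`, `rfc_rx_netdev()`) are hypothetical; external port k is taken to be the switch port wired to fec k's MAC (Port 1 and Port 2 in the diagram), and the RFC behaviour encoded is the one confirmed earlier in the thread: with the switch enabled, only DMA0 is used, so all traffic goes through eth0.

```c
#include <assert.h>
#include <stdbool.h>

enum mtip_mode { MTIP_BYPASS, MTIP_SWITCH };

#define MTIP_NUM_PORTS 2

/* What users expect from any Linux switch driver (switchdev or DSA):
 * a frame received on external port k is seen by sockets on netdev k. */
static int expected_rx_netdev(int port)
{
	assert(port >= 0 && port < MTIP_NUM_PORTS);
	return port;
}

/* RFC behaviour as described in the thread: with the switch enabled
 * only DMA0 is used, so every frame surfaces on eth0. */
static int rfc_rx_netdev(enum mtip_mode mode, int port)
{
	assert(port >= 0 && port < MTIP_NUM_PORTS);
	return mode == MTIP_SWITCH ? 0 : port;
}

/* True if, in the given mode, every port keeps its own identity. */
static bool rfc_preserves_port_identity(enum mtip_mode mode)
{
	for (int port = 0; port < MTIP_NUM_PORTS; port++)
		if (rfc_rx_netdev(mode, port) != expected_rx_netdev(port))
			return false;
	return true;
}
```

Port identity survives only in bypass mode, which is exactly the gap between the RFC and what a switch driver is expected to expose.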

> > Please try to find out if your switch is supposed to be able
> > to be managed (run control protocols on the CPU).
>
> It supports the "normal" set of L2 switch features:
>
> - VLANs, lookup table (with learning), filtering and forwarding
>   (Multicast, Broadcast, Unicast), priority queues, IP snooping, etc.
>
> BPDU frames are recognized by the switch and can be used to
> implement RSTP support. However, this switch has a separate address
> space (not covered by, or accessible through, the FEC's address range).
>
> > If not, well, I
> > don't know what to suggest.
>
> To me it looks like NXP's L2 switch should be treated _just_ as an
> offloading IP block to accelerate switching (NXP already supports
> dpaa[2], for example).
>
> The idea of having it configured on demand, when:
> ip link add name br0 type bridge; ip link set br0 up;
> ip link set eth0 master br0;
> ip link set eth1 master br0;
>
> seems to be a reasonable one. In the above scenario it would work hand
> in hand with the FEC drivers (as those would handle PHY communication
> setup and link up/down events).

You seem to imply that we are suggesting something different.

> It would be welcome if the community could come up with some rough idea
> of how to proceed with supporting this IP block

OK, so if I really cared that much about mainline support, I would
refactor the FEC driver to offer its core functionality to a new
multi-port driver that is able to handle the FEC DMA interfaces, the
MACs and the switch. EXPORT_SYMBOL_GPL is your friend.

This driver would probe on a device tree binding with three "reg" values:
one for the fec@...f0000, one for the fec@...f4000 and one for the
switch@...f8000.
No puppet master driver which coordinates other drivers, just a single
driver that, depending on the operating state, manages all the SoC
resources in a way that will offer a sane and consistent view of the
Ethernet ports.
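A minimal sketch of the "single driver" idea, as a userspace model: one private structure holds the state of both FEC instances and the switch, and one function moves everything between the two operating states. `struct mtip_priv` and `mtip_set_mode()` are hypothetical; the DMA usage per mode follows Lukasz's description (bypass: DMA0 and DMA1; switch: DMA0 only), and the promiscuity rule follows the earlier point that the switch needs promiscuous MACs. Mapping the three "reg" regions and programming real hardware are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical private state of one driver that owns both FEC
 * instances and the MTIP switch (register mappings omitted). */
struct mtip_priv {
	bool switch_enabled;
	bool user_promisc[2];   /* promiscuity requested per netdev */
	bool mac_promisc[2];    /* effective MAC setting in hardware */
	bool dma_active[2];     /* FEC DMA engines currently in use */
};

static void mtip_set_mode(struct mtip_priv *p, bool enable_switch)
{
	assert(p != NULL);
	p->switch_enabled = enable_switch;

	/* Bypass mode uses DMA0 and DMA1; switch mode uses only DMA0. */
	p->dma_active[0] = true;
	p->dma_active[1] = !enable_switch;

	/* In switch mode both MACs must be promiscuous so the switch can
	 * forward by MAC DA; in bypass mode honour the user's choice. */
	for (int i = 0; i < 2; i++)
		p->mac_promisc[i] = enable_switch || p->user_promisc[i];
}
```

Keeping both transitions in one function is the point of the design: there is no second driver whose view of the MACs and DMA engines could drift out of sync.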

So it will have a different .ndo_start_xmit implementation depending on
whether the switch is bypassed or not (if you need to send a packet on
eth1 and the switch is bypassed, you send it through the DMA interface
of eth1, otherwise you send it through the DMA interface of eth0 in a
way in which the switch will actually route it to the eth1 physical
port).
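The per-mode transmit decision could be sketched as follows. This is a userspace model with hypothetical names (`struct mtip_tx_decision`, `mtip_select_tx()`); how the switch is actually told to route a frame from DMA0 to a particular external port is hardware-specific and not described in this thread, so it appears only as a `dest_port` field.

```c
#include <assert.h>
#include <stdbool.h>

enum mtip_tx_dma { MTIP_TX_DMA0, MTIP_TX_DMA1 };

struct mtip_tx_decision {
	enum mtip_tx_dma dma;   /* DMA engine the frame is queued on */
	int dest_port;          /* -1: no steering needed; else external port */
};

/* Hypothetical model of what .ndo_start_xmit would decide for a frame
 * sent on netdev `netdev_idx` (0 = eth0, 1 = eth1). */
static struct mtip_tx_decision mtip_select_tx(bool switch_enabled, int netdev_idx)
{
	struct mtip_tx_decision d;

	assert(netdev_idx == 0 || netdev_idx == 1);
	if (!switch_enabled) {
		/* Bypass: each netdev has its own DMA wired to its own MAC. */
		d.dma = netdev_idx == 0 ? MTIP_TX_DMA0 : MTIP_TX_DMA1;
		d.dest_port = -1;
	} else {
		/* Switch enabled: only DMA0 reaches the switch, which must be
		 * told to route the frame to this netdev's physical port. */
		d.dma = MTIP_TX_DMA0;
		d.dest_port = netdev_idx;
	}
	return d;
}
```

The text above suggests two distinct .ndo_start_xmit implementations selected by mode; folding them into one function here only keeps the model short.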

Then I would implement support for BPDU RX/TX. I haven't looked at the
documentation, but I expect that what this switch offers for control
traffic doesn't scale to high speeds. If it does, great: then send and
receive all your packets as control packets, to have precise port
identification. If it doesn't, you'll need a way to treat your data
plane packets differently from the control plane packets. For the data
plane, you can perhaps borrow some ideas from net/dsa/tag_8021q.c, or
even from Tobias Waldekranz's proposal to just let data plane packets
coming from the bridge slide into the switch with no precise control of
the destination port at all, and let the switch perform FDB lookups for
those packets, because the switch hardware FDB is supposed to be more or
less in sync with the bridge software FDB:
https://patchwork.kernel.org/project/netdevbpf/cover/20210426170411.1789186-1-tobias@waldekranz.com/
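The control/data split could start from a classifier like the one below. The only standard fact it relies on is that STP/RSTP BPDUs are sent to 01:80:C2:00:00:00, inside the IEEE 802.1D reserved link-local block 01:80:C2:00:00:00 to 01:80:C2:00:00:0F. The policy itself is an assumption of this sketch: link-local frames and frames on standalone ports get an explicit egress port, while other frames on bridged ports are left to the switch FDB, in the spirit of Tobias Waldekranz's proposal. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum mtip_tx_kind {
	MTIP_TX_TARGETED,    /* control path: force a specific egress port */
	MTIP_TX_FDB_LOOKUP,  /* data path: let the switch FDB pick the port */
};

/* IEEE 802.1D reserved link-local block 01:80:C2:00:00:00..0F,
 * which includes the STP/RSTP BPDU address 01:80:C2:00:00:00. */
static bool is_link_local(const uint8_t da[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

	return memcmp(da, prefix, sizeof(prefix)) == 0 && (da[5] & 0xF0) == 0x00;
}

/* Hypothetical TX policy: control frames and frames on standalone
 * (non-bridged) ports need precise port identification; bridged data
 * frames rely on the hardware FDB tracking the software bridge FDB. */
static enum mtip_tx_kind mtip_classify_tx(const uint8_t da[6], bool port_bridged)
{
	assert(da != NULL);
	if (is_link_local(da) || !port_bridged)
		return MTIP_TX_TARGETED;
	return MTIP_TX_FDB_LOOKUP;
}
```

Only the slow control path then needs per-port steering, which is the situation where a limited control-traffic mechanism in the switch could still be good enough.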

> (especially as, for example, the imx287 is used in many embedded devices
> and is going to be in active production for the next 10+ years).

Well, I guess you have a plan then. There are still 10+ years left to
enjoy the benefits of a proper driver design...