netdev - Re: [PATCH 4/4] cpsw: add switchdev support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 2 Jun 2018 13:34:32 +0300
From:   Ilias Apalodimas <ilias.apalodimas@...aro.org>
To:     Florian Fainelli <f.fainelli@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
        grygorii.strashko@...com, ivan.khoronzhuk@...aro.org,
        nsekhar@...com, jiri@...nulli.us, ivecera@...hat.com,
        francois.ozog@...aro.org, yogeshs@...com, spatton@...com
Subject: Re: [PATCH 4/4] cpsw: add switchdev support

Hi Florian, 

Thanks for taking time to look into this

On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
> 
> 
> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> >>>> Device tree is supposed to describe the hardware. Using that hardware
> >>>> in different ways is not something you should describe in DT.
> >>>>
> >>> The new switchdev mode is applied with a .config option in the kernel. What you
> >>> see is pre-existing code, so i am not sure if i should change it in this
> >>> patchset.
> >>
> >> If you break the code up into a library and two drivers, it becomes a
> >> moot point.
> > Agree
> > 
> >>
> >> But what i don't like here is that the device tree says to do dual
> >> mac. But you ignore that and do sometime else. I would prefer that if
> >> DT says dual mac, and switchdev is compiled in, the probe fails with
> >> EINVAL. Rather than ignore something, make it clear it is invalid.
> > The switch has 3 modes of operation as is.
> > 1. switch mode, to enable that you don't need to add anything on
> > the DTS and linux registers a single netdev interface.
> > 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
> > 3. switchdev mode which is controlled by a .config option, since as you 
> > pointed out DTS was not made for controlling config options. 
> > 
> > I agree that this is far from beautiful. If the driver remains as in though,
> > i'd prefer either keeping what's there or making "switchdev" a DTS option, 
> > following the pre-existing erroneous usage rather than making the device 
> > unusable.  If we end up returning some error and refuse to initialize, users 
> > that remote upgrade their equipment, without taking a good look at changelog,
> > will loose access to their devices with no means of remotely fixing that.
> 
> It seems to me that the mistake here is seeing multiple modes of
> operations for the cpsw. There are not actually many, there is one
> usage, and then there is what you can and cannot offload. 
CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
called ALE in the current driver) by-pass(which is used in dual emac for 
example) and other features. Again Grygorii is better suited to answer the 
exact differences.
> The basic
> premise with switchdev and DSA (which uses switchdev) is that each
> user-facing port of your switch needs to work as if it were a normal
> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
> create a bridge and you enslave those ports into the bridge, you need to
> have forwarding done in hardware between these two ports when the
> SMAC/DMAC are not for the host/CPU/management interface and you must
> simultaneously still have the host have the ability to send/receive
> traffic through the bridge device.
Yes dual emac does that. But dual emac configures the port facing VLAN to the
CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
That's exactly what the current RFC does as well, with the addition of
registering a sw0p0 (i'll explain why later on this mail)
A little more detail on the issue we are having. On my description 
sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
that have PHYs attached. 

When we start in the new switchdev mode all interfaces are added to VLAN 0
so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
and sw0p2 are working as you describe. So those 2 interfaces can send/receive
traffic normally which matches the switchdev case.

When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
>From this point on the whole fuunctionality just collapses. The switch will 
work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to 
get an ip address (since VLAN1 is not a member of the CPU port and the packet 
gets dropped). 
IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
switchdev and configure the MDBs on the ports.  i am prety sure there are other
fail scenarios which i haven't discovered already, but those 2 are the most 
basic ones.  If we add VLAN1 on the CPU port, everything works as intended of 
course.

That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
traffic, but you can configure the CPU port independantly and not be forced to
do an OR on every VLAN add/delete grouping the CPU port with your port command.
The TL;DR version of this is that the switch is working exactly as switchdev is
expecting offloading traffic to the hardware when possible as long as the CPU
port is member of the proper VLANs

Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
when to add the CPU port or not. 

There are still a couple of cases that are not covered though, if we don't 
register the CPU port. We cant decide when to forward multicast
traffic on the CPU port if a join hasn't been sent from br0.
So let's say you got 2 hosts doing multicast and for whatever reason the host
wants to see that traffic. 
With the CPU port present you can do a 
"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
the traffic to the CPU port and thus the host. If this goes away we are forced
to send a join.
 
> It seems to me like this is entirely doable given that the dual MAC use
> case is supported already.
> 
> switchdev is just a stateless framework to get notified from the
> networking stack about what you can possibly offload in hardware, so
> having a DTS option gate that is unfortunately wrong because it is
> really implementing a SW policy in DTS which is not what it is meant for.
The DTS option for configuration pre-existed, i don't like that either switchdev
mode is activated by a .config option not DTS(it just overrides whatever config 
you have on the DTS). Far from pretty though fair point, i am open to 
suggestions.
> -- 
> Florian

Thanks!
Ilias