netdev - Re: [PATCH 4/4] cpsw: add switchdev support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e5832a2c-da63-d91d-e93a-51b7e84959a0@ti.com>
Date:   Tue, 5 Jun 2018 16:03:38 -0500
From:   Grygorii Strashko <grygorii.strashko@...com>
To:     Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Florian Fainelli <f.fainelli@...il.com>
CC:     Andrew Lunn <andrew@...n.ch>, <netdev@...r.kernel.org>,
        <ivan.khoronzhuk@...aro.org>, <nsekhar@...com>, <jiri@...nulli.us>,
        <ivecera@...hat.com>, <francois.ozog@...aro.org>, <yogeshs@...com>,
        <spatton@...com>
Subject: Re: [PATCH 4/4] cpsw: add switchdev support



On 06/02/2018 05:34 AM, Ilias Apalodimas wrote:
> Hi Florian,
> 
> Thanks for taking time to look into this
> 
> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>>
>>
>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>>>> in different ways is not something you should describe in DT.
>>>>>>
>>>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>>>> see is pre-existing code, so i am not sure if i should change it in this
>>>>> patchset.
>>>>
>>>> If you break the code up into a library and two drivers, it becomes a
>>>> moot point.
>>> Agree
>>>
>>>>
>>>> But what i don't like here is that the device tree says to do dual
>>>> mac. But you ignore that and do sometime else. I would prefer that if
>>>> DT says dual mac, and switchdev is compiled in, the probe fails with
>>>> EINVAL. Rather than ignore something, make it clear it is invalid.
>>> The switch has 3 modes of operation as is.
>>> 1. switch mode, to enable that you don't need to add anything on
>>> the DTS and linux registers a single netdev interface.
>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
>>> 3. switchdev mode which is controlled by a .config option, since as you
>>> pointed out DTS was not made for controlling config options.
>>>
>>> I agree that this is far from beautiful. If the driver remains as in though,
>>> i'd prefer either keeping what's there or making "switchdev" a DTS option,
>>> following the pre-existing erroneous usage rather than making the device
>>> unusable.  If we end up returning some error and refuse to initialize, users
>>> that remote upgrade their equipment, without taking a good look at changelog,
>>> will loose access to their devices with no means of remotely fixing that.
>>
>> It seems to me that the mistake here is seeing multiple modes of
>> operations for the cpsw. There are not actually many, there is one
>> usage, and then there is what you can and cannot offload.

> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
> called ALE in the current driver) by-pass(which is used in dual emac for
> example) and other features. Again Grygorii is better suited to answer the
> exact differences.

dual_mac is HW enabled mode of operation, so having DT option is pretty
reasonable as for me. 
1) when enabled it configures internal FIFOs in special way so both
external Ports became equal in the direction toward to Host port 0.

TRM "The intention of this mode is to allow packets from both ethernet ports
to be written into the FIFO without one port starving the other port."

2) ALE, out of the box, does not support this mode and, as result, two
default vlan have to be created to direct traffic P1->P0 (vlan1) and
P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed
(and will bypass ALE). This way traffic separated on cpsw egress towards to P0, 

>> The basic
>> premise with switchdev and DSA (which uses switchdev) is that each
>> user-facing port of your switch needs to work as if it were a normal
>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
>> create a bridge and you enslave those ports into the bridge, you need to
>> have forwarding done in hardware between these two ports when the
>> SMAC/DMAC are not for the host/CPU/management interface and you must
>> simultaneously still have the host have the ability to send/receive
>> traffic through the bridge device.

TRM
"When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0
and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging
between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address."

So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be 
completely independent without any packet leaking between interfaces.

!! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering
- only registered vlans
- only registered mcast/bcast
- ingress mcast/bcast rate limiting (it's actually more like coalescing -
 limits number of mcast/bcast packets per sec. 

And all offloading ALE (val/mdb) entries should always contain two ports in masks:
p1&p0 or p2&p0. Never ever all three ports. 

> Yes dual emac does that. But dual emac configures the port facing VLAN to the
> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
> That's exactly what the current RFC does as well, with the addition of
> registering a sw0p0 (i'll explain why later on this mail)
> A little more detail on the issue we are having. On my description
> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
> that have PHYs attached.
> 
> When we start in the new switchdev mode all interfaces are added to VLAN 0
> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
> and sw0p2 are working as you describe. So those 2 interfaces can send/receive
> traffic normally which matches the switchdev case.
> 
> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
> is now configured on sw0p1 and sw0p2 but *not* on the CPU port.
>  From this point on the whole fuunctionality just collapses. The switch will
> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to
> get an ip address (since VLAN1 is not a member of the CPU port and the packet
> gets dropped).
> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger
> switchdev and configure the MDBs on the ports.  i am prety sure there are other
> fail scenarios which i haven't discovered already, but those 2 are the most
> basic ones.  If we add VLAN1 on the CPU port, everything works as intended of
> course.
> 
> That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
> traffic, but you can configure the CPU port independantly and not be forced to
> do an OR on every VLAN add/delete grouping the CPU port with your port command.
> The TL;DR version of this is that the switch is working exactly as switchdev is
> expecting offloading traffic to the hardware when possible as long as the CPU
> port is member of the proper VLANs
> 
> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
> when to add the CPU port or not.
> 
> There are still a couple of cases that are not covered though, if we don't
> register the CPU port. We cant decide when to forward multicast
> traffic on the CPU port if a join hasn't been sent from br0.
> So let's say you got 2 hosts doing multicast and for whatever reason the host
> wants to see that traffic.
> With the CPU port present you can do a
> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
> the traffic to the CPU port and thus the host. If this goes away we are forced
> to send a join.
>   
>> It seems to me like this is entirely doable given that the dual MAC use
>> case is supported already.
>>
>> switchdev is just a stateless framework to get notified from the
>> networking stack about what you can possibly offload in hardware, so
>> having a DTS option gate that is unfortunately wrong because it is
>> really implementing a SW policy in DTS which is not what it is meant for.
> The DTS option for configuration pre-existed, i don't like that either switchdev
> mode is activated by a .config option not DTS(it just overrides whatever config
> you have on the DTS). Far from pretty though fair point, i am open to
> suggestions.

Again this is option describing HW mode which not expected to be changed on the fly.
I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - 
right now I don't see how dual_mac can be supported with switchdev as per above.
The same way as I do not see how we can re-use switchdev with 50% of devices which 
have "only one" user-facing external port (P1 or P2).


-- 
regards,
-grygorii