netdev - Re: [PATCH 4/4] cpsw: add switchdev support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <A6381ECC-4C7C-4CCB-88B0-9FC77FE18F66@gmail.com>
Date:   Sat, 02 Jun 2018 09:10:08 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Ilias Apalodimas <ilias.apalodimas@...aro.org>
CC:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
        grygorii.strashko@...com, ivan.khoronzhuk@...aro.org,
        nsekhar@...com, jiri@...nulli.us, ivecera@...hat.com,
        francois.ozog@...aro.org, yogeshs@...com, spatton@...com
Subject: Re: [PATCH 4/4] cpsw: add switchdev support

On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas <ilias.apalodimas@...aro.org> wrote:
>Hi Florian, 
>
>Thanks for taking time to look into this
>
>On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>> 
>> 
>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>> >>>> Device tree is supposed to describe the hardware. Using that
>hardware
>> >>>> in different ways is not something you should describe in DT.
>> >>>>
>> >>> The new switchdev mode is applied with a .config option in the
>kernel. What you
>> >>> see is pre-existing code, so i am not sure if i should change it
>in this
>> >>> patchset.
>> >>
>> >> If you break the code up into a library and two drivers, it
>becomes a
>> >> moot point.
>> > Agree
>> > 
>> >>
>> >> But what i don't like here is that the device tree says to do dual
>> >> mac. But you ignore that and do sometime else. I would prefer that
>if
>> >> DT says dual mac, and switchdev is compiled in, the probe fails
>with
>> >> EINVAL. Rather than ignore something, make it clear it is invalid.
>> > The switch has 3 modes of operation as is.
>> > 1. switch mode, to enable that you don't need to add anything on
>> > the DTS and linux registers a single netdev interface.
>> > 2. dual mac mode, this is when you need to add dual_emac; on the
>DTS.
>> > 3. switchdev mode which is controlled by a .config option, since as
>you 
>> > pointed out DTS was not made for controlling config options. 
>> > 
>> > I agree that this is far from beautiful. If the driver remains as
>in though,
>> > i'd prefer either keeping what's there or making "switchdev" a DTS
>option, 
>> > following the pre-existing erroneous usage rather than making the
>device 
>> > unusable.  If we end up returning some error and refuse to
>initialize, users 
>> > that remote upgrade their equipment, without taking a good look at
>changelog,
>> > will loose access to their devices with no means of remotely fixing
>that.
>> 
>> It seems to me that the mistake here is seeing multiple modes of
>> operations for the cpsw. There are not actually many, there is one
>> usage, and then there is what you can and cannot offload. 
>CPSW has in fact 2 modes of operation, different FIFO usage/lookup
>entry(it's
>called ALE in the current driver) by-pass(which is used in dual emac
>for 
>example) and other features. Again Grygorii is better suited to answer
>the 
>exact differences.
>> The basic
>> premise with switchdev and DSA (which uses switchdev) is that each
>> user-facing port of your switch needs to work as if it were a normal
>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when
>you
>> create a bridge and you enslave those ports into the bridge, you need
>to
>> have forwarding done in hardware between these two ports when the
>> SMAC/DMAC are not for the host/CPU/management interface and you must
>> simultaneously still have the host have the ability to send/receive
>> traffic through the bridge device.
>Yes dual emac does that. But dual emac configures the port facing VLAN
>to the
>CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
>configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU
>port
>That's exactly what the current RFC does as well, with the addition of
>registering a sw0p0 (i'll explain why later on this mail)
>A little more detail on the issue we are having. On my description 
>sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the
>ports
>that have PHYs attached. 
>
>When we start in the new switchdev mode all interfaces are added to
>VLAN 0
>so CPU port + port1 + port2 are all in the same VLAN group. In that
>case sw0p1
>and sw0p2 are working as you describe. So those 2 interfaces can
>send/receive
>traffic normally which matches the switchdev case.
>
>When we add them on a bridge(let's say br0), VLAN1(or any default
>bridge VLAN)
>is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
>>From this point on the whole fuunctionality just collapses. The switch
>will 
>work and offload traffic between sw0p1/sw0p2 but the bridge won't be
>able to 
>get an ip address (since VLAN1 is not a member of the CPU port and the
>packet 
>gets dropped). 
>IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
>switchdev and configure the MDBs on the ports.  i am prety sure there
>are other
>fail scenarios which i haven't discovered already, but those 2 are the
>most 
>basic ones.  If we add VLAN1 on the CPU port, everything works as
>intended of 
>course.
>
>That's the reason we registered sw0p0 as the CPU port. It can't do any
>"real"
>traffic, but you can configure the CPU port independantly and not be
>forced to
>do an OR on every VLAN add/delete grouping the CPU port with your port
>command.
>The TL;DR version of this is that the switch is working exactly as
>switchdev is
>expecting offloading traffic to the hardware when possible as long as
>the CPU
>port is member of the proper VLANs
>
>Petr's patch solves this for us
>(9c86ce2c1ae337fc10568a12aea812ed03de8319).
>We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and
>decide
>when to add the CPU port or not. 
>
>There are still a couple of cases that are not covered though, if we
>don't 
>register the CPU port. We cant decide when to forward multicast
>traffic on the CPU port if a join hasn't been sent from br0.
>So let's say you got 2 hosts doing multicast and for whatever reason
>the host
>wants to see that traffic. 
>With the CPU port present you can do a 
>"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will
>offload
>the traffic to the CPU port and thus the host. If this goes away we are
>forced
>to send a join.

Thanks for the detailed explanation. Somehow I was under the impression that cpsw had the ability, through specific DMA descriptor bits to direct traffic towards one external port or another and conversely, have that information from the HW when receiving packets. What you describe is exactly the same problem we have in DSA when the switch advertises DSA_TAG_PROTO_NONE where only VLAN tags could help differentiate traffic from external ports. At some point there was a discuss of making DSA_TAG_PROTO_NONE automatically create one VLAN per port but this is a good source for other problems...

Looking forward to your follow-up patch series!

-- 
Florian