linux-kernel - Re: [PATCH v2 14/14] net: ethernet: mtk_eth_soc: support creating mac address based offload entries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2989e566-a1d2-2288-8ef3-759f20aa0c2e@nbd.name>
Date:   Tue, 12 Apr 2022 19:51:52 +0200
From:   Felix Fietkau <nbd@....name>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     netdev@...r.kernel.org, John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Jiri Pirko <jiri@...dia.com>, Ido Schimmel <idosch@...dia.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vladimir Oltean <vladimir.oltean@....com>
Subject: Re: [PATCH v2 14/14] net: ethernet: mtk_eth_soc: support creating mac
 address based offload entries


On 12.04.22 19:37, Andrew Lunn wrote:
>> It basically has to keep track of all possible destination ports, their STP
>> state, all their fdb entries, member VLANs of all ports. It has to quickly
>> react to changes in any of these.
> 
> switchdev gives you all of those i think. DSA does not make use of
> them all, in particularly the fdb entries, because of the low
> bandwidth management link to the switch. But look at the Mellanox
> switch, it keeps its hardware fdb entries in sync with the software
> fdb.
> 
> And you get every quick access to these, sometimes too quick in that
> it is holding a spinlock when it calls the switchdev functions, and
> you need to defer the handling in your driver if you want to use a
> mutex, perform blocking IO etc.
> 
>> In order to implement this properly, I would also need to make more changes
>> to mac80211. Right now, mac80211 drivers do not have access to the
>> net_device pointer of virtual interfaces. So mac80211 itself would likely
>> need to implement the switchdev ops and handle some of this.
> 
> So this again sounds like something which would be shared by IPA, and
> any other hardware which can accelerate forwarding between WiFi and
> some other sort of interface.
I would really like to see an example of how this should be done.
Is there a work in progress tree for IPA with offloading? Because the 
code that I see upstream doesn't seem to have any of that - or did I 
look in the wrong place?

>> There are also some other issues where I don't know how this is supposed to
>> be solved properly:
>> On MT7622 most of the bridge ports are connected to a MT7531 switch using
>> DSA. Offloading (lan->wlan bridging or L3/L4 NAT/routing) is not handled by
>> the switch itself, it is handled by a packet processing engine in the SoC,
>> which knows how to handle the DSA tags of the MT7531 switch.
>> 
>> So if I were to handle this through switchdev implemented on the wlan and
>> ethernet devices, it would technically not be part of the same switch, since
>> it's a behind a different component with a different driver.
> 
> What is important here is the user experience. The user is not
> expected to know there is an accelerate being used. You setup the
> bridge just as normal, using iproute2. You add routes in the normal
> way, either by iproute2, or frr can add routes from OSPF, BGP, RIP or
> whatever, via zebra. I'm not sure anybody has yet accelerated NAT, but
> the same principle should be used, using iptables in the normal way,
> and the accelerate is then informed and should accelerate it if
> possible.
Accelerated NAT on MT7622 is already present in the upstream code for a 
while. It's there for ethernet, and with my patches it also works for 
ethernet -> wlan.

> switchdev gives you notification of when anything changes. You can
> have multiple receivers of these notifications, so the packet
> processor can act on them as well as the DSA switch.
>   
>> Also, is switchdev able to handle the situation where only parts of the
>> traffic is offloaded and the rest (e.g. multicast) is handled through the
>> regular software path?
> 
> Yes, that is not a problem. I deliberately use the term
> accelerator. We accelerate what Linux can already do. If the
> accelerator hardware is not capable of something, Linux still is, so
> just pass it the frames and it will do the right thing. Multicast is a
> good example of this, many of the DSA switch drivers don't accelerate
> it.
Don't get me wrong, I'm not against switchdev support at all. I just 
don't know how to do it yet, and the code that I put in place is useful 
for non-switchdev use cases as well.

>> In my opinion, handling it through the TC offload has a number of
>> advantages:
>> - It's a lot simpler
>> - It uses the same kind of offloading rules that my software fastpath
>> already uses
>> - It allows more fine grained control over which traffic should be offloaded
>> (src mac -> destination MAC tuple)
>> 
>> I also plan on extending my software fast path code to support emulating
>> bridging of WiFi client mode interfaces. This involves doing some MAC
>> address translation with some IP address tracking. I want that to support
>> hardware offload as well.
>> 
>> I really don't think that desire for supporting switchdev based offload
>> should be a blocker for accepting this code now, especially since my
>> implementation relies on existing Linux network APIs without inventing any
>> new ones, and there are valid use cases for using it, even with switchdev
>> support in place.
> 
> What we need to avoid is fragmentation of the way we do things. It has
> been decided that switchdev is how we use accelerators, and the user
> should not really know anything about the accelerator. No other in
> kernel network accelerator needs a user space component listening to
> netlink notifications and programming the accelerator from user space.
> Do we really want two ways to do this?
There's always some overlap in what the APIs can do. And when it comes 
to the "client mode bridge" use case that I mentioned, I would also need 
exactly the same API that I put in place here. And this is not something 
that can (or even should) be done using switchdev. mac80211 prevents 
adding client mode interfaces to bridges for a reason.

- Felix