netdev - Re: [RFC PATCH net-next 0/2] net: tc: dsa: Implement offload of matchall for bridged DSA ports

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <744ca0a6-95a3-cc81-5b09-ff417ffde401@gmail.com>
Date:   Thu, 31 Mar 2022 10:06:20 +0200
From:   Mattias Forsblad <mattias.forsblad@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Tobias Waldekranz <tobias@...dekranz.com>
Subject: Re: [RFC PATCH net-next 0/2] net: tc: dsa: Implement offload of
 matchall for bridged DSA ports

On 2022-03-30 14:09, Vladimir Oltean wrote:
> On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote:
>> Greetings,
>>
>> This series implements offloading of tc matchall filter to HW
>> for bridged DSA ports.
>>
>> Background
>> When using a non-VLAN filtering bridge we want to be able to drop
>> traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
>> This is specially important when we have disabled learning on user ports.
>>
>> A sample configuration could be something like this:
>>
>>        br0
>>       /   \
>>    swp0   swp1
>>
>> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
>> ip link set swp0 master br0
>> ip link set swp1 master br0
>> ip link set swp0 type bridge_slave learning off
>> ip link set swp1 type bridge_slave learning off
>> ip link set swp0 up
>> ip link set swp1 up
>> ip link set br0 up
>>
>> After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
>> it was advised to use tc to set an ingress filter that could then
>> be offloaded to HW, like so:
>>
>> tc qdisc add dev br0 clsact
>> tc filter add dev br0 ingress pref 1 proto all matchall action drop
>>
>> Limitations
>> If there is tc rules on a bridge and all the ports leave the bridge
>> and then joins the bridge again, the indirect framwork doesn't seem
>> to reoffload them at join. The tc rules need to be torn down and
>> re-added.
>>
>> The first part of this serie uses the flow indirect framework to
>> setup monitoring of tc qdisc and filters added to a bridge.
>> The second part offloads the matchall filter to HW for Marvell
>> switches.
>>
>> Mattias Forsblad (2):
>>   net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports.
>>   net: dsa: Implement tc offloading for drop target.
>>
>>  drivers/net/dsa/mv88e6xxx/chip.c |  23 +++-
>>  include/net/dsa.h                |  13 ++
>>  net/dsa/dsa2.c                   |   5 +
>>  net/dsa/dsa_priv.h               |   3 +
>>  net/dsa/port.c                   |   1 +
>>  net/dsa/slave.c                  | 217 ++++++++++++++++++++++++++++++-
>>  6 files changed, 258 insertions(+), 4 deletions(-)
>>
>> -- 
>> 2.25.1
>>
> 
> Have you considered point b of my argument here?
> https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383
> 
> To make that argument even clearer, the following script produces:
> 
> #!/bin/bash
> 
> ip netns add ns0
> ip netns add ns1
> ip link add veth0 type veth peer name veth1
> ip link add veth2 type veth peer name veth3
> ip link add br0 type bridge && ip link set br0 up
> ip link set veth0 netns ns0
> ip link set veth3 netns ns1
> ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up
> ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up
> ip addr add 192.168.100.3/24 dev br0
> ip link set veth1 master br0 && ip link set veth1 up
> ip link set veth2 master br0 && ip link set veth2 up
> tc qdisc add dev br0 clsact
> tc filter add dev br0 ingress matchall action drop
> echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2
> echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3
> ip netns del ns0
> ip netns del ns1
> ip link del br0
> 
> [ 1857.000393] br0: port 1(veth1) entered blocking state
> [ 1857.005537] br0: port 1(veth1) entered disabled state
> [ 1857.011433] device veth1 entered promiscuous mode
> [ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> [ 1857.054019] br0: port 1(veth1) entered blocking state
> [ 1857.059205] br0: port 1(veth1) entered forwarding state
> [ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1857.124507] br0: port 2(veth2) entered blocking state
> [ 1857.129658] br0: port 2(veth2) entered disabled state
> [ 1857.135585] device veth2 entered promiscuous mode
> [ 1857.209748] br0: port 2(veth2) entered blocking state
> [ 1857.214900] br0: port 2(veth2) entered forwarding state
> [ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready
> Pinging another bridge port
> PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
> 64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms
> 64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms
> 64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms
> 
> --- 192.168.100.2 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2051ms
> rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms
> Pinging the bridge
> PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
> ^C
> --- 192.168.100.3 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2040ms
> 
> filter protocol all pref 49152 matchall chain 0
> filter protocol all pref 49152 matchall chain 0 handle 0x1
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 1 bind 1 installed 12 sec used 6 sec
>         Action statistics:
>         Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
> 
> [ 1870.189158] br0: port 1(veth1) entered disabled state
> [ 1870.204061] device veth1 left promiscuous mode
> [ 1870.208751] br0: port 1(veth1) entered disabled state
> [ 1870.232677] device veth2 left promiscuous mode
> [ 1870.237814] br0: port 2(veth2) entered disabled state
> 
> Now imagine that veth0 is a DSA switch interface which monitors and
> offloads the drop rule. How could it distinguish between pinging veth3
> and pinging br0, so as to comply with software semantics?

Hi Vladimir,
thanks for your comments. The patch series takes in account that a foreign
interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload).

Regarding your previous comment point b. Tobias could see some problems
with that approach. I'd think he will comment on that.