netdev - Re: [RFC PATCH net-next 0/2] net: tc: dsa: Implement offload of matchall for bridged DSA ports

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220330120919.ibmwui3jwfexjes4@skbuf>
Date:   Wed, 30 Mar 2022 15:09:19 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Mattias Forsblad <mattias.forsblad@...il.com>
Cc:     netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Tobias Waldekranz <tobias@...dekranz.com>
Subject: Re: [RFC PATCH net-next 0/2] net: tc: dsa: Implement offload of
 matchall for bridged DSA ports

On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote:
> Greetings,
> 
> This series implements offloading of tc matchall filter to HW
> for bridged DSA ports.
> 
> Background
> When using a non-VLAN filtering bridge we want to be able to drop
> traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
> This is specially important when we have disabled learning on user ports.
> 
> A sample configuration could be something like this:
> 
>        br0
>       /   \
>    swp0   swp1
> 
> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
> ip link set swp0 master br0
> ip link set swp1 master br0
> ip link set swp0 type bridge_slave learning off
> ip link set swp1 type bridge_slave learning off
> ip link set swp0 up
> ip link set swp1 up
> ip link set br0 up
> 
> After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
> it was advised to use tc to set an ingress filter that could then
> be offloaded to HW, like so:
> 
> tc qdisc add dev br0 clsact
> tc filter add dev br0 ingress pref 1 proto all matchall action drop
> 
> Limitations
> If there is tc rules on a bridge and all the ports leave the bridge
> and then joins the bridge again, the indirect framwork doesn't seem
> to reoffload them at join. The tc rules need to be torn down and
> re-added.
> 
> The first part of this serie uses the flow indirect framework to
> setup monitoring of tc qdisc and filters added to a bridge.
> The second part offloads the matchall filter to HW for Marvell
> switches.
> 
> Mattias Forsblad (2):
>   net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports.
>   net: dsa: Implement tc offloading for drop target.
> 
>  drivers/net/dsa/mv88e6xxx/chip.c |  23 +++-
>  include/net/dsa.h                |  13 ++
>  net/dsa/dsa2.c                   |   5 +
>  net/dsa/dsa_priv.h               |   3 +
>  net/dsa/port.c                   |   1 +
>  net/dsa/slave.c                  | 217 ++++++++++++++++++++++++++++++-
>  6 files changed, 258 insertions(+), 4 deletions(-)
> 
> -- 
> 2.25.1
> 

Have you considered point b of my argument here?
https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383

To make that argument even clearer, the following script produces:

#!/bin/bash

ip netns add ns0
ip netns add ns1
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link add br0 type bridge && ip link set br0 up
ip link set veth0 netns ns0
ip link set veth3 netns ns1
ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up
ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up
ip addr add 192.168.100.3/24 dev br0
ip link set veth1 master br0 && ip link set veth1 up
ip link set veth2 master br0 && ip link set veth2 up
tc qdisc add dev br0 clsact
tc filter add dev br0 ingress matchall action drop
echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2
echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3
ip netns del ns0
ip netns del ns1
ip link del br0

[ 1857.000393] br0: port 1(veth1) entered blocking state
[ 1857.005537] br0: port 1(veth1) entered disabled state
[ 1857.011433] device veth1 entered promiscuous mode
[ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
[ 1857.054019] br0: port 1(veth1) entered blocking state
[ 1857.059205] br0: port 1(veth1) entered forwarding state
[ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1857.124507] br0: port 2(veth2) entered blocking state
[ 1857.129658] br0: port 2(veth2) entered disabled state
[ 1857.135585] device veth2 entered promiscuous mode
[ 1857.209748] br0: port 2(veth2) entered blocking state
[ 1857.214900] br0: port 2(veth2) entered forwarding state
[ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready
Pinging another bridge port
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms
64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms
64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms

--- 192.168.100.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2051ms
rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms
Pinging the bridge
PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
^C
--- 192.168.100.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2040ms

filter protocol all pref 49152 matchall chain 0
filter protocol all pref 49152 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 12 sec used 6 sec
        Action statistics:
        Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[ 1870.189158] br0: port 1(veth1) entered disabled state
[ 1870.204061] device veth1 left promiscuous mode
[ 1870.208751] br0: port 1(veth1) entered disabled state
[ 1870.232677] device veth2 left promiscuous mode
[ 1870.237814] br0: port 2(veth2) entered disabled state

Now imagine that veth0 is a DSA switch interface which monitors and
offloads the drop rule. How could it distinguish between pinging veth3
and pinging br0, so as to comply with software semantics?