netdev - Re: Isolating DSA Slave Interfaces on a Bridge with Bridge Offloading

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e427d390-0cd4-d322-1665-4176cd134884@arinc9.com>
Date:   Sun, 10 Apr 2022 16:00:19 +0300
From:   Arınç ÜNAL <arinc.unal@...nc9.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Alvin Šipraga <ALSI@...g-olufsen.dk>,
        Linus Walleij <linus.walleij@...aro.org>,
        Andrew Lunn <andrew@...n.ch>,
        Luiz Angelo Daros de Luca <luizluca@...il.com>,
        DENG Qingfang <dqfext@...il.com>, erkin.bozoglu@...ont.com,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>
Subject: Re: Isolating DSA Slave Interfaces on a Bridge with Bridge Offloading

On 14/03/2022 17:00, Vladimir Oltean wrote:
> On Mon, Mar 14, 2022 at 04:13:47PM +0300, Arınç ÜNAL wrote:
>> Hey Vladimir,
>>
>> On 13/03/2022 16:32, Vladimir Oltean wrote:
>>> Hello Arınç,
>>>
>>> On Sun, Mar 13, 2022 at 02:23:47PM +0300, Arınç ÜNAL wrote:
>>>> Hi all,
>>>>
>>>> The company I work with has got a product with a Mediatek MT7530 switch.
>>>> They want to offer isolation for the switch ports on a bridge. I have run a
>>>> test on a slightly modified 5.17-rc1 kernel. These commands below should
>>>> prevent communication between the two interfaces:
>>>>
>>>> bridge link set dev sw0p4 isolated on
>>>>
>>>> bridge link set dev sw0p3 isolated on
>>>>
>>>> However, computers connected to each of these ports can still communicate
>>>> with each other. Bridge TX forwarding offload is implemented on the MT7530
>>>> DSA driver.
>>>>
>>>> What I understand is isolation works on the software and because of the
>>>> bridge offloading feature, the frames never reach the CPU where we can block
>>>> it.
>>>>
>>>> Two solutions I can think of:
>>>>
>>>> - Disable bridge offloading when isolation is enabled on a DSA slave
>>>> interface. Not the best solution but seems easy to implement.
>>>>
>>>> - When isolation is enabled on a DSA slave interface, do not mirror the
>>>> related FDB entries to the switch hardware so we can keep the bridge
>>>> offloading feature for other ports.
>>>>
>>>> I suppose this could only be achieved on switch specific DSA drivers so the
>>>> implementation would differ by each driver.
>>>>
>>>> Cheers.
>>>> Arınç
>>>
>>> To be clear, are you talking about a patched or unpatched upstream
>>> kernel? Because mt7530 doesn't implement bridge TX forwarding offload.
>>> This can be seen because it is missing the "*tx_fwd_offload = true;"
>>> line from its ->port_bridge_join handler (of course, not only that).
>>
>> I'm compiling the kernel using OpenWrt SDK so it's patched but mt7530 DSA
>> driver is untouched. I've seen this commit which made me think of that:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/drivers/net/dsa/mt7530.c?id=b079922ba2acf072b23d82fa246a0d8de198f0a2
>>
>> I'm very inexperienced in coding so I must've made a wrong assumption.
>>
>> I've tested that mt7530 DSA driver delivers frames between switch ports
>> without the frames appearing on either the slave interfaces or the master
>> interface. I suppose I was confusing this feature with tx_fwd_offload?
>>
>> For example, the current state of the rtl8365mb DSA driver doesn't do this.
>> Each frame goes in and out of the DSA master interface to deliver frames
>> between the switch ports.
> 
> That is just "forwarding offload". It means "for each packet that a
> bridged switch port receives from the outside world, look up the
> hardware FDB and send the packet just to its intended destination. If
> there is no FDB entry, flood the packet in the entire forwarding domain
> and also to the CPU. The packet flooded to the CPU will have the
> skb->offload_fwd_mark bit set to true by the hardware driver to indicate
> to the bridge that flooding to this port's hardware domain has already
> been taken care of. Only flooding towards other (foreign) bridge ports
> needs to be done for this packet."
> 
> TX forwarding offload means "when the bridge wants to send a packet that
> needs to be replicated multiple times (either flooded or plain multicast)
> to swp0, swp1, swp2, swp3 (all in the same hardware forwarding domain),
> avoid replicating it in software (skb_clone) and send it just once to
> the device driver dealing with a port from that hardware domain (swp0).
> The driver will inject that single packet into the hardware in such a
> way that the packet is forwarded based on consulting the hardware FDB.
> If the hardware FDB and the bridge's software FDB are in sync, then
> (a) the packet was flooded by the bridge => the bridge has no FDB entry
>      for it. The hardware FDB has no entry for it either => the hardware
>      floods it too.
> (b) the packet was multicast => the bridge has an MDB entry, and so does
>      the hardware. The packet is forwarded to the ports in the MDB
>      entry."
> 
> This feature is basically the TX equivalent of the basic forwarding
> offload, hence the name. TX packets for which the bridge leaves the FDB
> lookup (and replication) responsibility to hardware have
> skb->offload_fwd_mark set to true, but by the bridge, this time.
> 
> I think it's very important to understand what the TX forwarding offload
> feature was meant to do and how it was meant to operate, because there
> was a discussion with Qingfang on this topic too, and it may be that
> mt7530 switch supports this in a different way.
> https://patchwork.kernel.org/project/netdevbpf/cover/20210703115705.1034112-1-vladimir.oltean@nxp.com/#24290759
> The approach of populating a port mask with more than 1 bit set in the
> tagger is complicated and may not even achieve all the benefits of TX
> forwarding offload. Forcing an FDB/MDB lookup is what is desirable,
> since typically this should also make those packets go through the
> address learning process (they will be treated as data plane packets).
> Address learning on the CPU port is desirable for packets send on behalf
> of a software bridge, and undesirable for packets sent from a standalone
> port.
> If you can enable address learning in the tagger for the packets that
> are sent with skb->offload_fwd_mark == true, then you should be able to
> remove ds->assisted_learning_on_cpu_port and still get the benefit
> described here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210106095136.224739-7-olteanv@gmail.com/
> 
>>>
>>> You are probably confused as to why is the BR_ISOLATED brport flag is
>>> not rejected by a switchdev interface when it will only lead to
>>> incorrect behavior.
>>>
>>> In fact we've had this discussion with Qingfang, who sent this patch to
>>> allow switchdev drivers to *at the very least* reject the flag, but
>>> didn't want to write up a correct commit description for the change:
>>> https://lore.kernel.org/all/CALW65jbotyW0MSOd-bd1TH_mkiBWhhRCQ29jgn+d12rXdj2pZA@mail.gmail.com/T/
>>> https://patchwork.kernel.org/project/netdevbpf/patch/20210811135247.1703496-1-dqfext@gmail.com/
>>> https://patchwork.kernel.org/project/netdevbpf/patch/20210812142213.2251697-1-dqfext@gmail.com/
>>
>> With v2 applied, when I issue the port isolation command, I get "RTNETLINK
>> answers: Not supported" as it's not implemented in mt7530 yet?
> 
> The entire idea is to get -EINVAL as returned from here, because
> BR_ISOLATED will be present in flags.mask:
> 
> static int
> mt7530_port_pre_bridge_flags(struct dsa_switch *ds, int port,
> 			     struct switchdev_brport_flags flags,
> 			     struct netlink_ext_ack *extack)
> {
> 	if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
> 			   BR_BCAST_FLOOD))
> 		return -EINVAL;
> 
> 	return 0;
> }
> 
> Now, the entire error handling from br_switchdev_set_port_flag() is a
> bit of a disaster.
> https://elixir.bootlin.com/linux/latest/source/net/bridge/br_switchdev.c#L77
> 
> 	/* We run from atomic context here */
> 	err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
> 				       &info.info, extack);
> 	err = notifier_to_errno(err);
> 	if (err == -EOPNOTSUPP) // Drivers need to return -EINVAL here to distinguish themselves from "no one responded to this notifier", which is a non-error.
> 		return 0;
> 
> 	if (err) {
> 		if (extack && !extack->_msg)
> 			NL_SET_ERR_MSG_MOD(extack,
> 					   "bridge flag offload is not supported");
> 		return -EOPNOTSUPP; // However, any other error code passed from the driver, like -EINVAL, is ignored and replaced with -EOPNOTSUPP by the bridge.
> 	}
> 
> I think this is where the "Not supported" error message comes from, in
> your case. It would be good to double-check with prints, though.
> 
> Also, you might want to consider compiling iproute2 with libmnl support,
> to get extack messages from the kernel. Here it seems like you're
> missing "bridge flag offload is not supported", which would have been a
> clear indication that this is the code path that was taken.

I added a pr_info here as it looked easier to test this:

	/* We run from atomic context here */
	err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
				       &info.info, extack);
	err = notifier_to_errno(err);
	if (err == -EOPNOTSUPP)
		return 0;

	if (err) {
		if (extack && !extack->_msg)
			pr_info("Port flag is not supported");
			NL_SET_ERR_MSG_MOD(extack,
					   "bridge flag offload is not supported");
		pr_info("Port flag is not supported");
		return -EOPNOTSUPP;
	}

Looks like this is the expected result:

root@...nWrt:/# bridge link set dev sw0p3 isolated on
[   55.862862] Port flag is not supported
RTNETLINK answers: Not supported
root@...nWrt:/# bridge link set dev sw0p3 isolated on
[   55.862893] Port flag is not supported
[   58.492025] Port flag is not supported
RTNETLINK answers: Not supported

> 
>>> As a result, currently not even the correctness issue has still not yet
>>> been fixed, let alone having any driver act upon the feature correctly.
>>
>> It's unfortunate that this patch was put on hold just because of incomplete
>> commit description. In my eyes, this is a significant fix.
> 
> I agree here. You could send the patch yourself, after making some tests.

Will do.

Arınç