netdev - Re: [PATCH v2 net-next 2/4] net: dsa: Link aggregation support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201201222341.6lywc5qmcpafmp4u@skbuf>
Date:   Wed, 2 Dec 2020 00:23:41 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Tobias Waldekranz <tobias@...dekranz.com>
Cc:     davem@...emloft.net, kuba@...nel.org, andrew@...n.ch,
        vivien.didelot@...il.com, f.fainelli@...il.com,
        j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
        netdev@...r.kernel.org
Subject: Re: [PATCH v2 net-next 2/4] net: dsa: Link aggregation support

On Tue, Dec 01, 2020 at 10:48:34PM +0100, Tobias Waldekranz wrote:
> On Tue, Dec 01, 2020 at 22:04, Vladimir Oltean <olteanv@...il.com> wrote:
> > On Tue, Dec 01, 2020 at 03:29:53PM +0100, Tobias Waldekranz wrote:
> >> On Tue, Dec 01, 2020 at 16:03, Vladimir Oltean <olteanv@...il.com> wrote:
> >> > On Mon, Nov 30, 2020 at 03:06:08PM +0100, Tobias Waldekranz wrote:
> >> >> When a LAG joins a bridge, the DSA subsystem will treat that as each
> >> >> individual port joining the bridge. The driver may look at the port's
> >> >> LAG pointer to see if it is associated with any LAG, if that is
> >> >> required. This is analogue to how switchdev events are replicated out
> >> >> to all lower devices when reaching e.g. a LAG.
> >> >
> >> > Agree with the principle. But doesn't that mean that this code:
> >> >
> >> > static int dsa_slave_switchdev_blocking_event(struct notifier_block *unused,
> >> > 					      unsigned long event, void *ptr)
> >> > {
> >> > 	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> >> > 	int err;
> >> >
> >> > 	switch (event) {
> >> > 	case SWITCHDEV_PORT_OBJ_ADD:
> >> > 		err = switchdev_handle_port_obj_add(dev, ptr,
> >> > 						    dsa_slave_dev_check,
> >> > 						    dsa_slave_port_obj_add);
> >> > 		return notifier_from_errno(err);
> >> > 	case SWITCHDEV_PORT_OBJ_DEL:
> >> > 		err = switchdev_handle_port_obj_del(dev, ptr,
> >> > 						    dsa_slave_dev_check,
> >> > 						    dsa_slave_port_obj_del);
> >> > 		return notifier_from_errno(err);
> >> > 	case SWITCHDEV_PORT_ATTR_SET:
> >> > 		err = switchdev_handle_port_attr_set(dev, ptr,
> >> > 						     dsa_slave_dev_check,
> >> > 						     dsa_slave_port_attr_set);
> >> > 		return notifier_from_errno(err);
> >> > 	}
> >> >
> >> > 	return NOTIFY_DONE;
> >> > }
> >> >
> >> > should be replaced with something that also reacts to the case where
> >> > "dev" is a LAG? Like, for example, I imagine that a VLAN installed on a
> >> > bridge port that is a LAG should be propagated to the switch ports
> >> > beneath that LAG. Similarly for all bridge attributes.
> >>
> >> That is exactly what switchdev_handle_* does, no? It is this exact
> >> behavior that my statement about switchdev event replication references.
> >
> > I'm sorry, I don't mean to be overly obtuse, but _how_ does the current
> > code propagate a VLAN to a physical port located below a bond? Through
> > magic? The dsa_slave_dev_check is passed as a parameter to
> > switchdev_handle_port_obj_add _exactly_ because the code has needed so
> > far to match only on DSA interfaces and not on bonding interfaces. So
> > the code does not react to VLANs added on a bonding interface. Hence my
> > question.
>
> There is no magic involved, here is the relevant snippet from
> __switchdev_handle_port_obj_add:
>
> 	/* Switch ports might be stacked under e.g. a LAG. Ignore the
> 	 * unsupported devices, another driver might be able to handle them. But
> 	 * propagate to the callers any hard errors.
> 	 *
> 	 * If the driver does its own bookkeeping of stacked ports, it's not
> 	 * necessary to go through this helper.
> 	 */
> 	netdev_for_each_lower_dev(dev, lower_dev, iter) {
> 		if (netif_is_bridge_master(lower_dev))
> 			continue;
>
> 		err = __switchdev_handle_port_obj_add(lower_dev, port_obj_info,
> 						      check_cb, add_cb);
> 		if (err && err != -EOPNOTSUPP)
> 			return err;
> 	}
>

Oh wow, such an odd place to put that. Especially since the entire
reason why switchdev uses notifiers is that you as a switchdev driver
can now explicitly intercept and offload switchdev objects that the
bridge emitted towards a driver that was "not you", such as a vxlan
interface. I guess that's still what's happening now, just that it's
completely non-obvious since it's hidden behind an opaque function.

Very interesting, thanks, I didn't know that.

> > ip link del bond0
> > ip link add bond0 type bond mode 802.3ad
> > ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
> > ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
> > ip link del br0
> > ip link add br0 type bridge
> > ip link set bond0 master br0
> > ip link set swp0 master br0
> >
> > This should propagate the VLANs to swp1 and swp2 but doesn't:
> > bridge vlan add dev bond0 vid 100
>
> I ran through this on my setup and it is indeed propagated to all ports.
>
> Just a thought, when you rebased the ocelot specific stuff to v2, did
> you add the number of supported LAGs to ds->num_lags? If not, DSA will
> assume that the hardware does not support offloading.

Ah, yes, that makes sense and that's what was happening. So DSA does the
right thing and does not offload bridge attributes to these ports,
because bonding needs to be done in software, and therefore even
bridging on swp1 and swp2 needs to be done in software. So as far as DSA
is concerned, swp1 and swp2 are standalone ports. This reminds me that I
need to do more testing for switches that can't offload bonding, to make
sure that they do the right thing.