lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240918111008.uzvzkcjg7wfj5foa@skbuf>
Date: Wed, 18 Sep 2024 14:10:08 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Russell King <linux@...linux.org.uk>, Andrew Lunn <andrew@...n.ch>,
	Florian Fainelli <f.fainelli@...il.com>
Cc: Saravana Kannan <saravanak@...gle.com>, netdev@...r.kernel.org
Subject: Component API not right for DSA?

Hi,

DSA probing in a multi-switch tree is a bit of a hack job.
It relies on the device tree providing complete "routing" information
(the "link" property between one cascade port and the cascade ports of
all other switches reachable through it) to determine whether the tree
has completely probed or not.

A switch which probes but finds that it has a dst->rtable entry (struct
dsa_link) towards another switch which hasn't probed will just report
a successful early probe here:

static int dsa_tree_setup(struct dsa_switch_tree *dst)
{
	...
	complete = dsa_tree_setup_routing_table(dst);
	if (!complete)
		return 0;
	...
}

and will mostly do nothing, except for listing its ports (dsa_port_touch())
in dst->ports.

The dst->ports puzzle becomes complete piece by piece, and when the
final piece of the puzzle probes (last switch of the tree), it sees that
all its link_dp ports are present in dst->ports, so complete=true, and
the code flow actually proceeds further in dsa_tree_setup().

The dsa_tree_setup() runs in the context of the last switch to probe,
and calls ds->ops->setup() etc for all other switches in this tree.

I don't like this for the following reasons:

1. I would like the device tree binding to make the "link" property optional
   (full routing table). Only direct adjacency information is sufficient
   to compute the rest, plus it is much easier on the device tree writer
   (https://lore.kernel.org/netdev/20240913131507.2760966-3-vladimir.oltean@nxp.com/).
   But I would have to keep this workaround functional - otherwise a
   switch will not wait for the entire tree to probe, just for its
   directly connected neighbors. Thus, I would need to run BFS to
   construct a routing table for probe time (_before_ struct dsa_switch
   and struct dsa_port are allocated for the other switches in the tree),
   and then I also need the regular variant of the rtable, that
   dsa_routing_port() uses. Could be done, but not great.

2. I honestly don't think that the workaround to wait until the routing
   table is complete is in the best interest of DSA. The larger context
   here is that one can imagine DSA trees operating in a "degraded state"
   where not all switches are present. For example, if there is a chain
   of 3 switches and the last switch is missing, nothing prevents the
   first 2 from doing their normal job. There is actually a customer who
   wants to take down a switch for regular maintainance, while keeping
   the rest of the system operational. This is currently not possible
   with DSA, because the tree wants all switches to be present. As soon
   as one single switch unbinds, the entire tree is torn down. They are
   pretty serious about this request, not just "would be nice if".
   And of course, not any degraded state makes functional sense. For
   example, removing the top-most switch must result in all switches
   downstream of it also getting removed, because traffic from the CPU
   can no longer reach them. That's fine, and I plan to handle that
   using device links from one switch to its upstream.

For reason #1, I started prototyping an integration between DSA and the
component API - something which was also suggested by Saravana Kannan in
the past. The component master is the tree, and the components are the
switches. For each struct dsa_switch_tree, I instantiated a struct
platform_device and a struct component_master_ops. The tree is created
by the first switch that calls dsa_register_switch(). It has a function
that explores the device tree using BFS, starting from that first switch
that created it, "link" phandle by phandle, calling component_match_add_release()
for the OF node of each other reachable switch. Every other switch in
the tree no longer creates it, but finds it and bumps its refcount.
The tree is bound when all switches have called component_add() with
their struct component_ops.

This is all great, but then I realized that, for addressing issue #2,
it is no better than what we currently have. Namely, by default the tree
looks like this:

$ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
aggregate_device name                                  status
-------------------------------------------------------------
dsa_tree.0.auto                                         bound

device name                                            status
-------------------------------------------------------------
d0032004.mdio-mii:11                                    bound
d0032004.mdio-mii:10                                    bound
d0032004.mdio-mii:12                                    bound

but after this operation:

$ echo d0032004.mdio-mii:11 > /sys/bus/mdio_bus/devices/d0032004.mdio-mii\:11/driver/unbind
$ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
aggregate_device name                                  status
-------------------------------------------------------------
dsa_tree.0.auto                                     not bound

device name                                            status
-------------------------------------------------------------
(unknown)                                      not registered
d0032004.mdio-mii:10                                not bound
d0032004.mdio-mii:12                                not bound

the tree (component master) is unbound, its unbind() method calls
component_unbind_all(), and this also unbinds the other switches.

The idea that a "degraded state" doesn't make sense for the component
master seems pretty fundamental to the entire component API. But I
figured I'd ask, before I put the idea of using this API to rest.

[ although I do like the debugfs device_component folder ]

Otherwise, my plan is to go ahead and introduce new dsa_switch_ops API
to let them know of dynamic updates to the routing table. This way, we
remove the baked assumption that all of it is available at ds->ops->setup()
time and never changes afterwards. There is nothing, functionally speaking,
that the component API can help me with, for this desired behavior.

Just thought I'd let everybody know of my current intention of making
changes to DSA, and gather some opinions/confirmation that I am on the
right track/alternative suggestions.

Thanks,
Vladimir

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ