netdev - Re: [PATCH net 4/4] docs: net: dsa: replace TODO section with info about history and devel ideas

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2b4f6a45-2af2-482b-b8f5-f2bece824912@gmail.com>
Date: Fri, 8 Dec 2023 15:03:35 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org
Cc: "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
 Luiz Angelo Daros de Luca <luizluca@...il.com>,
 Alvin Šipraga <alsi@...g-olufsen.dk>,
 Madhuri Sripada <madhuri.sripada@...rochip.com>,
 Marcin Wojtas <mw@...ihalf.com>, Linus Walleij <linus.walleij@...aro.org>,
 Tobias Waldekranz <tobias@...dekranz.com>,
 Arun Ramadoss <arun.ramadoss@...rochip.com>,
 "Russell King (Oracle)" <rmk+kernel@...linux.org.uk>,
 Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH net 4/4] docs: net: dsa: replace TODO section with info
 about history and devel ideas

On 12/8/23 11:35, Vladimir Oltean wrote:
> It was a bit unclear to me what the TODO is about and what is even
> actionable about it. I had a discussion with Florian about it at NetConf
> 2023, and it seems that it's about the amount of boilerplate code that
> exists in switchdev drivers, and how that could be maybe made common
> with DSA, through something like another library.
> 
> I think we are seeing a lot of people who are looking at DSA now,
> and there is a lot of misunderstanding about why things are the way
> they are, and which are the changes that would benefit the subsystem,
> compared to the changes that go against DSA's past development trend.
> 
> I think what is missing is the context, which is admittedly a bit
> hard to grasp considering there are 15 years of development.
> Based on the git log and on the discussions where I participated,
> I tried to cobble up a section about DSA's history. Here and there,
> I've mentioned the limitations that I am aware of, and some possible
> ways forward.
> 
> I'm also personally surprised by the amount of change in DSA over the
> years, and I hope that putting things into perspective will also
> encourage others to not think that it's set in stone the way it is now.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>

This is really great, and thanks for having put that together, it 
represents an useful timeline of changes introduces.

> ---

[snip]

> +Probing through ``platform_data`` remains limited in functionality. The
> +``ds->dev->of_node`` and ``dp->dn`` pointers are NULL, and the OF API calls
> +made by drivers for discovering more complex setups fall back to the implicit
> +handling. There is no way to describe multi-chip trees, or switches with
> +multiple CPU ports. It is always assumed that shared ports are configured by
> +the driver to the maximum supported link speed (they do not use phylink).
> +User ports cannot connect to arbitrary PHYs, but are limited to
> +``ds->user_mii_bus``.

Maybe a mention here that this implies built-in/internal PHY devices 
only, just as a way to re-iterate the limitation and to echo to the 
previous patch?

> +
> +Many switch drivers introduced since after DSA's second OF binding were not
> +designed to support probing through ``platform_data``. Most notably,
> +``device_get_match_data()`` and ``of_device_get_match_data()`` return NULL with
> +``platform_data``, so generally, drivers which do not have alternative
> +mechanisms for this do not support ``platform_data``.
> +
> +Extending the ``platform_data`` support implies adding more separate code.
> +An alternative worth exploring is growing DSA towards the ``fwnode`` API.
> +However, not the entire OF binding should be generalized to ``fwnode``.
> +The current bindings must be examined with a critical eye, and the properties
> +which are no longer considered good practice (like ``label``, because ``udev``
> +offers this functionality) should first be deprecated in OF, and not migrated
> +to ``fwnode``.
> +
> +With ``fwnode`` support in the DSA framework, the ``fwnode_create_software_node()``
> +API could be used as an alternative to ``platform_data``, to allow describing
> +and probing switches on non-OF.

Might suggest to move the 3 paragraphs towards the end because otherwise 
it might be a distraction for the reader who might think: ah that's it? 
no more technical details!? Looks like Linus made the same suggestion in 
his review.

> +
> +DSA is used to control very complex switching chips. Some devices have a
> +microprocessor, and in some cases, this microprocessor can run a variant of the
> +Linux kernel. Sometimes, the switch packet I/O procedure of the internal
> +microprocessor is different from the packet I/O procedure for an external host.
> +The internal processor may have access to switch queues, while the external
> +processor may require DSA tags. Other times, the microprocessor may also be
> +connected to the switch in a DSA fashion (using an internal MAC to MAC
> +connection).
> +
> +Since DSA is only concerned with switches where the packet I/O is handled
> +by an intermediate conduit driver, this leads to the situation where it is
> +recommended to have two drivers for the same switch hardware. 

the same switch hardware, one for each of the use cases described above.

When the queues
> +are accessed directly, a separate non-DSA driver should be used, with its own
> +skeleton which is integrated with ``switchdev`` on its own.
> +
> +In 2019, a DSA driver was added for the ``ocelot`` switch, which is a thin
> +front-end over a hardware library that is also common with a ``switchdev``
> +driver. While this design is encouraged for other similar cases, code
> +duplication among multiple front-ends is a concern, so it may be desirable to
> +extract some of DSA's core functionality into a reusable library for Ethernet
> +switches. This could offer a driver-facing API similar to ``dsa_switch_ops``,
> +but the aspects relating to cross-chip management, to DSA tags and to the
> +conduit interface would remain DSA-specific.

Yes! That was exactly the idea indeed.

> +
> +Traditionally, DSA switch drivers for discrete chips own the entire
> +``spi_device``, ``i2c_client``, ``mdio_device`` etc. When the chip is complex
> +and has multiple embedded peripherals (IRQ controller, GPIO controller, MDIO
> +controller, LED controller, sensors), the handling of these peripherals is
> +currently monolithic within the same device driver that also calls
> +``dsa_register_switch()``.
> +
> +But an internal microprocessor may have a very different view of the switch
> +address space, and may have discrete Linux drivers for each peripheral.
> +In 2023, the ``ocelot_ext`` driver was added, which deviated from the
> +traditional DSA driver architecture. Rather than probing on the entire
> +``spi_device``, it created a multi-function device (MFD) top-level driver for
> +it (associated with the SoC at large), and the switching IP is only one of the
> +children of the MFD (it is a platform device with regmaps provided by its
> +parent). The drivers for each peripheral in this switch SoC are the same when
> +controlled over SPI and when controlled by the internal processor.
> +
> +Authors of new switch drivers that use DSA are encouraged to have a wider view
> +of the peripherals on the chip that they are controlling, and to use the MFD
> +model to separate the components whenever possible. The general direction for
> +the DSA core is to shrink in size and to focus only on the Ethernet switching
> +IP and its ports. ``CONFIG_NET_DSA_HWMON`` was removed in 2017. Adding new
> +methods to ``struct dsa_switch_ops`` which are outside of DSA's core focus on
> +Ethernet is strongly discouraged.

Agreed and good idea to put that on (virtual) paper.

> +
> +DSA's support for multi-chip trees also has limitations. After converting from
> +the first to the second OF binding, the switch tree stopped being a platform
> +device, and its probing became implicit, and distributed among its constituent
> +switch devices. There is currently a synchronization point in
> +``dsa_tree_setup_routing_table()``, through which the tree setup is performed
> +only once, when there is more than one switch in the tree. The first N-1
> +switches will end their probing early, and the last switch will configure the
> +entire tree, and thus all the other switches, in its ``dsa_register_switch()``
> +calling context.
> +
> +Furthermore, the synchronization point works because each switch is able to
> +determine, in a distributed manner, that the routing table is not complete, aka
> +that there is at least one switch which has not probed. This is only possible
> +because the ``link`` properties in the device tree describe the connections to
> +all other cascade ports in the tree, not just to the directly connected cascade
> +port. If only the latter were described, it could happen that a switch waits
> +for its direct neighbors to probe before setting up the tree, but not
> +necessarily for all switches in the tree (therefore, it sets up the tree too
> +early).
> +
> +With more than 3 switches in a tree, it becomes a difficult task to write
> +correct device trees which are not missing any link to the other cascade ports
> +in the tree. The routing table, based on which ``dsa_routing_port()`` works, is
> +directly taken from the device tree, although it could be computed through BFS
> +instead. This means that the device tree writer needs to specify more than just
> +the hardware description (represented by the direct cascade port connections).
> +
> +Simplifying the device tree bindings to require a single ``link`` phandle
> +cannot be done without rethinking the distributed probing scheme. One idea is
> +to reinstate the switch tree as a platform device, but this time created
> +dynamically by ``dsa_register_switch()`` if the switch's tree ID is not already
> +present in the system. The switch tree driver walks the device tree hop by hop,
> +following the ``link`` references, to discover all the other switches, and to
> +construct the full routing table. It then uses the component API to register
> +itself as an aggregate driver, with each of the discovered switches as a
> +component. When ``dsa_register_switch()`` completes for all component switches,
> +the tree probing continues and calls ``dsa_tree_setup()``.

An interesting paragraph, but I am not sure this is such a big pain 
point that we should be spending much description of it, especially 
since it sounds like this is solved, but could be improved, but in the 
grand scheme of things, should we really be spending any time on it?

Same-vendor cascade configurations are Marvell specific, and different 
vendor cascades require distinct switch trees, therefore do not really 
fall into the cross-chip design anymore. In a nutshell, cross-chip is 
very very niche and limited.

> +
> +The cross-chip management layer (``net/dsa/switch.c``) can also be improved.
> +Currently ``struct dsa_switch_tree`` holds a list of ports rather than a list
> +of switches, and thus, calling one function for each switch in a tree is hard.
> +DSA currently uses one notifier chain per tree as a workaround for that, with
> +each switch registered as a listener (``dsa_switch_event()``).
> +
> +It is considered bad practice to use notifiers when the emitter and the
> +listener are known to each other, instead of a plain function call. Also, error
> +handling with notifiers is not robust. When one switch fails mid-operation,
> +there is no rollback to the previous state for switches which already completed
> +the operation successfully.
> +
> +To untangle this situation and improve the reliability of the cross-chip
> +management layer, it is necessary to split the DSA operations into ones which
> +can fail, and ones which cannot fail. For example, ``port_fdb_add()`` can fail,
> +whereas ``port_fdb_del()`` cannot. Then, cross-chip operations can take a
> +fallible function to make forward progress, and an infallible function for
> +rollback. However, it is unclear what to do in the case of ``change_mtu()``.
> +It is hard to classify this operation as either fallible or infallible. It is
> +also unclear how to deal with I/O access errors on the switch's management bus.

How about something like this:

I/O access errors occurring during the switch configuration should 
always be logged for debugging but are very unlikely to be recoverable 
and therefore require an investigation into the failure mechanism and 
root cause or possible work around.
-- 
Florian