netdev - Re: [PATCH net 4/4] docs: net: dsa: replace TODO section with info about history and devel ideas

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c6f67ee3-cc88-41f2-8080-153d11e7d3cf@gmail.com>
Date: Mon, 11 Dec 2023 09:29:17 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
 Luiz Angelo Daros de Luca <luizluca@...il.com>,
 Alvin ┼áipraga <alsi@...g-olufsen.dk>,
 Madhuri Sripada <madhuri.sripada@...rochip.com>,
 Marcin Wojtas <mw@...ihalf.com>, Linus Walleij <linus.walleij@...aro.org>,
 Tobias Waldekranz <tobias@...dekranz.com>,
 Arun Ramadoss <arun.ramadoss@...rochip.com>,
 "Russell King (Oracle)" <rmk+kernel@...linux.org.uk>,
 Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH net 4/4] docs: net: dsa: replace TODO section with info
 about history and devel ideas

On 12/8/23 17:58, Vladimir Oltean wrote:
> On Fri, Dec 08, 2023 at 03:03:35PM -0800, Florian Fainelli wrote:
>>> +Probing through ``platform_data`` remains limited in functionality. The
>>> +``ds->dev->of_node`` and ``dp->dn`` pointers are NULL, and the OF API calls
>>> +made by drivers for discovering more complex setups fall back to the implicit
>>> +handling. There is no way to describe multi-chip trees, or switches with
>>> +multiple CPU ports. It is always assumed that shared ports are configured by
>>> +the driver to the maximum supported link speed (they do not use phylink).
>>> +User ports cannot connect to arbitrary PHYs, but are limited to
>>> +``ds->user_mii_bus``.
>>
>> Maybe a mention here that this implies built-in/internal PHY devices only,
>> just as a way to re-iterate the limitation and to echo to the previous
>> patch?
> 
> I am not fully convinced that saying "user_mii_bus can only access internal PHYs"
> only would be correct. This paragraph also exists in the user MDIO bus section:
> 
> For Ethernet switches which have both external and internal MDIO buses, the
> user MII bus can be utilized to mux/demux MDIO reads and writes towards either
> internal or external MDIO devices this switch might be connected to: internal
> PHYs, external PHYs, or even external switches.

OK, so what you have put is good enough, no need to add more that would 
only be a repeat of the user_mii_bus description (with the risk of the 
two being out of sync eventually).

> 
>>> +
>>> +Many switch drivers introduced since after DSA's second OF binding were not
>>> +designed to support probing through ``platform_data``. Most notably,
>>> +``device_get_match_data()`` and ``of_device_get_match_data()`` return NULL with
>>> +``platform_data``, so generally, drivers which do not have alternative
>>> +mechanisms for this do not support ``platform_data``.
>>> +
>>> +Extending the ``platform_data`` support implies adding more separate code.
>>> +An alternative worth exploring is growing DSA towards the ``fwnode`` API.
>>> +However, not the entire OF binding should be generalized to ``fwnode``.
>>> +The current bindings must be examined with a critical eye, and the properties
>>> +which are no longer considered good practice (like ``label``, because ``udev``
>>> +offers this functionality) should first be deprecated in OF, and not migrated
>>> +to ``fwnode``.
>>> +
>>> +With ``fwnode`` support in the DSA framework, the ``fwnode_create_software_node()``
>>> +API could be used as an alternative to ``platform_data``, to allow describing
>>> +and probing switches on non-OF.
>>
>> Might suggest to move the 3 paragraphs towards the end because otherwise it
>> might be a distraction for the reader who might think: ah that's it? no more
>> technical details!? Looks like Linus made the same suggestion in his review.
> 
> I think it needs even more rethinking than that. I now remembered that
> we also have a "Design limitations" section where the future work can go.
> 
> It's hard to navigate through what is now a 1400 line document and not
> get lost.

Agreed.

> 
> I'm hoping I could move the documentation of variables and methods that
> now sits in "Driver development" into kdoc comments inline with the code,
> to reduce the clutter a bit.
> 
> But I don't know how to tackle this. Should documentation changes go to
> "net" or to "net-next"? I targeted this for "net" as a documentation-only
> change set. But if I start adding kdocs, it won't be so clear-cut anymore...

Personal preference is that documentation changes should always target 
'net' (unless they document a 'net-next' change obviously) because 
accurate documentation is most important than anything.

> 
>>> +Simplifying the device tree bindings to require a single ``link`` phandle
>>> +cannot be done without rethinking the distributed probing scheme. One idea is
>>> +to reinstate the switch tree as a platform device, but this time created
>>> +dynamically by ``dsa_register_switch()`` if the switch's tree ID is not already
>>> +present in the system. The switch tree driver walks the device tree hop by hop,
>>> +following the ``link`` references, to discover all the other switches, and to
>>> +construct the full routing table. It then uses the component API to register
>>> +itself as an aggregate driver, with each of the discovered switches as a
>>> +component. When ``dsa_register_switch()`` completes for all component switches,
>>> +the tree probing continues and calls ``dsa_tree_setup()``.
>>
>> An interesting paragraph, but I am not sure this is such a big pain point
>> that we should be spending much description of it, especially since it
>> sounds like this is solved, but could be improved, but in the grand scheme
>> of things, should we really be spending any time on it?
>>
>> Same-vendor cascade configurations are Marvell specific, and different
>> vendor cascades require distinct switch trees, therefore do not really fall
>> into the cross-chip design anymore. In a nutshell, cross-chip is very very
>> niche and limited.
> 
> Well, I've been contacted by somebody to help with a custom board with 3
> daisy chained SJA1105 switches. He is doing the testing for me, and I'm
> waiting for the results to come back. I'm currently waiting for an uprev
> to an NXP BSP on top of 5.15 to be finalized, so that patches developed
> over net-next are at least barely testable...
> 
> If you remember, the SJA1105 has these one-shot management routes which
> must be installed over SPI, and they decide where the next transmitted
> link-local packet goes.

Yes, it's coming back now.

> 
> Well, the driver only supports single-chip trees, as you say, because it
> only programs the management route in the targeted switch. With daisy
> chained switches, one needs to figure out the actual packet route from
> the CPU to the leaf user port, and install a management route for every
> switch along that route. Otherwise, intermediary switches won't know
> what to do with the packets, and drop them.
> 
> The specific request was: "help, PTP doesn't work!"
> 
> I did solve the problem, and the documentation paragraphs above are
> basically my development notes while examining the existing support
> and the way in which it isn't giving me the tools I need.
> 
> I do need to send a dt-bindings patch on this topic as well. The fact
> that we put all cascade links in the device tree means we don't know
> which one represents the direct connection to the neighbor cascade port,
> and which one is an indirect connection. We need to bake in an
> assumption, like for example "the first element of the 'link' phandle
> array is the direct connection". I hope retroactively doing that won't
> bother the device tree maintainers too much. If it does, the problem is
> intractable.

Yes I believe this is exactly how we had intended for the "link" 
property to be used. We should have indeed encoded that more precisely 
into the property.

> 
> But I agree, requirements for cross-chip support are rare, and with
> SJA1105 I don't even own a board where I can directly test it. I
> specifically bought the Turris MOX for that.
> 
>>> +To untangle this situation and improve the reliability of the cross-chip
>>> +management layer, it is necessary to split the DSA operations into ones which
>>> +can fail, and ones which cannot fail. For example, ``port_fdb_add()`` can fail,
>>> +whereas ``port_fdb_del()`` cannot. Then, cross-chip operations can take a
>>> +fallible function to make forward progress, and an infallible function for
>>> +rollback. However, it is unclear what to do in the case of ``change_mtu()``.
>>> +It is hard to classify this operation as either fallible or infallible. It is
>>> +also unclear how to deal with I/O access errors on the switch's management bus.
>>
>> How about something like this:
>>
>> I/O access errors occurring during the switch configuration should always be
>> logged for debugging but are very unlikely to be recoverable and therefore
>> require an investigation into the failure mechanism and root cause or
>> possible work around.
> 
> Yeah, I suppose.
> 
> What do you think, has something like phy_error() been a useful mechanism
> for anything? Or just a pain in the rear? Would it be useful to shut
> everything down on a bus I/O error, phylib style?

phy_error() has been somewhat helpful in knowing whether we have a hard 
to debug, possibly silent MDIO error lurking around. Because the PHY 
library polls or reacts to interrupts updating the link status has a 
very high probability of hitting a MDIO transaction error. This is 
similar to what we see with some of our fielded reports where most 
text/data corruptions occur in scheduler code paths... because the 
scheduler is very often executed. So you know you have an error, but you 
don't know yet why.

For a switch there can be a lot of different transactions, but being 
able to know where it fails precisely could be a lead as to where the 
failure is.
-- 
Floria