lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 4 Dec 2021 02:20:37 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Colin Foster <colin.foster@...advantage.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-gpio@...r.kernel.org" <linux-gpio@...r.kernel.org>,
        Lee Jones <lee.jones@...aro.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        "katie.morris@...advantage.com" <katie.morris@...advantage.com>
Subject: Re: [RFC v1] mfd: pinctrl: RFC only: add and utilze mfd option in
 pinctrl-ocelot

On Fri, Dec 03, 2021 at 01:16:11PM -0800, Colin Foster wrote:
> This is a psuedo-commit, but one that tells the complete story of what I'm
> looking at. During an actual submission this'll be broken up into two
> commits, but I'd like to get some feedback on whether this is the correct
> path for me to be going down.
> 
> Background:
> 
> Microchip has a family of chips - the VSC7511, 7512, 7513, and 7514. The
> last two have an internal MIPS processor, which are supported by
> drivers/net/ethernet/mscc/ocelot_*. The former two lack this processor.
> 
> All four chips can be configured externally via a number of interfaces:
> SPI, I2C, PCIe... This is currently not supported and is my end goal.
> 
> The networking portion of these chips have been reused in other products as
> well. These utilize the common code by way of mscc_ocelot_switch_lib and
> net/dsa/ocelot/*. Specifically the "Felix" driver.
> 
> Current status:
> 
> I've put out a few RFCs on the "ocelot_spi" driver. It utilizes Felix and
> invokes much of the network portion of the hardware (VSC7512). It works
> great! Thanks community :)
> 
> There's more hardware that needs to get configured, however. Currently that
> includes general pin configuration, and an optional serial GPIO expander.
> The former is supported by drivers/pinctrl/pinctrl-ocelot.c and the latter
> by drivers/pinctrl/pinctrl-microchip-sgpio.c.
> 
> These drivers have been updated to use regmap instead of iomem, but that
> isn't the complete story. There are two options I know about, and maybe
> others I don't.
> 
> Option 1 - directly hook into the driver:
> 
> This was the path that was done in
> commit b99658452355 ("net: dsa: ocelot: felix: utilize shared mscc-miim
> driver for indirect MDIO access").
> This is in the net-next tree. In this case the Seville driver passes in its
> regmap to the mscc_miim_setup function, which bypasses mscc_miim_probe but
> allows the same driver to be used.
> 
> This was my initial implementation to hook into pinctrl-ocelot and
> pinctrl-microchip-sgpio. The good thing about this implementation is I have
> direct control over the order of things happening. For instance, pinctrl
> might need to be configured before the MDIO bus gets probed.
> 
> The bad thing about this implementation is... it doesn't work yet. My
> memory is fuzzy on this, but I recall noticing that the application of a
> devicetree pinctrl function happens in the driver probe. I ventured down
> this path of walking the devicetree, applying pincfg, etc. That was a path
> to darkness that I have abandoned.
> 
> What is functioning is I have debugfs / sysfs control, so I do have the
> ability to do some runtime testing and verification.
> 
> Option 2 - MFD (this "patch")
> 
> It really seems like anything in drivers/net/dsa/ should avoid
> drivers/pinctl, and that MFD is the answer. This adds some complexity to
> pinctrl-ocelot, and I'm not sure whether that breaks the concept of MFD. So
> it seems like I'm either doing something unique, or I'm doing something
> wrong.
> 
> I err on the assumption that I'm doing something wrong.
> 
> pinctrl-ocelot gets its resources the device tree by way of
> platform_get_and_ioremap_resource. This driver has been updated to support
> regmap in the pinctrl tree:
> commit 076d9e71bcf8 ("pinctrl: ocelot: convert pinctrl to regmap")
> 
> The problem comes about when this driver is probed by way of
> "mfd_add_devices". In an ideal world it seems like this would be handled by
> resources. MFD adds resources to the device before it gets probed. The
> probe happens and the driver is happy because platform_get_resource
> succeeds.
> 
> In this scenario the device gets probed, but needs to know how to get its
> regmap... not its resource. In the "I'm running from an internal chip"
> scenario, the existing process of "devm_regmap_init_mmio" will suffice. But
> in the "I'm running from an externally controlled setup via {SPI,I2C,PCIe}"
> the process needs to be "get me this regmap from my parent". It seems like
> dev_get_regmap is a perfect candidate for this.
> 
> Perhaps there is something I'm missing in the concept of resources /
> regmaps. But it seems like pinctrl-ocelot needs to know whether it is in an
> MFD scenario, and that concept didn't exist. Hence the addition of
> device_is_mfd as part of this patch. Since "device_is_mfd" didn't exist, it
> feels like I might be breaking the concept of MFD.

In the possibility that "device_is_mfd()" is not acceptable for the
pinctrl and mfd maintainers, one other way in which you could solve this
conundrum, without changing anything in the core, would be to introduce
a different compatible string for your driver variant, and the
pinctrl-ocelot driver could figure out based on that whether to request
a resource or a regmap. I don't see this being done, either, but I
suppose it could be made to work quite easily. Something like
"mscc,ocelot-mfd-pinctrl" instead of "mscc,ocelot-pinctrl".

> 
> I think this would lend itself to a pretty elegant architecture for the
> VSC751X externally controlled chips. In a manner similar to
> drivers/mfd/madera* there would be small drivers handling the prococol
> layers for SPI, I2C... A core driver would handle the register mappings,
> and could be gotten by dev_get_regmap. Every sub-device (DSA, pinctrl,
> other pinctrl, other things I haven't considered yet) could either rely on
> dev->parent directly, or in this case adjust. I can't imagine a scenario
> where someone would want pinctrl for the VSC7512 without the DSA side of
> things... but that would be possible in this architecture that would
> otherwise not.

Not pinctrl, but let me try to give you an example which is perhaps
relevant.

Earlier this year, Alvin Šipraga dropped a bomb stating that for some
DSA drivers for switches with embedded PHYs, those PHYs would fall back
to using the less specific (generic) PHY driver _if_ they are requesting
interrupts.
https://lore.kernel.org/netdev/cd0d9c40-d07b-e2ab-b068-d0bcb4685d09@bang-olufsen.dk/

It was a huge discussion, driver core maintainers got involved,
restructuring the DSA cross-chip probing design was on the table, the
PHY library too, yet to my knowledge nothing got resolved, and even if
it did, it was through the introduction of a hacky flag
(FWNODE_FLAG_BROKEN_PARENT). I don't remember if that workaround even
got applied or not.

"Broken parent"? Why broken?

See, ocelot is not the only DSA switch SoC that has more stuff going on
onboard than just the switching fabric. For most other DSA drivers, this
grew quite organically, from a simple MDIO device driver, to a device
driver for a switch that also registers many other drivers, and buses,
from its own probe function. And the OF bindings also grew organically..
and hierarchically with the switch on top, driving everything (this will
become important in a minute).

For example, the Marvell mv88e6xxx driver, or Alvin's rtl8365mb driver.
These drivers register an internal MDIO bus for their PHYs, and to avoid
polling those every second, they also register a cascaded irqchip driver.
A lot of care is taken such that the irqchip driver is registered before
the internal MDIO bus is, because PHYs on the internal MDIO bus will
need IRQs. Yet this appeared to be insufficient: on Alvin's system, the
probing of the PHY driver is attempted as soon as the PHYs were
discovered on the bus (expected), and immediately refused. Diagnosis?
-EPROBE_DEFER. What resource were these internal PHYs missing?
The interrupt parent. But how? The irqchip was registered just before,
it's there, waiting.

As it turns out, there is this mechanism called "fw_devlink" (part of
device links)
https://www.kernel.org/doc/html/latest/driver-api/device_link.html
that tries to be helpful and shave off a few hundreds of ms from the
boot time. To give you an example of the kind of thing it's designed to
be helpful with: say you have an MDIO-attached DSA switch, which is
sitting on an MDIO bus that happens to be registered by the DSA master
itself. Any sane Ethernet driver that has an MDIO bus on its hands will
first register the MDIO bus, then the net device (because registering
the net device will expose the possibility for it to connect to the PHY,
which may be located on its own MDIO bus, which needs to be available).
So it is natural to assume that this is also what happens inside our
Ethernet controller that is a DSA master here. So the implication is
that when the DSA master registers its MDIO bus, the devices on it will
attempt to probe, and therefore the switch too. The switch driver, and
DSA, will allocate some resources and finally search for its DSA master
via of_find_net_device_by_node().
This will inevitably fail because the DSA master has registered the MDIO
bus but not the net device (yet), and the switch will have to return
-EPROBE_DEFER and some CPU cycles have been uselessly wasted.
Probing will of course be resumed later, and will succeed after the DSA
master has registered the net device too.

Where fw_devlink comes into all of this is that it says "you know what,
I'm not even letting you play. I'm going to infer some relationships
between struct devices based on firmware node phandles (in this case,
think of the 'ethernet = <&dsa_master>' that DSA has), and if I detect
that the supplier of that phandle hasn't finished probing yet, I'll just
return -EPROBE_DEFER from the device core, not invoking any DSA switch
probe function at all. I'll call you when your suppliers are ready".

Reasonable, right?

Well, back to Alvin. To understand the problem he had (has?), you need
to better understand the flaws in fw_devlink. This mechanism essentially
says that if there's a supplier-consumer relationship between two OF
nodes by way of a phandle, there's also a supplier-consumer relationship
between the devices that probe on those OF nodes. So it creates device
links between those devices.

But the problem is that the DSA switch drivers may not have a separate
OF node for the interrupt controller, especially since the driver
writers didn't know any better, ages ago. There aren't dedicated struct
devices for the separate components like the irqchip, either, other than
the MDIO device itself, because there aren't any other buses on which
those other struct devices should exist.  So when the "interrupt-parent"
phandle gets translated by fw_devlink into a device link relationship,
that relationship is between the internal PHY (consumer) and none other
than the switch device itself (supplier).
So the PHY driver will be blocked with -EPROBE_DEFER from probing,
because its supplier has not finished probing (of course it didn't,
that's what it's doing currently). And the supplier (switch) attempts to
connect to the PHY from its probe path. Since binding the specific PHY
driver "failed" (-EPROBE_DEFER is still a failure), and the phy_connect()
call wants things to happen "now", the PHY library says "all right all
right, here's the generic PHY driver, just shut up". The generic driver
may or may not be sufficient to bring these internal PHYs up to a
satisfactory degree of initialization, hence some of the angry replies
on this topic. Not to mention, no specific PHY driver => no interrupts.

In the reasonable and unreasonable cases, the mechanism behind
fw_devlink is the same. Yet in one case it is helpful and in the other
it is not. The goal is to one day make fw_devlink enabled by default and
be aware of many supplier/consumer relationships.

I have brought up this not so distant story because I believe it to be
indirectly relevant to the design you choose for the vsc7512-spi driver.
If you keep moving forward with the traditional hierarchical model where
the DSA driver is the spi_device driver, and this registers whatever
component of the switch SoC is needed (MDIO bus, irqchip, GPIO, pinctrl
and whatnot), you will eventually run into Alvin's problem. And don't
rely on the DSA core, or fw_devlink, "doing something", because there
isn't something to be done there.
Instead, I believe that what would produce more future-proof results is
the mfd model, where the switch, mdiobus, irqchip, gpio, pinctrl drivers
etc are all flat and located below the bare spi_device driver for the
switch SoC. This model would inherently avoid the fw_devlink limitations,
which for good or bad aren't going away, since they're "features not bugs".
It also allows for the various components of the switch SoC to probe
independently, as far as I can tell. My belief is that many other DSA
drivers would benefit from a rewrite using the mfd model. The catch is
that OF bindings would need to change, which is kind of the point, but
still undesirable to some. We want a phandle like "interrupt-parent", if
any, to point laterally between switch SoC components, and not
vertically, therefore avoiding probing loops.

So don't feel bad for doing something different, consider yourself a
trailblazer :)

> 
> Signed-off-by: Colin Foster <colin.foster@...advantage.com>
> ---
>  drivers/mfd/mfd-core.c           | 6 ++++++
>  drivers/pinctrl/pinctrl-ocelot.c | 7 ++++++-
>  include/linux/mfd/core.h         | 2 ++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mfd/mfd-core.c b/drivers/mfd/mfd-core.c
> index 684a011a6396..2ba6a692499b 100644
> --- a/drivers/mfd/mfd-core.c
> +++ b/drivers/mfd/mfd-core.c
> @@ -33,6 +33,12 @@ static struct device_type mfd_dev_type = {
>  	.name	= "mfd_device",
>  };
>  
> +int device_is_mfd(struct platform_device *pdev)
> +{
> +	return (!strcmp(pdev->dev.type->name, mfd_dev_type.name));
> +}
> +EXPORT_SYMBOL(device_is_mfd);
> +
>  int mfd_cell_enable(struct platform_device *pdev)
>  {
>  	const struct mfd_cell *cell = mfd_get_cell(pdev);
> diff --git a/drivers/pinctrl/pinctrl-ocelot.c b/drivers/pinctrl/pinctrl-ocelot.c
> index 0a36ec8775a3..758fbc225244 100644
> --- a/drivers/pinctrl/pinctrl-ocelot.c
> +++ b/drivers/pinctrl/pinctrl-ocelot.c
> @@ -10,6 +10,7 @@
>  #include <linux/gpio/driver.h>
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
> +#include <linux/mfd/core.h>
>  #include <linux/of_device.h>
>  #include <linux/of_irq.h>
>  #include <linux/of_platform.h>
> @@ -1368,7 +1369,11 @@ static int ocelot_pinctrl_probe(struct platform_device *pdev)
>  
>  	regmap_config.max_register = OCELOT_GPIO_SD_MAP * info->stride + 15 * 4;
>  
> -	info->map = devm_regmap_init_mmio(dev, base, &regmap_config);
> +	if (device_is_mfd(pdev))
> +		info->map = dev_get_regmap(dev->parent, "GCB");
> +	else
> +		info->map = devm_regmap_init_mmio(dev, base, &regmap_config);
> +
>  	if (IS_ERR(info->map)) {
>  		dev_err(dev, "Failed to create regmap\n");
>  		return PTR_ERR(info->map);
> diff --git a/include/linux/mfd/core.h b/include/linux/mfd/core.h
> index 0bc7cba798a3..9980bcc8456d 100644
> --- a/include/linux/mfd/core.h
> +++ b/include/linux/mfd/core.h
> @@ -123,6 +123,8 @@ struct mfd_cell {
>  	int			num_parent_supplies;
>  };
>  
> +int device_is_mfd(struct platform_device *pdev);
> +
>  /*
>   * Convenience functions for clients using shared cells.  Refcounting
>   * happens automatically, with the cell's enable/disable callbacks
> -- 
> 2.25.1
>

Powered by blists - more mailing lists