lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 26 Aug 2021 21:02:16 -0700
From:   Saravana Kannan <saravanak@...gle.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vladimir Oltean <olteanv@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, Len Brown <lenb@...nel.org>,
        Alvin Sipraga <ALSI@...g-olufsen.dk>, kernel-team@...roid.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        linux-acpi@...r.kernel.org
Subject: Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT

On Thu, Aug 26, 2021 at 6:23 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> > Doesn't add much to the discussion. In the example I gave, the driver
> > already does synchronous probing. If the device can't probe
> > successfully because a supplier isn't ready, it doesn't matter if it's
> > a synchronous probe. The probe would still be deferred and we'll hit
> > the same issue. Even in the situation the commit [5] describes, if
> > parallelized probing is done and the PHY depended on something (say a
> > clock), you'd still end up not probing the PHY even if the driver is
> > present and the generic PHY would end up force probing it.
>
>
> genphy is meant to be used when there is no other driver available.
> It is a best effort, better than nothing, might work. And quite a few
> boards rely on it. However, it should not be used when there is a
> specific driver.

Agreed, that's what we are trying to ensure.

> So if the PHY device has been probed, and -EPROBE_DEFER was returned,
> we also need to return -EPROBE_DEFER here when deciding if genphy
> should be used. It should then all unwind and try again later.

Yes, I think dsa_register_switch() returning -EPROBE_DEFER if the PHYs
returned -EPROBE_DEFER might be okay (I think we should do it), but
that doesn't solve the problem for this driver.

fw_devlink=on/device links short circuits the probe() call of a
consumer (in this case the PHY) and returns -EPROBE_DEFER if the
supplier's (in this case switch) probe hasn't finished without an
error. fw_devlink/device links effectively does the probe in graph
topological order and there's a ton of good reasons to do it that way
-- what's why fw_devlink=on was implemented.

In this specific case though, since the PHY depends on the parent
device, if we fail the parent's probe realtek_smi_probe() because the
PHYs failed to probe, we'll get into a catch-22/chicken-n-egg
situation and the switch/PHYs will never probe.

I think a clean way to fix this at the driver level is to do what I
said in [6]. Copy pasting it here and expanding it a bit:

1. The IRQ registration and mdio bus registration should get moved to
realtek_smi_probe() which probes "realtek,rtl8366rb". So
realtek_smi_probe() succeeding doesn't depend on its child devices
probing successfully (which makes sense for any parent device).
2. realtek_smi_probe() should also create/register a
component-master/aggregate device that's "made up of"
realtek,rtl8366rb and all the PHYs. So the component-master will wait
for all of them to finish probing before it's initialized.
3. PHYs will probe successfully now because realtek,rtl8366rb probe()
which is the supplier's probe has finished probing without problems.
4. The component device's init (the .bind op) would call
dsa_register_switch() which kinda makes sense because the rtl8366rb
and all the PHYs combined together is what makes up the logical DSA
switch. The dsa_register_switch() will succeed and will be using the
right/specific PHY driver.

The same applies for any switch that has the PHYs as it's child device
AND (this is the key part) the PHYs depend on the switch as a supplier
(remember, if we didn't have the interrupt dependency, this would not
be an issue).

> I don't know the device core, but it looks like dev->can_match tells
> us what we need to know. If true, we know there is a driver for this
> device. But i'm hesitant to make use of this outside of driver/base.

can_match is never cleared once it's set and it's meant as an
optimization/preserving some probe order stuff. I wouldn't depend on
it for this case. We can just have a phy_has_driver()  function that
searches all the currently registered PHY drivers to check if there's
a matching driver. And dsa_register_switch() or phy_attach_direct()
can return -EPROBE_DEFER if there is a driver but it isn't bound yet.
Again, this is orthogonal to the realtek driver fix though because of
the catch-22 situation above.

-Saravana

[6] - https://lore.kernel.org/netdev/CAGETcx8_vxxPxF8WrXqk=PZYfEggsozP+z9KyOu5C2bEW0VW8g@mail.gmail.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ