[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b89a9e1-e92e-ca99-9fbd-1d98f6a7864b@bang-olufsen.dk>
Date: Wed, 18 Aug 2021 10:18:06 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Saravana Kannan <saravanak@...gle.com>,
Vladimir Oltean <olteanv@...il.com>
CC: Vladimir Oltean <vladimir.oltean@....com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Florian Fainelli <f.fainelli@...il.com>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Frank Rowand <frowand.list@...il.com>,
Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH net] net: dsa: sja1105: fix use-after-free after calling
of_find_compatible_node, or worse
Hi Saravana,
On 8/18/21 4:46 AM, Saravana Kannan wrote:
> On Tue, Aug 17, 2021 at 3:31 PM Vladimir Oltean <olteanv@...il.com> wrote:
>>
>> Hi Alvin,
>>
>> On Tue, Aug 17, 2021 at 09:25:28PM +0000, Alvin Šipraga wrote:
>>> I have an observation that's slightly out of the scope of your patch,
>>> but I'll post here on the off chance that you find it relevant.
>>> Apologies if it's out of place.
>>>
>>> Do these integrated NXP PHYs use a specific PHY driver, or do they just
>>> use the Generic PHY driver?
>>
>> They refuse to probe at all with the Generic PHY driver. I have been
>> caught off guard a few times now when I had a kernel built with
>> CONFIG_NXP_C45_TJA11XX_PHY=n and their probing returns -22 in that case.
>>
>>> If the former is the case, do you experience that the PHY driver fails
>>> to get probed during mdiobus registration if the kernel uses
>>> fw_devlink=on?
>>
>> I don't test with "fw_devlink=on" in /proc/cmdline, this is the first
>> time I do it. It behaves exactly as you say.
>>
>>>
>>> In my case I am writing a new subdriver for realtek-smi, a DSA driver
>>> which registers an internal MDIO bus analogously to sja1105, which is
>>> why I'm asking. I noticed a deferred probe of the PHY driver because the
>>> supplier (ethernet-switch) is not ready - presumably because all of this
>>> is happening in the probe of the switch driver. See below:
>>>
>>> [ 83.653213] device_add:3270: device: 'SMI-0': device_add
>>> [ 83.653905] device_pm_add:136: PM: Adding info for No Bus:SMI-0
>>> [ 83.654055] device_add:3270: device: 'platform:ethernet-switch--mdio_bus:SMI-0': device_add
>>> [ 83.654224] device_link_add:843: mdio_bus SMI-0: Linked as a sync state only consumer to ethernet-switch
>>> [ 83.654291] libphy: SMI slave MII: probed
>>> ...
>>> [ 83.659809] device_add:3270: device: 'SMI-0:00': device_add
>>> [ 83.659883] bus_add_device:447: bus: 'mdio_bus': add device SMI-0:00
>>> [ 83.659970] device_pm_add:136: PM: Adding info for mdio_bus:SMI-0:00
>>> [ 83.660122] device_add:3270: device: 'platform:ethernet-switch--mdio_bus:SMI-0:00': device_add
>>> [ 83.660274] devices_kset_move_last:2701: devices_kset: Moving SMI-0:00 to end of list
>>> [ 83.660282] device_pm_move_last:203: PM: Moving mdio_bus:SMI-0:00 to end of list
>>> [ 83.660293] device_link_add:859: mdio_bus SMI-0:00: Linked as a consumer to ethernet-switch
>>> [ 83.660350] __driver_probe_device:736: bus: 'mdio_bus': __driver_probe_device: matched device SMI-0:00 with driver RTL8365MB-VC Gigabit Ethernet
>>> [ 83.660365] device_links_check_suppliers:1001: mdio_bus SMI-0:00: probe deferral - supplier ethernet-switch not ready
>>> [ 83.660376] driver_deferred_probe_add:138: mdio_bus SMI-0:00: Added to deferred list
>>
>> So it's a circular dependency? Switch cannot finish probing because it
>> cannot connect to PHY, which cannot probe because switch has not
>> finished probing, which....
>
> Hi Vladimir/Alvin,
>
> If there's a cyclic dependency between two devices, then fw_devlink=on
> is smart enough to notice that. Once it notices a cycle, it knows that
> it can't tell which one is the real dependency and which one is the
> false dependency and so stops enforcing ordering between the devices
> in the cycle.
>
> But fw_devlink doesn't understand all the properties yet. Just most of
> them and I'm always trying to add more. So when it only understands
> the property that's causing the false dependency but not the property
> that causes the real dependency, it can cause issues like this where
> fw_devlink=on enforces the false dependency and the driver/code
> enforces the real dependency. These are generally easy to fix -- you
> just need to teach fw_devlink how to parse more properties.
>
> This is just a preliminary analysis since I don't have all the info
> yet -- so I could be wrong. With that said, I happened to be working
> on adding fw_devlink support for phy-handle property and I think it
> should fix your issue with fw_devlink=on. Can you give [1] a shot?
I tried [1] but it did not seem to have any effect.
>
> If it doesn't fix it, can one of you please point me to an upstream
> dts (not dtsi) file for a platform in which you see this issue? And
> ideally also the DT nodes and their drivers that are involved in this
> cycle? With that info, I should be able to root cause this if the
> patch above doesn't already fix it.
I'm working with a non-upstream dts - maybe Vladimir is using an
upstream one? The pattern among the drivers/dts is common between our
two cases.
But for the sake of this discussion, my dts is pretty much the same as
what you will find in arch/arm/boot/dts/gemini-dlink-dir-685.dts. The
nodes of interest from that dts file are below, and the driver is in
drivers/net/ds/{realtek-smi-core.c,rtl8366rb.c}. It's expected that the
Realtek PHY driver in drivers/net/phy/realtek.c will get probed as part
of the mdiobus registration, but that never happens. See my previous
reply for a debug log.
/ {
switch {
compatible = "realtek,rtl8366rb";
mdc-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>;
mdio-gpios = <&gpio0 22 GPIO_ACTIVE_HIGH>;
reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>;
realtek,disable-leds;
switch_intc: interrupt-controller {
/* GPIO 15 provides the interrupt */
interrupt-parent = <&gpio0>;
interrupts = <15 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
#address-cells = <0>;
#interrupt-cells = <1>;
};
ports {
/* snip */
};
mdio {
compatible = "realtek,smi-mdio";
#address-cells = <1>;
#size-cells = <0>;
phy0: phy@0 {
reg = <0>;
interrupt-parent = <&switch_intc>;
interrupts = <0>;
};
phy1: phy@1 {
reg = <1>;
interrupt-parent = <&switch_intc>;
interrupts = <1>;
};
phy2: phy@2 {
reg = <2>;
interrupt-parent = <&switch_intc>;
interrupts = <2>;
};
phy3: phy@3 {
reg = <3>;
interrupt-parent = <&switch_intc>;
interrupts = <3>;
};
phy4: phy@4 {
reg = <4>;
interrupt-parent = <&switch_intc>;
interrupts = <12>;
};
};
};
};
Thanks for looking into this. Let me know if you need any other info or
want something tested.
Kind regards,
Alvin
>
>>
>> So how is it supposed to be solved then? Intuitively the 'mdio_bus SMI-0:00'
>> device should not be added to the deferred list, it should have everything
>> it needs right now (after all, it works without fw_devlink). No?
>>
>> It might be the late hour over here too, but right now I just don't
>> know. Let me add Saravana to the discussion too, he made an impressive
>> analysis recently on a PHY probing issue with mdio-mux,
>
> Lol, thanks for the kind words.
>
>> so the PHY
>> library probing dependencies should still be fresh in his mind, maybe he
>> has an idea what's wrong.
>
> [1] - https://lore.kernel.org/lkml/20210818021717.3268255-1-saravanak@google.com/T/#u>>
> Thanks,
> Saravana
>
Powered by blists - more mailing lists