Message-ID: <CAGETcx-7xgt5y_zNHzSMQf4YFCmWRPfP4_voshbNxKPgQ=b1tA@mail.gmail.com>
Date: Mon, 23 Aug 2021 11:50:50 -0700
From: Saravana Kannan <saravanak@...gle.com>
To: Alvin Šipraga <ALSI@...g-olufsen.dk>
Cc: Vladimir Oltean <olteanv@...il.com>,
Vladimir Oltean <vladimir.oltean@....com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Florian Fainelli <f.fainelli@...il.com>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Frank Rowand <frowand.list@...il.com>,
Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH net] net: dsa: sja1105: fix use-after-free after calling
of_find_compatible_node, or worse
On Sun, Aug 22, 2021 at 7:19 AM Alvin Šipraga <ALSI@...g-olufsen.dk> wrote:
>
> Hi Saravana,
>
> Thanks for the follow-up. I tested your change and it does the trick:
> there is no deferral and the PHY driver gets probed first-try during the
> mdiobus registration during the call to dsa_register_switch().
I'm fairly certain the mdiobus registration happens before
dsa_register_switch() is called: it's in the probe call path of the DSA
switch driver. Connecting the PHYs to the DSA switch is what happens
when dsa_register_switch() is called.
> I tested
> with the switch, PHY, and tagging drivers all builtin, or all modules,
> and it worked in both cases.
>
> On 8/20/21 6:52 PM, Saravana Kannan wrote:
> > Hi Alvin,
> >
> > Can you give this a shot to see if it fixes your issue? It basically
> > delays the call to dsa_register_switch() until all the
> > consumers of this switch have probed. So it has a couple of caveats:
>
> Hm, weren't the only consumers the PHYs themselves? It seems like the
> main effect of your change is that - by doing the actual
> dsa_register_switch() call after the switch driver probe - the
> ethernet-switch (provider) is already probed, thereby allowing the PHY
> (consumer) to probe immediately.
Correct-ish -- if you modify this to account for what I said above.
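To make the ordering in this exchange concrete, here is a toy userspace model in Python. It assumes fw_devlink=on and two PHYs named after the sysfs listing in this thread; every class and function is a hypothetical stand-in, not a kernel API. It contrasts calling dsa_register_switch() from inside the switch's probe with calling it from sync_state(), and also shows the second caveat from the list: if one PHY never probes, sync_state() never fires and the switch never registers.

```python
"""Toy model of the probe ordering discussed above (not kernel code).

Every class, function and device name here is a hypothetical stand-in.
"""
from dataclasses import dataclass, field


@dataclass
class Phy:
    name: str
    fails: bool = False   # simulate a consumer whose driver never probes
    bound: bool = False   # True once the PHY's own driver has probed


@dataclass
class Switch:
    phys: list
    probing: bool = False
    log: list = field(default_factory=list)

    def dsa_register_switch(self):
        # Stand-in for the real call: it connects each PHY to the switch.
        for phy in self.phys:
            state = "driver bound" if phy.bound else "NO driver bound"
            self.log.append(f"connect {phy.name}: {state}")


def try_probe_phys(sw):
    """Model fw_devlink=on: a consumer cannot probe while its supplier is probing."""
    for phy in sw.phys:
        if phy.bound:
            continue
        if sw.probing:
            sw.log.append(f"{phy.name}: probe deferred")
        elif phy.fails:
            sw.log.append(f"{phy.name}: probe failed")
        else:
            phy.bound = True
            sw.log.append(f"{phy.name}: probed")


def boot(register_in_sync_state, failing=()):
    sw = Switch(phys=[Phy(n, fails=n in failing) for n in ("SMI-0:00", "SMI-0:01")])
    sw.probing = True
    sw.log.append("switch probe: mdiobus registered")  # PHY devices get added here
    try_probe_phys(sw)
    if not register_in_sync_state:
        sw.dsa_register_switch()  # original code: called from inside probe()
    sw.probing = False
    sw.log.append("switch probe returned")
    try_probe_phys(sw)  # deferred PHYs are retried
    # sync_state() only fires once every consumer has probed (caveats 1 and 2).
    if register_in_sync_state and all(p.bound for p in sw.phys):
        sw.log.append("sync_state(switch)")
        sw.dsa_register_switch()
    return sw.log
```

With boot(False) the PHYs are still deferred when the switch tries to connect them; with boot(True) they are bound by the time sync_state() runs; with boot(True, failing={"SMI-0:01"}) registration never happens.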
>
> > 1. I'm hoping the PHYs are the only consumers of this switch.
>
> In my case that is true, if you count the mdio_bus as well:
>
> /sys/devices/platform/ethernet-switch# ls -l consumer\:*
> lrwxrwxrwx 1 root root 0 Aug 22 16:00 consumer:mdio_bus:SMI-0 -> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0
> lrwxrwxrwx 1 root root 0 Aug 22 16:00 consumer:mdio_bus:SMI-0:00 -> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:00
> lrwxrwxrwx 1 root root 0 Aug 22 16:00 consumer:mdio_bus:SMI-0:01 -> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:01
> lrwxrwxrwx 1 root root 0 Aug 22 16:00 consumer:mdio_bus:SMI-0:02 -> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:02
> lrwxrwxrwx 1 root root 0 Aug 22 16:00 consumer:mdio_bus:SMI-0:03 -> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:03
Hmm... mdio_bus being a consumer should prevent sync_state() from
being called on "ethernet-switch". What's the value of the "status"
and "sync_state_only" files inside that mdio_bus folder?
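A small Python helper can collect the values asked for here in one go. It assumes the devlink sysfs layout visible in the listing above, where each consumer:* entry under the supplier is a symlink to a devlink directory carrying status and sync_state_only attributes (as described in the kernel's sysfs-class-devlink ABI document). The default supplier path is the one from this thread and is an assumption for any other board.

```python
from pathlib import Path


def consumer_link_info(supplier_dir="/sys/devices/platform/ethernet-switch"):
    """Return {link_name: {"status": ..., "sync_state_only": ...}} for each consumer:* link.

    Attributes that are missing are reported as None.
    """
    base = Path(supplier_dir)
    info = {}
    if not base.is_dir():
        return info
    for link in sorted(base.glob("consumer:*")):
        # Each consumer:* entry points at the devlink device directory.
        attrs = {}
        for attr in ("status", "sync_state_only"):
            f = link / attr
            attrs[attr] = f.read_text().strip() if f.exists() else None
        info[link.name] = attrs
    return info


if __name__ == "__main__":
    for name, attrs in consumer_link_info().items():
        print(f"{name}: status={attrs['status']} sync_state_only={attrs['sync_state_only']}")
```

Running it on the target prints one line per consumer link, which is the information needed to see why the mdio_bus link does not hold back sync_state().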
> > 2. All of them have to probe successfully before the switch will
> > register itself.
>
> Yes.
Right, it's a yes in your case. But will it be a yes for all instances
of "realtek,rtl8366rb"?
> > 3. If dsa_register_switch() fails, we can't defer the probe (because
> > it already succeeded). But I'm not sure if it's a likely error code.
>
> It's of course possible that dsa_register_switch() fails. Assuming
> fw_devlink is doing its job properly, I think the reason is most likely
> going to be something specific to the driver, such as a communication
> timeout with the switch hardware itself.
But what if someone sets fw_devlink=permissive? Is it okay to break
this? There are ways to make this work for both fw_devlink=permissive
and fw_devlink=on: you check which one is in effect and decide where to
call dsa_register_switch() based on that.
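The "check for each and decide" idea can be sketched as a policy function. This is only a userspace model: register_point() is hypothetical, the kernel-side check would use whatever fw_devlink accessor the running kernel provides (not shown here), and the default used when fw_devlink= is absent from the command line is an assumption to verify against your kernel version.

```python
def fw_devlink_mode(cmdline, default="on"):
    """Return the fw_devlink= value from a kernel command line string.

    `default` is an assumption for when the parameter is absent.
    """
    mode = default
    for token in cmdline.split():
        if token.startswith("fw_devlink="):
            mode = token.split("=", 1)[1]  # the last occurrence wins
    return mode


def register_point(mode):
    """Hypothetical policy: where this driver would call dsa_register_switch()."""
    if mode in ("on", "rpm"):
        # Probe ordering is enforced, so defer registration until the
        # consumers have probed, as the sync_state() patch does.
        return "sync_state"
    # Other modes (e.g. permissive, off): keep the original behaviour.
    return "probe"
```

On a running system the string would come from /proc/cmdline, e.g. register_point(fw_devlink_mode(open("/proc/cmdline").read())).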
> I get the impression that you don't necessarily regard this change as a
> proper fix, so I'm happy to do further tests if you choose to
> investigate further.
I've been thinking about this in the background over the past few days.
I think there are a couple of options:
1. We (community/Andrew) agree that this driver will only work with
fw_devlink=on, we confirm that the other upstream uses of
"realtek,rtl8366rb" won't have the unprobed-consumers problem, and we
switch to using my patch. The benefit is that it's a trivial and quick
change that gets things working again.
2. The "realtek,rtl8366rb" driver needs to be fixed to use a
"component device". A component device is a logical device that
represents a group of other devices. It's only initialized after all
these devices have probed successfully. The actual switch should be a
component device and it should call dsa_register_switch() in its
"bind" callback (the equivalent of probe). That way you can explicitly
control which devices need to be probed instead of depending on
sync_state(), which has a bunch of caveats.
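The core rule of option 2 (bind only after an explicit set of devices has probed) can be modelled in a few lines of Python. In the kernel this is provided by the component helpers in drivers/base/component.c (component_add(), component_match_add(), component_master_add_with_match()); the Aggregate class below is a hypothetical userspace sketch of the idea, not that API.

```python
class Aggregate:
    """Toy model of a component "master": bind() runs once every wanted device has probed.

    All names are hypothetical; this only mirrors the idea behind the kernel helpers.
    """

    def __init__(self, wanted, bind):
        self.wanted = set(wanted)  # devices the aggregate explicitly waits for
        self.present = set()
        self.bind = bind
        self.bound = False

    def component_probed(self, name):
        # Devices outside the explicit match list are ignored.
        if name not in self.wanted or self.bound:
            return
        self.present.add(name)
        if self.present == self.wanted:
            self.bound = True
            self.bind()  # e.g. where dsa_register_switch() would be called


calls = []
agg = Aggregate({"switch", "SMI-0:00", "SMI-0:01"},
                bind=lambda: calls.append("dsa_register_switch"))
for dev in ("SMI-0:00", "mdio_bus:SMI-0", "switch", "SMI-0:01"):
    agg.component_probed(dev)
```

The difference from sync_state() is the wanted set: the aggregate lists exactly which devices it waits for, so an unrelated consumer such as mdio_bus:SMI-0 neither blocks nor triggers registration.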
Alvin, do you want to take up (2)?
-Saravana