lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 23 Aug 2021 11:50:50 -0700
From:   Saravana Kannan <saravanak@...gle.com>
To:     Alvin Šipraga <ALSI@...g-olufsen.dk>
Cc:     Vladimir Oltean <olteanv@...il.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Frank Rowand <frowand.list@...il.com>,
        Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH net] net: dsa: sja1105: fix use-after-free after calling
 of_find_compatible_node, or worse

On Sun, Aug 22, 2021 at 7:19 AM Alvin Šipraga <ALSI@...g-olufsen.dk> wrote:
>
> Hi Saravana,
>
> Thanks for the follow-up. I tested your change and it does the trick:
> there is no deferral and the PHY driver gets probed first-try during the
> mdiobus registration during the call to dsa_register_switch().

I'm fairly certain the mdiobus registration happens before
dsa_register_switch(). It's in the probe call path of the DSA. The
connecting of the PHYs with the DSA is what happens when
dsa_register_switch() is called.

> I tested
> with the switch, PHY, and tagging drivers all builtin, or all modules,
> and it worked in both cases.
>
> On 8/20/21 6:52 PM, Saravana Kannan wrote:
> > Hi Alvin,
> >
> > Can you give this a shot to see if it fixes your issue? It basically
> > delays the registration of dsa_register_switch() until all the
> > consumers of this switch have probed. So it has a couple of caveats:
>
> Hm, weren't the only consumers the PHYs themselves? It seems like the
> main effect of your change is that - by doing the actual
> dsa_register_switch() call after the switch driver probe - the
> ethernet-switch (provider) is already probed, thereby allowing the PHY
> (consumer) to probe immediately.

Correct-ish -- if you modify this to account for what I said above.

>
> > 1. I'm hoping the PHYs are the only consumers of this switch.
>
> In my case that is true, if you count the mdio_bus as well:
>
> /sys/devices/platform/ethernet-switch# ls -l consumer\:*
> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
> consumer:mdio_bus:SMI-0 ->
> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0
> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
> consumer:mdio_bus:SMI-0:00 ->
> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:00
> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
> consumer:mdio_bus:SMI-0:01 ->
> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:01
> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
> consumer:mdio_bus:SMI-0:02 ->
> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:02
> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
> consumer:mdio_bus:SMI-0:03 ->
> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:03

Hmm... mdio_bus being a consumer should prevent the sync_state() from
being called on "ethernet-switch". What's the value of the "status"
and "sync_state_only" files inside that mdio_bus folder?

> > 2. All of them have to probe successfully before the switch will
> > register itself.
>
> Yes.

Right, it's a yes in your case. But will it be a yes for all instances
of "realtek,rtl8366rb"?

> > 3. If dsa_register_switch() fails, we can't defer the probe (because
> > it already succeeded). But I'm not sure if it's a likely error code.
>
> It's of course possible that dsa_register_switch() fails. Assuming
> fw_devlink is doing its job properly, I think the reason is most likely
> going to be something specific to the driver, such as a communication
> timeout with the switch hardware itself.

But what if someone sets fw_devlink=permissive? Is it okay to break
this? There are ways to make this work for fw_devlink=permissive and
=on -- you check for each and decide where to call
dsa_register_switch() based on that.

> I get the impression that you don't necessarily regard this change as a
> proper fix, so I'm happy to do further tests if you choose to
> investigate further.

I thought about this in the background the past few days. I think
there are a couple of options:
1. We (community/Andrew) agree that this driver would only work with
fw_devlink=on and we can confirm that the other upstream uses of
"realtek,rtl8366rb" won't have any unprobed consumers problem and
switch to using my patch. Benefit is that it's a trivial and quick
change that gets things working again.
2. The "realtek,rtl8366rb" driver needs to be fixed to use a
"component device". A component device is a logical device that
represents a group of other devices. It's only initialized after all
these devices have probed successfully. The actual switch should be a
component device and it should call dsa_register_switch() in it's
"bind" (equivalent of probe). That way you can explicitly control what
devices need to be probed instead of depending on sync_state() that
have a bunch of caveats.

Alvin, do you want to take up (2)?

-Saravana

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ