lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7e8a9f73-7b30-4bcb-0cf5-bd124e7a147e@bang-olufsen.dk>
Date:   Wed, 25 Aug 2021 13:40:06 +0000
From:   Alvin Šipraga <ALSI@...g-olufsen.dk>
To:     Saravana Kannan <saravanak@...gle.com>
CC:     Vladimir Oltean <olteanv@...il.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Frank Rowand <frowand.list@...il.com>,
        Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH net] net: dsa: sja1105: fix use-after-free after calling
 of_find_compatible_node, or worse

Hi Saravana,

Sorry for the delayed response.

On 8/23/21 8:50 PM, Saravana Kannan wrote:
> On Sun, Aug 22, 2021 at 7:19 AM Alvin Šipraga <ALSI@...g-olufsen.dk> wrote:
>>
>> Hi Saravana,
>>
>> Thanks for the follow-up. I tested your change and it does the trick:
>> there is no deferral and the PHY driver gets probed first-try during the
>> mdiobus registration during the call to dsa_register_switch().
> 
> I'm fairly certain the mdiobus registration happens before
> dsa_register_switch(). It's in the probe call path of the DSA. The
> connecting of the PHYs with the DSA is what happens when
> dsa_register_switch() is called.

Hm, are you sure about this? My understanding is that mdiobus 
registration happens as follows:

dsa_register_switch()
     -> ...
         -> rtl836{6,5mb}_setup() # see [1] for RFC patch with rtl8365mb
             -> realtek_smi_setup_mdio()
                 -> of_mdiobus_register()

As it stands, dsa_register_switch() is currently called from 
realtek_smi_probe(). Your patch just moves this call to 
realtek_smi_sync_state(), but per the above, the mdiobus registration 
happens inside dsa_register_switch(), meaning it doesn't happen in the 
probe call path. Or am I missing something? I'm happy to be wrong :-)

[1] https://lore.kernel.org/netdev/20210822193145.1312668-1-alvin@pqrs.dk/T/

> 
>> I tested
>> with the switch, PHY, and tagging drivers all builtin, or all modules,
>> and it worked in both cases.
>>
>> On 8/20/21 6:52 PM, Saravana Kannan wrote:
>>> Hi Alvin,
>>>
>>> Can you give this a shot to see if it fixes your issue? It basically
>>> delays the registration of dsa_register_switch() until all the
>>> consumers of this switch have probed. So it has a couple of caveats:
>>
>> Hm, weren't the only consumers the PHYs themselves? It seems like the
>> main effect of your change is that - by doing the actual
>> dsa_register_switch() call after the switch driver probe - the
>> ethernet-switch (provider) is already probed, thereby allowing the PHY
>> (consumer) to probe immediately.
> 
> Correct-ish -- if you modify this to account for what I said above.
> 
>>
>>> 1. I'm hoping the PHYs are the only consumers of this switch.
>>
>> In my case that is true, if you count the mdio_bus as well:
>>
>> /sys/devices/platform/ethernet-switch# ls -l consumer\:*
>> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
>> consumer:mdio_bus:SMI-0 ->
>> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0
>> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
>> consumer:mdio_bus:SMI-0:00 ->
>> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:00
>> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
>> consumer:mdio_bus:SMI-0:01 ->
>> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:01
>> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
>> consumer:mdio_bus:SMI-0:02 ->
>> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:02
>> lrwxrwxrwx    1 root     root             0 Aug 22 16:00
>> consumer:mdio_bus:SMI-0:03 ->
>> ../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:03
> 
> Hmm... mdio_bus being a consumer should prevent the sync_state() from
> being called on "ethernet-switch". What's the value of the "status"
> and "sync_state_only" files inside that mdio_bus folder?

Without your patch:

/sys/devices/platform/ethernet-switch# ls -l consumer\:*
lrwxrwxrwx    1 root     root             0 Aug 25 13:42 
consumer:mdio_bus:SMI-0 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0
/sys/devices/platform/ethernet-switch# cat consumer\:*/status
available
/sys/devices/platform/ethernet-switch# cat consumer\:*/sync_state_only
1


With your patch:

0.0.0.0@...4-:/sys/devices/platform/ethernet-switch# ls -l consumer\:*
lrwxrwxrwx    1 root     root             0 Aug 25 15:03 
consumer:mdio_bus:SMI-0 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0
lrwxrwxrwx    1 root     root             0 Aug 25 15:03 
consumer:mdio_bus:SMI-0:00 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:00
lrwxrwxrwx    1 root     root             0 Aug 25 15:03 
consumer:mdio_bus:SMI-0:01 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:01
lrwxrwxrwx    1 root     root             0 Aug 25 15:03 
consumer:mdio_bus:SMI-0:02 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:02
lrwxrwxrwx    1 root     root             0 Aug 25 15:03 
consumer:mdio_bus:SMI-0:03 -> 
../../virtual/devlink/platform:ethernet-switch--mdio_bus:SMI-0:03
0.0.0.0@...4-:/sys/devices/platform/ethernet-switch#
0.0.0.0@...4-:/sys/devices/platform/ethernet-switch#
0.0.0.0@...4-:/sys/devices/platform/ethernet-switch# cat 
consumer\:*/status
available
active
active
active
active
0.0.0.0@...4-:/sys/devices/platform/ethernet-switch# cat 
consumer\:*/sync_state_only
1
0
0
0
0

Hope that helps you understand what's going on better.

BTW, I noticed that when I build realtek-smi as a module with your 
patch, my kernel crashes if I unload the module. I haven't debugged this 
because it was just a test, nor did I get a stacktrace. LMK if you want 
more info.

> 
>>> 2. All of them have to probe successfully before the switch will
>>> register itself.
>>
>> Yes.
> 
> Right, it's a yes in your case. But will it be a yes for all instances
> of "realtek,rtl8366rb"?
> 
>>> 3. If dsa_register_switch() fails, we can't defer the probe (because
>>> it already succeeded). But I'm not sure if it's a likely error code.
>>
>> It's of course possible that dsa_register_switch() fails. Assuming
>> fw_devlink is doing its job properly, I think the reason is most likely
>> going to be something specific to the driver, such as a communication
>> timeout with the switch hardware itself.
> 
> But what if someone sets fw_devlink=permissive? Is it okay to break
> this? There are ways to make this work for fw_devlink=permissive and
> =on -- you check for each and decide where to call
> dsa_register_switch() based on that.

I am new to DSA myself so I think I am unqualified to answer whether 
it's OK to break things or not.

> 
>> I get the impression that you don't necessarily regard this change as a
>> proper fix, so I'm happy to do further tests if you choose to
>> investigate further.
> 
> I thought about this in the background the past few days. I think
> there are a couple of options:
> 1. We (community/Andrew) agree that this driver would only work with
> fw_devlink=on and we can confirm that the other upstream uses of
> "realtek,rtl8366rb" won't have any unprobed consumers problem and
> switch to using my patch. Benefit is that it's a trivial and quick
> change that gets things working again.
> 2. The "realtek,rtl8366rb" driver needs to be fixed to use a
> "component device". A component device is a logical device that
> represents a group of other devices. It's only initialized after all
> these devices have probed successfully. The actual switch should be a
> component device and it should call dsa_register_switch() in it's
> "bind" (equivalent of probe). That way you can explicitly control what
> devices need to be probed instead of depending on sync_state() that
> have a bunch of caveats.
> 
> Alvin, do you want to take up (2)?

I can give it a shot, but first:

   - It seems Andrew may also need some convincing that this is the 
right approach.
   - Are you sure that this will solve the problem? See what I wrote 
upstairs in this email.
   - I have never written - nor can I recall reading - a component 
driver, so I would appreciate if you could point me to an upstream 
example that you think is illustrative.

There's one more issue: I do not have an RTL8366 to test on - rather, I 
encountered this problem in realtek-smi while writing a new subdriver 
for it to support the RTL8365MB. So any proposed fix may be perceived as 
speculative if I cannot test on the '66 hardware. In that case this may 
have to wait until the '65MB subdriver is accepted. Many ifs and maybes, 
but don't take it to mean I'm not interested in helping. On the contrary 
- I would like to fix this bug since I am affected by it!

Kind regards,
Alvin

> 
> -Saravana
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ