lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4d669474-59b6-b0e9-09cb-8278734fa3a2@gmail.com>
Date:   Fri, 17 Mar 2023 09:41:08 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Álvaro Fernández Rojas <noltari@...il.com>,
        Vladimir Oltean <olteanv@...il.com>
Cc:     davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        pabeni@...hat.com, robh+dt@...nel.org,
        krzysztof.kozlowski+dt@...aro.org, jonas.gorski@...il.com,
        andrew@...n.ch, hkallweit1@...il.com, linux@...linux.org.uk,
        netdev@...r.kernel.org, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] net: dsa: b53: mmap: register MDIO Mux bus controller

On 3/17/23 09:23, Álvaro Fernández Rojas wrote:
> El vie, 17 mar 2023 a las 15:29, Vladimir Oltean (<olteanv@...il.com>) escribió:
>>
>> On Fri, Mar 17, 2023 at 03:17:12PM +0100, Álvaro Fernández Rojas wrote:
>>>> The proposed solution is too radical for a problem that was not properly
>>>> characterized yet, so this patch set has my temporary NACK.
>>>
>>> Forgive me, but why do you consider this solution too radical?
>>
>> Because it involves changing device tree bindings (stable ABI) in an
>> incompatible way.
>>
>>>>
>>>>> But maybe Florian or Jonas can give some more details about the issue...
>>>>
>>>> I think you also have the tools necessary to investigate this further.
>>>> We need to know what resource belonging to the switch is it that the
>>>> MDIO mux needs. Where is the earliest place you can add the call to
>>>> b53_mmap_mdiomux_init() such that your board works reliably? Note that
>>>> b53_switch_register() indirectly calls b53_setup(). By placing this
>>>> function where you have, the entirety of b53_setup() has finished
>>>> execution, and we don't know exactly what is it from there that is
>>>> needed.
>>>
>>> In the following link you will find different bootlogs related to
>>> different scenarios all of them with the same result: any attempt of
>>> calling b53_mmap_mdiomux_init() earlier than b53_switch_register()
>>> will either result in a kernel panic or a device hang:
>>> https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7
>>>
>>> 1. before b53_switch_register():
>>>
>>> 2. before dsa_register_switch():
>>>
>>> 3. before b53_switch_init():
>>
>> Did you read what I said?
> 
> Yes, but I didn't get your point, sorry for that.
> 
>>
>> | Note that b53_switch_register() indirectly calls b53_setup(). By placing
>> | this function where you have, the entirety of b53_setup() has finished
>> | execution, and we don't know exactly what is it from there that is
>> | needed.
>>
>> Can you place the b53_mmap_mdiomux_init() in various places within
>> b53_setup() to restrict the search further?
> 
> I tried and these are the results:
> https://gist.github.com/Noltari/d5bdba66b8f2e392c9e4c2759661d862
> 
> All of them hang when dsa_tree_setup() is called for DSA tree 1
> (external switch) without having completely setup DSA tree 0 (internal
> switch):
> [ 1.471345] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> [ 1.481099] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
> [ 1.506752] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
> [ 1.594365] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
> [ 1.612008] NET: Registered PF_INET6 protocol family
> [ 1.645617] Segment Routing with IPv6
> [ 1.649547] In-situ OAM (IOAM) with IPv6
> [ 1.653948] NET: Registered PF_PACKET protocol family
> [ 1.659984] 8021q: 802.1Q VLAN Support v1.8
> [ 1.699193] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> [ 2.124257] bcm53xx 0.1:1e: found switch: BCM53125, rev 4
> *** Device hang ***
> 
> I don't know if there's a way to defer the probe of DSA tree 1 (the
> external switch) until DSA tree 0 (the internal switch) is completely
> setup, because that would probably be the only alternative way of
> fixing this.

Could you find out which part is hanging? It looks like there is a busy 
waiting operation that we never complete?

DSA should be perfectly capable of dealing with disjoint trees being 
cascaded to one another, as this is entirely within how the framework is 
designed.

What I suspect might be happening is a "double programming" effect, 
similar or identical to what was described in this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8c6cd1d316f3b01ae578d8e29179f6396c0eaa2

using the MDIO mux would properly isolate the pseudo PHYs of the switch 
such that a given MDIO write does not end up programming *both* the 
internal and external switches. It could also be a completely different 
problem.
-- 
Florian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ