lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 15 Apr 2023 16:19:46 +0300
From:   Arınç ÜNAL <arinc.unal@...nc9.com>
To:     Daniel Golle <daniel@...rotopia.org>
Cc:     Vladimir Oltean <olteanv@...il.com>,
        Frank Wunderlich <frank-w@...lic-files.de>,
        netdev <netdev@...r.kernel.org>, erkin.bozoglu@...ont.com,
        Thibaut <hacks@...shdirt.org>
Subject: Re: mt7530: dsa_switch_parse_of() fails, causes probe code to run
 twice

On 15.04.2023 15:43, Daniel Golle wrote:
> On Sat, Apr 15, 2023 at 10:53:10AM +0300, Arınç ÜNAL wrote:
>> On 15.04.2023 03:52, Daniel Golle wrote:
>>> On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
>>>> On 15.04.2023 02:53, Daniel Golle wrote:
>>>>> On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
>>>>>> On 15.04.2023 01:48, Daniel Golle wrote:
>>>>>>> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>>>>>>>> Hey there,
>>>>>>>>
>>>>>>>> I've been working on the MT7530 DSA subdriver. While doing some tests, I
>>>>>>>> realised mt7530_probe() runs twice. I moved enabling the regulators from
>>>>>>>> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
>>>>>>>> with exception warnings on the first time. It works fine when
>>>>>>>> mt7530_probe() is run again.
>>>>>>>>
>>>>>>>> This should not be an expected behaviour, right? Any ideas how we can make
>>>>>>>> it work the first time?
>>>>>>>
>>>>>>> Can you share the patch or work-in-progress tree which will allow me
>>>>>>> to reproduce this problem?
>>>>>>
>>>>>> I tested this on vanilla 6.3-rc6. There's just the diff below that is
>>>>>> applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
>>>>>> haven't tried it on MCM MT7530 on MT7621 SoC yet.
>>>>>>
>>>>>>>
>>>>>>> It can of course be that regulator driver has not yet been loaded on
>>>>>>> the first run and -EPROBE_DEFER is returned in that case. Knowing the
>>>>>>> value of 'err' variable below would hence be valuable information.
>>>>>>
>>>>>> Regardless of enabling the regulator on either mt7530_probe() or
>>>>>> mt7530_setup(), dsa_switch_parse_of() always fails.
>>>>>
>>>>> So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
>>>>> driver responsible for the CPU port has not yet been loaded.
>>>>>
>>>>> See net/dsa/dsa.c (inside function dsa_port_parse_of):
>>>>> [...]
>>>>> 1232)                master = of_find_net_device_by_node(ethernet);
>>>>> 1233)                of_node_put(ethernet);
>>>>> 1234)                if (!master)
>>>>> 1235)                        return -EPROBE_DEFER;
>>>>> [...]
>>>>>
>>>>> Hence it would be important to include the value of 'err' in your
>>>>> debugging printf output, as -EPROBE_DEFER can be an expected and
>>>>> implicitely intended reality and nothing is wrong then.
>>>>
>>>> Thanks Daniel. I can't do more tests soon but this is probably what's going
>>>> on as the logs already indicate that the MediaTek ethernet driver was yet to
>>>> load.
>>>>
>>>> As acknowledged, since running the MT7530 DSA subdriver from scratch is
>>>> expected if the ethernet driver is not loaded yet, there's not really a
>>>> problem. Though the switch is reset twice in a short amount of time. I don't
>>>> think that's very great.
>>>
>>> That's true, and we should try to avoid that.
>>>
>>>>
>>>> The driver initialisation seems serialised (at least for the drivers built
>>>> into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
>>>> other driver was loaded in the meantime so I got the same behaviour.
>>>>
>>>> The regulator code will cause a long and nasty exception the first time.
>>>> Though there's nothing wrong as it does what it's supposed to do on the
>>>> second run. I'm not sure if that's negligible.
>>>>
>>>> Could we at least somehow make the MT7530 DSA subdriver wait until the
>>>> regulator driver is loaded?
>>>
>>> I assume the regulator-related stackdump is unrelated, but caused by
>>> cpufreq changes, which had now been fixed by commit 0883426fd07e
>>> ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
>>>
>>> If you are using v6.3-rc6 this commit is still missing there, but
>>> manually picking it from linux-next should fix it.
>>
>> I did one better and just did the test on the current linux-next, I get
>> exceptions that seem to be identical. I also made sure this commit was
>> actually there.
>>
>>>
>>> Let me know if I can help with testing on my farm of MediaTek boards.
>>> I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
>>> PLL activation to mt7530_probe() would be essential as it makes the
>>> fix much easier...
>>
>> Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 SoC?
>> I just need to complete the patch logs, the code won't change much.
>>
>> https://github.com/arinc9/linux/commits/for-netnext
> 
> Tested on BPi-R64 (MT7622A+MT7531BE), BPi-R3 (MT7986A+MT7531AE) and
> MT7988A reference board. All working just fine.
> 
> For BPi-R2 (MT7623N+MT7530) I made sure to disable the already enabled
> regulators in the error path and hence made the WARN_ON no longer
> trigger, see:
> https://github.com/dangowrt/linux/commit/55035b5ac739914166ed4f026262d0fc9b17bc76

Very nice. This is also what I had figured eventually and was in the 
process of conveying to Vladimir. I can confirm I don't get exceptions 
anymore.

> 
>>
>> I'm thinking if we can -EPROBE_DEFER right at the start of mt7530_probe(),
>> it should prevent the reset code from running twice, and enabling the
>> regulator will run without any exceptions.
>>
>> I think I can just keep enabling the regulator on mt7530_setup() if I can't
>> figure that out.
> 
> What's bad about having the hardware setup in mt7530_setup()? I think it
> even makes more sense to have it there and *not* in mt7530_probe() for
> exactly such reasons. Maybe we can even also move the reset function
> there and really do *all* of the hardware setup there and let the probe
> function really just parse DTS, probe the hardware, allocate memory and
> initialize data-structures like it is supposed to be.
> 
> That being said, I thought that having PLL actication in mt7530_probe()
> would make things easier (as in: not require a function pointer to the
> sgmii_create function), but looking at it now this is not even true,
> we now require
> void mt7530_core_write(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_core_set(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_core_clear(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_write(struct mt7530_priv *priv, u32 reg, u32 val);
> u32 _mt7530_read(struct mt7530_dummy_poll *p);
> u32 mt7530_read(struct mt7530_priv *priv, u32 reg);
> being either exported or to be inline functions in mt7530.h which
> previously wasn't needed...
> 
> I may miss something here and would like to understand your perspective:
> What exactly is the argument for moving all of the setup to the probe
> function?

While speaking here, let's discuss what should be considered probing. To 
me, detecting the chip ID, checking the XTAL frequency, checking whether 
port 5 is SGMII, are probe material. We retrieve information from the 
hardware and reject registering the switch if things don't fit.

The current order of the code implies that resetting the switch is 
necessary for these checks to be done.

As an example, on realtek-mdio.c [0], I can also see that reset is done 
on probe.

Now anything after that, like setting down MACs, PHYs, doing internal 
reset, pll setup, creating sgmii, lowering driving could rather be on 
mt7530_setup().

One thing that complicates this is that the MT7530 switch has got a 
unique feature, PHY muxing. I want to be able to use this feature 
without registering the switch at all. And that requires the switch to 
be at least reset.

This is especially useful for devices that only use a single port of the 
switch. We can get rid of the DSA header overhead by doing this. I 
explained this in more detail here [1].

I'm CC'ing Thiabut here since they've been interested in this for a while.

[0] 
https://github.com/arinc9/linux/blob/for-netnext/drivers/net/dsa/realtek/realtek-mdio.c#L143
[1] 
https://lore.kernel.org/netdev/0e3ca573-2190-57b0-0e98-7f5b890d328e@arinc9.com/

Arınç

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ