lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 15 Apr 2023 13:43:36 +0100
From:   Daniel Golle <daniel@...rotopia.org>
To:     Arınç ÜNAL <arinc.unal@...nc9.com>
Cc:     Vladimir Oltean <olteanv@...il.com>,
        Frank Wunderlich <frank-w@...lic-files.de>,
        netdev <netdev@...r.kernel.org>, erkin.bozoglu@...ont.com
Subject: Re: mt7530: dsa_switch_parse_of() fails, causes probe code to run
 twice

On Sat, Apr 15, 2023 at 10:53:10AM +0300, Arınç ÜNAL wrote:
> On 15.04.2023 03:52, Daniel Golle wrote:
> > On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
> > > On 15.04.2023 02:53, Daniel Golle wrote:
> > > > On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
> > > > > On 15.04.2023 01:48, Daniel Golle wrote:
> > > > > > On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> > > > > > > Hey there,
> > > > > > > 
> > > > > > > I've been working on the MT7530 DSA subdriver. While doing some tests, I
> > > > > > > realised mt7530_probe() runs twice. I moved enabling the regulators from
> > > > > > > mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
> > > > > > > with exception warnings on the first time. It works fine when
> > > > > > > mt7530_probe() is run again.
> > > > > > > 
> > > > > > > This should not be an expected behaviour, right? Any ideas how we can make
> > > > > > > it work the first time?
> > > > > > 
> > > > > > Can you share the patch or work-in-progress tree which will allow me
> > > > > > to reproduce this problem?
> > > > > 
> > > > > I tested this on vanilla 6.3-rc6. There's just the diff below that is
> > > > > applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
> > > > > haven't tried it on MCM MT7530 on MT7621 SoC yet.
> > > > > 
> > > > > > 
> > > > > > It can of course be that regulator driver has not yet been loaded on
> > > > > > the first run and -EPROBE_DEFER is returned in that case. Knowing the
> > > > > > value of 'err' variable below would hence be valuable information.
> > > > > 
> > > > > Regardless of enabling the regulator on either mt7530_probe() or
> > > > > mt7530_setup(), dsa_switch_parse_of() always fails.
> > > > 
> > > > So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
> > > > driver responsible for the CPU port has not yet been loaded.
> > > > 
> > > > See net/dsa/dsa.c (inside function dsa_port_parse_of):
> > > > [...]
> > > > 1232)                master = of_find_net_device_by_node(ethernet);
> > > > 1233)                of_node_put(ethernet);
> > > > 1234)                if (!master)
> > > > 1235)                        return -EPROBE_DEFER;
> > > > [...]
> > > > 
> > > > Hence it would be important to include the value of 'err' in your
> > > > debugging printf output, as -EPROBE_DEFER can be an expected and
> > > > implicitely intended reality and nothing is wrong then.
> > > 
> > > Thanks Daniel. I can't do more tests soon but this is probably what's going
> > > on as the logs already indicate that the MediaTek ethernet driver was yet to
> > > load.
> > > 
> > > As acknowledged, since running the MT7530 DSA subdriver from scratch is
> > > expected if the ethernet driver is not loaded yet, there's not really a
> > > problem. Though the switch is reset twice in a short amount of time. I don't
> > > think that's very great.
> > 
> > That's true, and we should try to avoid that.
> > 
> > > 
> > > The driver initialisation seems serialised (at least for the drivers built
> > > into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
> > > other driver was loaded in the meantime so I got the same behaviour.
> > > 
> > > The regulator code will cause a long and nasty exception the first time.
> > > Though there's nothing wrong as it does what it's supposed to do on the
> > > second run. I'm not sure if that's negligible.
> > > 
> > > Could we at least somehow make the MT7530 DSA subdriver wait until the
> > > regulator driver is loaded?
> > 
> > I assume the regulator-related stackdump is unrelated, but caused by
> > cpufreq changes, which had now been fixed by commit 0883426fd07e
> > ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
> > 
> > If you are using v6.3-rc6 this commit is still missing there, but
> > manually picking it from linux-next should fix it.
> 
> I did one better and just did the test on the current linux-next, I get
> exceptions that seem to be identical. I also made sure this commit was
> actually there.
> 
> > 
> > Let me know if I can help with testing on my farm of MediaTek boards.
> > I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
> > PLL activation to mt7530_probe() would be essential as it makes the
> > fix much easier...
> 
> Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 SoC?
> I just need to complete the patch logs, the code won't change much.
> 
> https://github.com/arinc9/linux/commits/for-netnext

Tested on BPi-R64 (MT7622A+MT7531BE), BPi-R3 (MT7986A+MT7531AE) and
MT7988A reference board. All working just fine.

For BPi-R2 (MT7623N+MT7530) I made sure to disable the already enabled
regulators in the error path and hence made the WARN_ON no longer
trigger, see:
https://github.com/dangowrt/linux/commit/55035b5ac739914166ed4f026262d0fc9b17bc76

> 
> I'm thinking if we can -EPROBE_DEFER right at the start of mt7530_probe(),
> it should prevent the reset code from running twice, and enabling the
> regulator will run without any exceptions.
> 
> I think I can just keep enabling the regulator on mt7530_setup() if I can't
> figure that out.

What's bad about having the hardware setup in mt7530_setup()? I think it
even makes more sense to have it there and *not* in mt7530_probe() for
exactly such reasons. Maybe we can even also move the reset function
there and really do *all* of the hardware setup there and let the probe
function really just parse DTS, probe the hardware, allocate memory and
initialize data-structures like it is supposed to be.

That being said, I thought that having PLL actication in mt7530_probe()
would make things easier (as in: not require a function pointer to the
sgmii_create function), but looking at it now this is not even true,
we now require
void mt7530_core_write(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_core_set(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_core_clear(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_write(struct mt7530_priv *priv, u32 reg, u32 val);
u32 _mt7530_read(struct mt7530_dummy_poll *p);
u32 mt7530_read(struct mt7530_priv *priv, u32 reg);
being either exported or to be inline functions in mt7530.h which
previously wasn't needed...

I may miss something here and would like to understand your perspective:
What exactly is the argument for moving all of the setup to the probe
function?

> On MT7623NI, the switch stops working after 35 seconds with
> these logs. As long as the regulator is enabled before this, everything
> keeps working.
> 
> [   35.037200] vusb: disabling
> [   35.040089] vmc: disabling
> [   35.042856] vmch: disabling
> [   35.045709] vgp1: disabling
> [   35.049010] vcamaf: disabling
> 
> Arınç

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ