Date: Mon, 07 Feb 2022 11:41:36 +0100
From: Jerome Brunet <jbrunet@...libre.com>
To: Erico Nunes <nunes.erico@...il.com>,
 Alexandre Torgue <alexandre.torgue@...s.st.com>,
 Giuseppe Cavallaro <peppe.cavallaro@...com>,
 Jose Abreu <joabreu@...opsys.com>,
 Kevin Hilman <khilman@...libre.com>,
 Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
 Neil Armstrong <narmstrong@...libre.com>,
 linux-amlogic@...ts.infradead.org,
 netdev@...r.kernel.org,
 linux-rockchip@...ts.infradead.org,
 linux-sunxi@...ts.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot

On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico@...il.com> wrote:

> Hello,
>
> I've been tracking down an issue with network interfaces from
> meson8b-dwmac sometimes not coming up properly at boot.
> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
> a group of them as part of a CI test farm that uses nfsroot.
>
> After hopefully ruling out potential platform/firmware and network
> issues I managed to bisect this commit in the kernel to make a big
> difference:
>
> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
> link config in mac_link_up()
>
> With a kernel before that commit, I am able to submit hundreds of test
> jobs and the boards always start the network interface properly.
>
> After that commit, around 30% of the jobs start hitting this:
>
> [ 2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [ 2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [ 2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [ 2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [ 2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [ 3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [ 3.783162] Sending DHCP requests ...... timed out!
> [ 93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
> [ 93.685712] IP-Config: Retrying forever (NFS root)...
> [ 93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [ 93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [ 93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [ 93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [ 93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [ 93.807459] random: fast init done
> [ 95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
>
> This still happens with a kernel from master, currently 5.17-rc2 (less
> frequently but still often hit by CI test jobs).
> The jobs still usually get to work after restarting the interface a
> couple of times, but sometimes it takes 3-4 attempts.
>
> Here is one example and full dmesg:
> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
>
> Note that DHCP does not seem to be an issue here, besides the fact
> that the problem only happens since the mentioned commit under the
> same setup, I did try to set up the boards to use a static ip but then
> the interfaces just don't communicate at all from boot.
>
> For test purposes I attempted to revert
> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
> does not apply trivially anymore, and by trying to revert it manually
> I haven't been able to get a working interface.
>
> Any advice on how to further debug or fix this?

Hi Erico,

Thanks a lot for digging into this topic.
I'm seeing exactly the same behavior on the g12 based khadas-vim3:

* Boot stalled waiting for DHCP - with an NFS based filesystem
* Every minute, the network driver gets a reset and tries again

Sometimes it works on the first attempt, sometimes it takes up to 5
attempts. Eventually, it reaches the prompt, which might be why it went
unnoticed so far.

I think that NFS just makes the problem easier to see.
On devices with an eMMC based filesystem, I noticed that, sometimes, I
had to unplug/plug the ethernet cable to make it go.

So far, the problem is reported on all the Amlogic SoC generations we
support. I think a way forward is to ask the other users of stmmac
whether they have this problem or not - adding Allwinner and Rockchip
ML.

Since the commit you have identified is in the generic part of the
stmmac code, maybe Jose can help us understand what is going on.

>
> Thanks
>
> Erico
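
To compare kernels on the "how many attempts until the interface works" question, a small log parser could count bring-up attempts and DHCP timeouts per boot. This is only a rough sketch: the helper name `summarize_boot_log` is hypothetical and not part of any tooling mentioned in this thread. The patterns are taken from the log excerpt quoted above and assume each kernel message is on a single line, as in a raw dmesg.

```python
import re

# Hypothetical helper; patterns are based on the messages in the quoted log.
ATTEMPT_RE = re.compile(r"configuring for phy/\S+ link mode")
TIMEOUT_RE = re.compile(r"Sending DHCP requests.*timed out")


def summarize_boot_log(log_text):
    """Count interface bring-up attempts and DHCP timeouts in a kernel log."""
    attempts = 0
    timeouts = 0
    for line in log_text.splitlines():
        if ATTEMPT_RE.search(line):
            attempts += 1
        if TIMEOUT_RE.search(line):
            timeouts += 1
    return {"attempts": attempts, "dhcp_timeouts": timeouts}


# Three unwrapped lines from the log quoted in this thread.
sample = """\
[ 2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
[ 3.783162] Sending DHCP requests ...... timed out!
[ 93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
"""
print(summarize_boot_log(sample))
```

Run over the raw dmesg of each CI job, the counts would give a per-boot number that can be compared before and after the bisected commit.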