Date:   Fri, 15 Jul 2022 02:58:04 -0400
From:   Da Xue <da@...sconfused.com>
To:     Jerome Brunet <jbrunet@...libre.com>
Cc:     Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Jose Abreu <joabreu@...opsys.com>,
        Erico Nunes <nunes.erico@...il.com>, netdev@...r.kernel.org,
        linux-amlogic@...ts.infradead.org,
        Kevin Hilman <khilman@...libre.com>,
        Neil Armstrong <narmstrong@...libre.com>,
        Vyacheslav <adeep@...ina.in>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Qi Duan <qi.duan@...ogic.com>
Subject: Re: [RFC/RFT PATCH] net: stmmac: do not poke MAC_CTRL_REG twice on
 link up

On Wed, Jul 13, 2022 at 5:24 AM Da Xue <da@...sconfused.com> wrote:
>
> On Thu, Jul 7, 2022 at 6:14 AM Jerome Brunet <jbrunet@...libre.com> wrote:
> >
> > For some reason, poking MAC_CTRL_REG a second time, even with the same
> > value, causes a problem on a dwmac 3.70a.
> >
> > This problem happens on all the Amlogic SoCs, on link up, when the RMII
> > 10/100 internal interface is used. The problem does not happen on boards
> > using the external RGMII 10/100/1000 interface. Initially we suspected the
> > PHY to be the problem but after a lot of testing, the problem seems to be
> > coming from the MAC controller.
> >
> > > meson8b-dwmac c9410000.ethernet: IRQ eth_wake_irq not found
> > > meson8b-dwmac c9410000.ethernet: IRQ eth_lpi not found
> > > meson8b-dwmac c9410000.ethernet: PTP uses main clock
> > > meson8b-dwmac c9410000.ethernet: User ID: 0x11, Synopsys ID: 0x37
> > > meson8b-dwmac c9410000.ethernet:      DWMAC1000
> > > meson8b-dwmac c9410000.ethernet: DMA HW capability register supported
> > > meson8b-dwmac c9410000.ethernet: RX Checksum Offload Engine supported
> > > meson8b-dwmac c9410000.ethernet: COE Type 2
> > > meson8b-dwmac c9410000.ethernet: TX Checksum insertion supported
> > > meson8b-dwmac c9410000.ethernet: Wake-Up On Lan supported
> > > meson8b-dwmac c9410000.ethernet: Normal descriptors
> > > meson8b-dwmac c9410000.ethernet: Ring mode enabled
> > > meson8b-dwmac c9410000.ethernet: Enable RX Mitigation via HW Watchdog Timer
> >
> > The problem is not systematic. Its occurrence is very random, from 1/50 to
> > 1/2. It is fairly easy to detect by setting the kernel to boot over NFS and
> > possibly setting it to reboot automatically when reaching the prompt.
> >
> > When the problem happens, the link is reported up by the PHY but no packets
> > are actually going out. DHCP requests eventually time out and the kernel
> > resets the interface. It may take several attempts but it will eventually work.
> >
> > > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > > Sending DHCP requests ...... timed out!
> > > meson8b-dwmac ff3f0000.ethernet eth0: Link is Down
> > > IP-Config: Retrying forever (NFS root)...
> > > meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.1:08] driver [Meson G12A Internal PHY] (irq=POLL)
> > > meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> > > meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> > > meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> > > meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rmii link mode
> > > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > > Sending DHCP requests ...... timed out!
> > > meson8b-dwmac ff3f0000.ethernet eth0: Link is Down
> > > IP-Config: Retrying forever (NFS root)...
> > > [...] 5 retries ...
> > > IP-Config: Retrying forever (NFS root)...
> > > meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.1:08] driver [Meson G12A Internal PHY] (irq=POLL)
> > > meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> > > meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> > > meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> > > meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rmii link mode
> > > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > > Sending DHCP requests ., OK
> > > IP-Config: Got DHCP answer from 10.1.1.1, my address is 10.1.3.229
> >
> > Of course the same problem happens when not using NFS and it is fairly
> > difficult for IoT products to detect this situation and recover.
> >
> > The call to stmmac_mac_set() should be a no-op in our case, since the bits
> > it sets have already been set by an earlier call to stmmac_mac_set(). However
> > removing this call solves the problem. We have no idea why or what the
> > actual problem is.
> >
> > Even weirder, keeping the call to stmmac_mac_set() but inserting a
> > udelay(1) between writel() and stmmac_mac_set() solves the problem too.
> >
> > Suggested-by: Qi Duan <qi.duan@...ogic.com>
> > Signed-off-by: Jerome Brunet <jbrunet@...libre.com>
> > ---
> >
> >  Hi,
> >
> >  There is no intention to get this patch merged as it is.
> >  It is sent with the hope to get a better understanding of the issue
> >  and more testing.
> >
> >  The discussion on this issue initially started on this thread
> >  https://lore.kernel.org/all/CAK4VdL3-BEBzgVXTMejrAmDjOorvoGDBZ14UFrDrKxVEMD2Zjg@mail.gmail.com/
> >
> >  The patches previously proposed in this thread have not solved the
> >  problem.
> >
> >  The line removed in this patch should be a no-op when it comes to the
> >  value of MAC_CTRL_REG. So the change should not make a difference but
> >  it does. Testing results have been very good so far so there must be an
> >  unexpected consequence on the HW. I hope that someone with more
> >  knowledge of this controller will be able to shed some light on this.
> >
> >  Cheers
> >  Jerome
> >
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index d1a7cf4567bc..3dca3cc61f39 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -1072,7 +1072,6 @@ static void stmmac_mac_link_up(struct phylink_config *config,
> >
> >         writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
> >
> > -       stmmac_mac_set(priv, priv->ioaddr, true);
> >         if (phy && priv->dma_cap.eee) {
> >                 priv->eee_active = phy_init_eee(phy, 1) >= 0;
> >                 priv->eee_enabled = stmmac_eee_init(priv);
> > --
> > 2.36.1
> >
>
> We had a problem with GXL (S805X/S905X) where the ethernet interface
> would sometimes not come up. Before the 5.10 LTS, it was just a matter
> of bringing down and up (ip link set) the interface to fix the issue.
> With 5.15, 5.18, and 5.19, we would get "meson8b-dwmac
> c9410000.ethernet eth0: Reset adapter." No amount of link down ups can
> fix it anymore.

I realized that the device tree u-boot was passing to Linux was missing
the ethernet reset. Sorry about the noise on this.

>
> When we get the "meson8b-dwmac c9410000.ethernet eth0: Reset
> adapter.", it affects traffic on the network switch. I have a ping
> going from two different devices on a GS108PP PoE network switch and
> it would go through the roof. When I remove the GXL board, everything
> comes back to normal.

Given that the reset fixes the ethernet issues, the hardware could still
be causing this, but if so it no longer lasts long enough to notice.

>
> We would get randomized corruption when ethernet is brought up
> (successfully or not) about half the time. If it boots up without a
> problem, it remains super stable. I would run benchmarks for CPU, 3D,
> and ethernet for days without that glitch ever appearing. It seems to
> be determined at startup.

This is gone with the ethernet reset in the device tree and the change
Jerome provided that stops poking MAC_CTRL_REG twice.

Best,
Da
