Message-ID: <CACdvmAjz9EzeapjATa0aOnRURYgS7NSZRE=uUAuc4X+2otG5EA@mail.gmail.com>
Date: Wed, 13 Jul 2022 05:24:52 -0400
From: Da Xue <da@...sconfused.com>
To: Jerome Brunet <jbrunet@...libre.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Jose Abreu <joabreu@...opsys.com>,
Erico Nunes <nunes.erico@...il.com>, netdev@...r.kernel.org,
linux-amlogic@...ts.infradead.org,
Kevin Hilman <khilman@...libre.com>,
Neil Armstrong <narmstrong@...libre.com>,
Vyacheslav <adeep@...ina.in>,
Heiner Kallweit <hkallweit1@...il.com>,
Qi Duan <qi.duan@...ogic.com>
Subject: Re: [RFC/RFT PATCH] net: stmmac: do not poke MAC_CTRL_REG twice on
link up
On Thu, Jul 7, 2022 at 6:14 AM Jerome Brunet <jbrunet@...libre.com> wrote:
>
> For some reason, poking MAC_CTRL_REG a second time, even with the same
> value, causes problem on a dwmac 3.70a.
>
> This problem happens on all the Amlogic SoCs, on link up, when the RMII
> 10/100 internal interface is used. The problem does not happen on boards
> using the external RGMII 10/100/1000 interface. Initially we suspected the
> PHY to be the problem but after a lot of testing, the problem seems to be
> coming from the MAC controller.
>
> > meson8b-dwmac c9410000.ethernet: IRQ eth_wake_irq not found
> > meson8b-dwmac c9410000.ethernet: IRQ eth_lpi not found
> > meson8b-dwmac c9410000.ethernet: PTP uses main clock
> > meson8b-dwmac c9410000.ethernet: User ID: 0x11, Synopsys ID: 0x37
> > meson8b-dwmac c9410000.ethernet: DWMAC1000
> > meson8b-dwmac c9410000.ethernet: DMA HW capability register supported
> > meson8b-dwmac c9410000.ethernet: RX Checksum Offload Engine supported
> > meson8b-dwmac c9410000.ethernet: COE Type 2
> > meson8b-dwmac c9410000.ethernet: TX Checksum insertion supported
> > meson8b-dwmac c9410000.ethernet: Wake-Up On Lan supported
> > meson8b-dwmac c9410000.ethernet: Normal descriptors
> > meson8b-dwmac c9410000.ethernet: Ring mode enabled
> > meson8b-dwmac c9410000.ethernet: Enable RX Mitigation via HW Watchdog Timer
>
> The problem is not systematic. Its occurrence is very random, from 1/50 to
> 1/2. It is fairly easy to detect by setting the kernel to boot over NFS and
> possibly setting it to reboot automatically when reaching the prompt.
>
> When the problem happens, the link is reported up by the PHY but no packets
> are actually going out. DHCP requests eventually time out and the kernel resets
> the interface. It may take several attempts but it will eventually work.
>
> > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > Sending DHCP requests ...... timed out!
> > meson8b-dwmac ff3f0000.ethernet eth0: Link is Down
> > IP-Config: Retrying forever (NFS root)...
> > meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.1:08] driver [Meson G12A Internal PHY] (irq=POLL)
> > meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> > meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> > meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> > meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rmii link mode
> > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > Sending DHCP requests ...... timed out!
> > meson8b-dwmac ff3f0000.ethernet eth0: Link is Down
> > IP-Config: Retrying forever (NFS root)...
> > [...] 5 retries ...
> > IP-Config: Retrying forever (NFS root)...
> > meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.1:08] driver [Meson G12A Internal PHY] (irq=POLL)
> > meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> > meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> > meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> > meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rmii link mode
> > meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > Sending DHCP requests ., OK
> > IP-Config: Got DHCP answer from 10.1.1.1, my address is 10.1.3.229
>
> Of course the same problem happens when not using NFS and it is fairly
> difficult for IoT products to detect this situation and recover.
>
> The call to stmmac_mac_set() should be no-op in our case, the bits it sets
> have already been set by an earlier call to stmmac_mac_set(). However
> removing this call solves the problem. We have no idea why or what is the
> actual problem.
>
> Even weirder, keeping the call to stmmac_mac_set() but inserting a
> udelay(1) between writel() and stmmac_mac_set() solves the problem too.
>
> Suggested-by: Qi Duan <qi.duan@...ogic.com>
> Signed-off-by: Jerome Brunet <jbrunet@...libre.com>
> ---
>
> Hi,
>
> There is no intention to get this patch merged as it is.
> It is sent with the hope to get a better understanding of the issue
> and more testing.
>
> The discussion on this issue initially started on this thread
> https://lore.kernel.org/all/CAK4VdL3-BEBzgVXTMejrAmDjOorvoGDBZ14UFrDrKxVEMD2Zjg@mail.gmail.com/
>
> The patches previously proposed in this thread have not solved the
> problem.
>
> The line removed in this patch should be a no-op when it comes to the
> value of MAC_CTRL_REG. So the change should not make a difference but
> it does. Testing results have been very good so far so there must be an
> unexpected consequence on the HW. I hope that someone with more
> knowledge on this controller will be able to shed some light on this.
>
> Cheers
> Jerome
>
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d1a7cf4567bc..3dca3cc61f39 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1072,7 +1072,6 @@ static void stmmac_mac_link_up(struct phylink_config *config,
>
>  	writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
>
> -	stmmac_mac_set(priv, priv->ioaddr, true);
>  	if (phy && priv->dma_cap.eee) {
>  		priv->eee_active = phy_init_eee(phy, 1) >= 0;
>  		priv->eee_enabled = stmmac_eee_init(priv);
> --
> 2.36.1
>
We had a problem with GXL (S805X/S905X) where the Ethernet interface
would sometimes not come up. Before the 5.10 LTS, it was just a matter
of bringing the interface down and up (ip link set) to fix the issue.
With 5.15, 5.18, and 5.19, we get "meson8b-dwmac
c9410000.ethernet eth0: Reset adapter." and no amount of bringing the
link down and up fixes it anymore.

When we get the "meson8b-dwmac c9410000.ethernet eth0: Reset
adapter." message, it also affects traffic on the network switch. I have
pings running from two different devices on a GS108PP PoE network switch,
and the ping times go through the roof. When I remove the GXL board,
everything goes back to normal.

We also get randomized corruption when Ethernet is brought up
(successfully or not) about half the time. If the board boots up without
a problem, it remains super stable. I have run benchmarks for CPU, 3D,
and Ethernet for days without that glitch ever appearing. Whether it
fails seems to be determined at startup.
View attachment "1-bad-page-state-good-eth0.txt" of type "text/plain" (30945 bytes)
Download attachment "Screenshot from 2022-07-13 01-10-58.png" of type "image/png" (350774 bytes)
View attachment "2-bad-eth0.txt" of type "text/plain" (41174 bytes)
View attachment "3-bad-eth0.txt" of type "text/plain" (50817 bytes)
View attachment "4-bad-page-state-bad-eth0.txt" of type "text/plain" (40339 bytes)