[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240126085122.21e0a8a2@meshulam.tesarici.cz>
Date: Fri, 26 Jan 2024 08:51:22 +0100
From: Petr Tesařík <petr@...arici.cz>
To: Florian Fainelli <f.fainelli@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>, Marc Haber <mh+netdev@...schlus.de>,
alexandre.torgue@...s.st.com, Jose Abreu <joabreu@...opsys.com>, Chen-Yu
Tsai <wens@...e.org>, Jernej Skrabec <jernej.skrabec@...il.com>, Samuel
Holland <samuel@...lland.org>, Jisheng Zhang <jszhang@...nel.org>,
netdev@...r.kernel.org
Subject: Re: stmmac on Banana PI CPU stalls since Linux 6.6
On Thu, 25 Jan 2024 12:00:46 -0800
Florian Fainelli <f.fainelli@...il.com> wrote:
> On 1/25/24 11:54, Andrew Lunn wrote:
> >> I have checked out 2eb85b750512cc5dc5a93d5ff00e1f83b99651db (which is
> >> the first bad commit that the bisect eventually identified) and tried
> >> running:
> >>
> >> [56/4504]mh@fan:~/linux/git/linux ((2eb85b750512...)) $ make BUILDARCH="amd64" ARCH="arm" KBUILD_DEBARCH="armhf" CROSS_COMPILE="arm-linux-gnueabihf-" drivers/net/ethernet/stmicro/stmmac/stmmac_main.lst
> >> SYNC include/config/auto.conf.cmd
> >> SYSHDR arch/arm/include/generated/uapi/asm/unistd-oabi.h
> >> SYSHDR arch/arm/include/generated/uapi/asm/unistd-eabi.h
> >> HOSTCC scripts/kallsyms
> >> UPD include/config/kernel.release
> >> UPD include/generated/uapi/linux/version.h
> >> UPD include/generated/utsrelease.h
> >> SYSNR arch/arm/include/generated/asm/unistd-nr.h
> >> SYSTBL arch/arm/include/generated/calls-oabi.S
> >> SYSTBL arch/arm/include/generated/calls-eabi.S
> >> CC scripts/mod/empty.o
> >> MKELF scripts/mod/elfconfig.h
> >> HOSTCC scripts/mod/modpost.o
> >> CC scripts/mod/devicetable-offsets.s
> >> UPD scripts/mod/devicetable-offsets.h
> >> HOSTCC scripts/mod/file2alias.o
> >> HOSTCC scripts/mod/sumversion.o
> >> HOSTLD scripts/mod/modpost
> >> CC kernel/bounds.s
> >> CC arch/arm/kernel/asm-offsets.s
> >> UPD include/generated/asm-offsets.h
> >> CALL scripts/checksyscalls.sh
> >> CHKSHA1 include/linux/atomic/atomic-arch-fallback.h
> >> CHKSHA1 include/linux/atomic/atomic-instrumented.h
> >> MKLST drivers/net/ethernet/stmicro/stmmac/stmmac_main.lst
> >> ./scripts/makelst: 1: arithmetic expression: expecting EOF: "0x - 0x00000000"
> >> [57/4505]mh@fan:~/linux/git/linux ((2eb85b750512...)) $
> >>
> >> That is not what it was suppsoed to yield, right?
> >
> > No. But did it actually generate
> > drivers/net/ethernet/stmicro/stmmac/stmmac_main.lst Sometime errors
> > like this are not always fatal.
> >
> >> My bisect eventually completed and identified
> >> 2eb85b750512cc5dc5a93d5ff00e1f83b99651db as the first bad commit.
> >
> > I can make a guess.
> >
> > - memset(&priv->xstats, 0, sizeof(struct stmmac_extra_stats));
> >
> > Its removed, not moved later. Deep within this structure is the
> > stmmac_txq_stats and stmmac_rxq_stats which this function is supposed
> > to return, and the two syncp variables are in it as well.
> >
> > My guess is, they have an invalid state, when this memset is missing.
> >
> > Try putting the memset back.
> >
> > I also guess that is not the real fix, there are missing calls to
> > u64_stats_init().
>
> Did not Petr try to address the same problem essentially:
>
> https://lore.kernel.org/netdev/20240105091556.15516-1-petr@tesarici.cz/
>
> this was not deemed the proper solution and I don't think one has been
> posted since then, but it looks about your issue here Marc.
Yes, it looks like the same issue I ran into on my NanoPi. I'm sorry
I've been busy with other things lately, so I could not test and submit
my changes.
Essentially, the write side of the statistics seqlock is not protected
and will eventually miss an increment, causing the read side to spin
forever. The final plan is to split the statistics into three parts:
1. fields updated only under the tx queue lock,
2. fields updated only during NAPI poll,
3. fields updated only from interrupt context,
The first two groups can each have its own seqlock. The third group
(actually a single counter) can be converted to a per-CPU variable. The
read side will then aggregate the values as appropriate.
I hope I can find some time for this bug again during the coming weekend
(it's not for my day job). It's motivating to know that I'm not the
only affected person on the planet. ;-)
Petr T
Powered by blists - more mailing lists