Date: Mon, 19 Feb 2024 20:44:21 +0100
From: Petr Tesařík <petr@...arici.cz>
To: Christian Stewart <christian@...rture.us>
Cc: Marc Haber <mh+netdev@...schlus.de>, Florian Fainelli
 <f.fainelli@...il.com>, Andrew Lunn <andrew@...n.ch>,
 alexandre.torgue@...s.st.com, Jose Abreu <joabreu@...opsys.com>, Chen-Yu
 Tsai <wens@...e.org>, Jernej Skrabec <jernej.skrabec@...il.com>, Samuel
 Holland <samuel@...lland.org>, Jisheng Zhang <jszhang@...nel.org>,
 netdev@...r.kernel.org
Subject: Re: stmmac on Banana PI CPU stalls since Linux 6.6

On Mon, 19 Feb 2024 11:20:35 -0800
Christian Stewart <christian@...rture.us> wrote:

> Hi all,
> 
> On Mon, Feb 12, 2024 at 4:15 AM Marc Haber <mh+netdev@...schlus.de> wrote:
> >
> > On Tue, Feb 06, 2024 at 09:23:51AM +0100, Petr Tesařík wrote:  
> > > On Mon, 5 Feb 2024 13:50:35 -0800
> > > Florian Fainelli <f.fainelli@...il.com> wrote:
> > >  
> > > > On 2/5/24 12:12, Marc Haber wrote:  
> > > > > On Fri, Jan 26, 2024 at 12:10:28PM +0100, Petr Tesařík wrote:  
> > > > >> Then you may want to start by verifying that it is indeed the same
> > > > >> issue. Try the linked patch.  
> > > > >
> > > > > The linked patch seemed to help for 6.7.2, the test machine ran for five
> > > > > days without problems. After going to unpatched 6.7.2, the issue was
> > > > > back in six hours.  
> > > >
> > > > Do you mind responding to Petr's patch with a Tested-by? Thanks!  
> > >
> > > I believe Marc tested my first attempt at a solution (the one with
> > > spinlocks), not the latest incarnation. FWIW I have tested a similar
> > > scenario, with similar results.  
> >
> > Where is the latest patch? I can give it a try.
> >
> > Sorry for not responding any earlier, February 10 is an important tax
> > due date in Germany.
> >
> > Greetings
> > Marc  
> 
> We are seeing the same kernel panic on shutdown with 6.7.4 on a
> BananaPi M2 Ultra:
> 
> [**    ] (3 of 3) A stop job is running for Network Manager (33s / 52s)
> [  259.463772] rcu: INFO: rcu_sched self-detected stall on CPU
> [  259.469388] rcu:     0-....: (2099 ticks this GP) idle=0fdc/1/0x40000002 softirq=12003/12003 fqs=1034
> [  259.478360] rcu:     (t=2100 jiffies g=16277 q=36 ncpus=4)
> [  259.483595] CPU: 0 PID: 4462 Comm: ip Tainted: G         C         6.7.4 #1
> [  259.490562] Hardware name: Allwinner sun8i Family
> [  259.495268] PC is at stmmac_get_stats64+0x30/0x198
> [  259.500081] LR is at dev_get_stats+0x3c/0x160
> [  259.504445] pc : [<c06b9924>]    lr : [<c07bf7a8>]    psr: 200f0013
> [  259.510712] sp : f1e6d9b8  ip : c3ca478c  fp : c23e0000
> [  259.515941] r10: 00000000  r9 : c3ca4598  r8 : 00000000
> [  259.521168] r7 : 00000001  r6 : 00000000  r5 : c23e3000  r4 : 00000001
> [  259.527697] r3 : 00005c1b  r2 : c23e2e08  r1 : c3ca46c4  r0 : c23e0000
> [  259.534226] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [  259.541363] Control: 10c5387d  Table: 429cc06a  DAC: 00000051
> [  259.547117]  stmmac_get_stats64 from dev_get_stats+0x3c/0x160
> [  259.552882]  dev_get_stats from rtnl_fill_stats+0x30/0x118
> [  259.552899]  rtnl_fill_stats from rtnl_fill_ifinfo+0x720/0x135c
> [  259.564306]  rtnl_fill_ifinfo from rtnl_dump_ifinfo+0x330/0x6a8
> [  259.570240]  rtnl_dump_ifinfo from netlink_dump+0x16c/0x350
> [  259.575830]  netlink_dump from __netlink_dump_start+0x1bc/0x280
> [  259.581766]  __netlink_dump_start from rtnetlink_rcv_msg+0xf4/0x2f0
> [  259.588047]  rtnetlink_rcv_msg from netlink_rcv_skb+0xb8/0x118
> [  259.593893]  netlink_rcv_skb from netlink_unicast+0x1fc/0x2d8
> [  259.599655]  netlink_unicast from netlink_sendmsg+0x1c8/0x440
> [  259.605416]  netlink_sendmsg from sock_write_iter+0xa0/0x10c
> [  259.611094]  sock_write_iter from vfs_write+0x338/0x398
> [  259.616334]  vfs_write from ksys_write+0xbc/0xf0
> [  259.620961]  ksys_write from ret_fast_syscall+0x0/0x54
> [  259.626110] Exception stack(0xf1e6dfa8 to 0xf1e6dff0)
> [  259.631169] dfa0:                   00000003 be997dd8 00000003 be997dd8 00000014 00000001
> [  259.639351] dfc0: 00000003 be997dd8 00000014 00000004 00519548 be997e08 b6fd0ce0 0051783c
> 
> https://github.com/skiffos/SkiffOS/issues/307
> 
> I'm writing to ask if anyone has found a fix for this yet?

If you're running a 6.7 stable kernel, my patch has just been added to
the 6.7-stable tree.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.7/net-stmmac-protect-updates-of-64-bit-statistics-counters.patch

However, lockdep has reported an issue with it:

https://lore.kernel.org/lkml/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/

This new report has not yet been properly understood, but FWIW I've
been running stable with my patch for over a month now.

Petr T
