Message-ID: <2e778829-d2a9-3606-3769-e50ab23836dc@nbd.name>
Date: Thu, 30 Mar 2023 19:06:13 +0200
From: Felix Fietkau <nbd@....name>
To: Frank Wunderlich <frank-w@...lic-files.de>
Cc: netdev@...r.kernel.org, Daniel Golle <daniel@...rotopia.org>
Subject: Re: Aw: Re: Re: Re: Re: [PATCH net] net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
On 30.03.23 15:58, Frank Wunderlich wrote:
> something is still strange...i get a rcu stall again with this patch...reverted it and my r2 boots again.
>
> [ 29.772755] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 29.778689] rcu: 2-...0: (1 GPs behind) idle=547c/1/0x40000000 softirq=251/258 fqs=427
> [ 29.786697] rcu: (detected by 1, t=2104 jiffies, g=-875, q=29 ncpus=4)
> [ 29.793308] Sending NMI from CPU 1 to CPUs 2:
> [ 34.492968] vusb: disabling
> [ 34.495828] vmc: disabling
> [ 34.498587] vmch: disabling
> [ 34.501433] vgp1: disabling
> [ 34.504426] vcamaf: disabling
> [ 39.797579] rcu: rcu_sched kthread timer wakeup didn't happen for 994 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [ 39.808619] rcu: Possible timer handling issue on cpu=1 timer-softirq=493
> [ 39.815487] rcu: rcu_sched kthread starved for 995 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
> [ 39.825571] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 39.834520] rcu: RCU grace-period kthread stack dump:
> [ 39.839564] task:rcu_sched state:I stack:0 pid:14 ppid:2 flags:0x00000000
> [ 39.847928] __schedule from schedule+0x54/0xe8
> [ 39.852472] schedule from schedule_timeout+0x94/0x158
> [ 39.857619] schedule_timeout from rcu_gp_fqs_loop+0x12c/0x50c
> [ 39.863467] rcu_gp_fqs_loop from rcu_gp_kthread+0x194/0x21c
> [ 39.869135] rcu_gp_kthread from kthread+0xc8/0xcc
> [ 39.873931] kthread from ret_from_fork+0x14/0x2c
> [ 39.878639] Exception stack(0xf0859fb0 to 0xf0859ff8)
> [ 39.883690] 9fa0: 00000000 00000000 00000000 00000000
> [ 39.891864] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 39.900037] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 39.906645] rcu: Stack dump where RCU GP kthread last ran:
> [ 39.912125] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.3.0-rc1-bpi-r2-rc-net #2
> [ 39.919518] Hardware name: Mediatek Cortex-A7 (Device Tree)
> [ 39.925082] PC is at default_idle_call+0x1c/0xb0
> [ 39.929698] LR is at ct_kernel_enter.constprop.0+0x48/0x11c
> [ 39.935267] pc : [<c0d105ec>] lr : [<c0d0ffa4>] psr: 600e0013
> [ 39.941527] sp : f0861fb0 ip : c15721e0 fp : 00000000
> [ 39.946746] r10: 00000000 r9 : 410fc073 r8 : 8000406a
> [ 39.951964] r7 : c1404f74 r6 : c19e0900 r5 : c15727e0 r4 : c19e0900
> [ 39.958486] r3 : 00000000 r2 : 2da0a000 r1 : 00000001 r0 : 00008cfc
> [ 39.965007] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 39.972138] Control: 10c5387d Table: 84f4806a DAC: 00000051
> [ 39.977878] default_idle_call from cpuidle_idle_call+0x24/0x68
> [ 39.983805] cpuidle_idle_call from do_idle+0x9c/0xd0
> [ 39.988863] do_idle from cpu_startup_entry+0x20/0x24
> [ 39.993921] cpu_startup_entry from secondary_start_kernel+0x118/0x138
> [ 40.000457] secondary_start_kernel from 0x801017a0
>
> maybe i need additional patch or did anything else wrong?
>
> still working on 6.3-rc1
> https://github.com/frank-w/BPI-Router-Linux/commits/6.3-rc-net
Can you try applying this patch to a stable kernel instead? These hangs
don't make any sense to me, especially the one triggered by an earlier
patch that should definitely have been a no-op because of the wrong
config symbol.
It really looks to me like you have an issue in that kernel triggered by
spurious code changes.
- Felix