Message-ID:
<PAXPR04MB84598DA8723E1F167435F81D88852@PAXPR04MB8459.eurprd04.prod.outlook.com>
Date: Mon, 12 Aug 2024 08:33:14 +0000
From: Peng Fan <peng.fan@....com>
To: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>, LKML
<linux-kernel@...r.kernel.org>, dl-linux-imx <linux-imx@....com>, Fabio
Estevam <festevam@...il.com>, Shawn Guo <shawnguo@...nel.org>
Subject: RE: imx6q random crashing using 4 cpus
Hi,
> Subject: imx6q random crashing using 4 cpus
>
> Hi all
>
> I'm getting random crashes, including segmentation faults in services, if I
> boot a custom imx6q design with all the cpus (nr_cpus=3 works). I did
> not find anyone who has raised this problem in the past, but I would
> like to know if you have seen it in your experience. The silicon
> revision is 1.6 for the imx6q.
>
> I have tested
>
> 6.10.3
Upstream kernel?
> 6.6
Is this the upstream kernel or the NXP-released 6.6 kernel?
Does an older kernel version work well?
>
> I have tested removing the idle states, increasing the core voltage, etc.
cpuidle.off=1 does not help, right?
I cannot recall the LDO details clearly; I remember there are
LDO-enabled and LDO-bypassed configurations. Have you checked the LDO setup?
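If it helps, both can be checked quickly from a running system (a
minimal sketch; the PMU_REG_CORE offset 0x020C8140 is taken from the
i.MX6Q reference manual, and busybox devmem is assumed to be
available, so please verify both against your setup):

  # List the cpuidle states in use; with cpuidle.off=1 this
  # directory should be empty or absent.
  cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name

  # Read PMU_REG_CORE (anatop base 0x020C8000 + offset 0x140).
  # REG0_TARG (bits 4:0) = 0x1f means the ARM LDO is bypassed
  # (power FET fully on); other values mean the LDO is regulating.
  devmem 0x020C8140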
> Those CPUs are industrial
> grade and they can run up to 800 MHz.
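You could also double-check the frequency/voltage pairs cpufreq is
actually using (a minimal sketch using the standard cpufreq and
regulator sysfs/debugfs interfaces; exact paths may differ on your board):

  # Frequencies the governor may select, in kHz. On an 800 MHz
  # industrial part, 996000 should not be listed here.
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies

  # Core voltages as seen by the regulator framework
  # (requires debugfs to be mounted).
  cat /sys/kernel/debug/regulator/regulator_summary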
>
> All kernels look OK if I reduce the number of cpus. One of the
> backtraces, for instance:
>
> [ OK ] Stopped target Preparation for Network.
> [ 134.671302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 134.677247] rcu: 2-...0: (1 GPs behind) idle=3c74/1/0x40000000 softirq=1197/1201 fqs=421
CPU 2 seems stuck.
> [ 134.685445] rcu: (detected by 0, t=2106 jiffies, g=1449, q=175 ncpus=4)
> [ 134.692158] Sending NMI from CPU 0 to CPUs 2:
> [ 144.696530] rcu: rcu_sched kthread starved for 995 jiffies! g1449 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> [ 144.706543] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 144.715506] rcu: RCU grace-period kthread stack dump:
> [ 144.720563] task:rcu_sched state:I stack:0 pid:14 tgid:14 ppid:2 flags:0x00000000
> [ 144.729890] Call trace:
> [ 144.729902] __schedule from schedule+0x24/0x90
> [ 144.737008] schedule from schedule_timeout+0x88/0x100
> [ 144.742175] schedule_timeout from rcu_gp_fqs_loop+0xec/0x4c4
> [ 144.747955] rcu_gp_fqs_loop from rcu_gp_kthread+0xc4/0x154
> [ 144.753556] rcu_gp_kthread from kthread+0xdc/0xfc
> [ 144.758381] kthread from ret_from_fork+0x14/0x20
> [ 144.763108] Exception stack(0xf0875fb0 to 0xf0875ff8)
> [ 144.768172] 5fa0: 00000000 00000000 00000000 00000000
> [ 144.776360] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 144.784546] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 144.791169] rcu: Stack dump where RCU GP kthread last ran:
> [ 144.796659] Sending NMI from CPU 0 to CPUs 1:
> [ 144.801027] NMI backtrace for cpu 1 skipped: idling at default_idle_call+0x28/0x3c
> [ 144.809643] sysrq: This sysrq operation is disabled.
Have you ever tried using JTAG to inspect the CPU state?
Is the CPU stuck in the idle loop?
Or is it running at an invalid address and hanging?
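Also, the log shows sysrq is disabled. Enabling it would let you dump
backtraces of all CPUs the moment a stall hits (standard sysrq
interface):

  # Enable all sysrq functions (or boot with sysrq_always_enabled).
  echo 1 > /proc/sys/kernel/sysrq

  # Dump a backtrace of every online CPU into the kernel log.
  echo l > /proc/sysrq-trigger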
Regards,
Peng.
>
> I'm trying to figure out what the problem could be, but I don't have a
> similar reference to compare against.
>
> Michael
>
> --
> Michael Nazzareno Trimarchi
> Co-Founder & Chief Executive Officer
> M. +39 347 913 2170
> michael@...rulasolutions.com
> __________________________________
>
> Amarula Solutions BV
> Joop Geesinkweg 125, 1114 AB, Amsterdam, NL T. +31 (0)85 111 9172
> info@...rulasolutions.com
> http://www.amarulasolutions.com/