linux-kernel - imx6q random crashing using 4 cpus


Message-ID: <CAOf5uw=r_eZs6d93bqposDfgcBvax+ZUC865g-H2BwC5g3Hdxw@mail.gmail.com>
Date: Mon, 12 Aug 2024 10:09:54 +0200
From: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>
To: LKML <linux-kernel@...r.kernel.org>, NXP Linux Team <Linux-imx@....com>, 
	Fabio Estevam <festevam@...il.com>, Peng Fan <peng.fan@....com>, Shawn Guo <shawnguo@...nel.org>
Subject: imx6q random crashing using 4 cpus

Hi all

I'm getting random crashes, including segmentation faults in services,
when I boot a custom imx6q design with all four CPUs (nr_cpus=3 works).
I did not find anyone who has raised this problem in the past, but I
would like to know whether you have seen it in your experience. The
silicon revision of the imx6q is 1.6.
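
For anyone comparing the nr_cpus=3 and four-CPU boots, here is a minimal
sketch for confirming which CPUs are actually online on the target. It
assumes the standard sysfs file /sys/devices/system/cpu/online is
available; the helper names parse_cpu_list and online_cpus are
hypothetical and only used for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-2" or "0,2-3" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Read the online CPU list from sysfs (assumed path, run on the target)."""
    return parse_cpu_list(Path(path).read_text())
```

On the board, len(online_cpus()) would be expected to be 3 when booting
with nr_cpus=3 and 4 otherwise.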

I have tested the following kernels:

6.10.3
6.6

I have tried removing the idle states, increasing the core voltage,
etc. These CPUs are industrial grade and can run at up to 800 MHz.

All kernels look OK if I reduce the number of CPUs. Here is one of the
backtraces, for instance:

[  OK  ] Stopped target Preparation for Network.
[  134.671302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  134.677247] rcu:     2-...0: (1 GPs behind) idle=3c74/1/0x40000000 softirq=1197/1201 fqs=421
[  134.685445] rcu:     (detected by 0, t=2106 jiffies, g=1449, q=175 ncpus=4)
[  134.692158] Sending NMI from CPU 0 to CPUs 2:
[  144.696530] rcu: rcu_sched kthread starved for 995 jiffies! g1449 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
[  144.706543] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  144.715506] rcu: RCU grace-period kthread stack dump:
[  144.720563] task:rcu_sched       state:I stack:0     pid:14    tgid:14    ppid:2      flags:0x00000000
[  144.729890] Call trace:
[  144.729902]  __schedule from schedule+0x24/0x90
[  144.737008]  schedule from schedule_timeout+0x88/0x100
[  144.742175]  schedule_timeout from rcu_gp_fqs_loop+0xec/0x4c4
[  144.747955]  rcu_gp_fqs_loop from rcu_gp_kthread+0xc4/0x154
[  144.753556]  rcu_gp_kthread from kthread+0xdc/0xfc
[  144.758381]  kthread from ret_from_fork+0x14/0x20
[  144.763108] Exception stack(0xf0875fb0 to 0xf0875ff8)
[  144.768172] 5fa0:                                     00000000 00000000 00000000 00000000
[  144.776360] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  144.784546] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  144.791169] rcu: Stack dump where RCU GP kthread last ran:
[  144.796659] Sending NMI from CPU 0 to CPUs 1:
[  144.801027] NMI backtrace for cpu 1 skipped: idling at default_idle_call+0x28/0x3c
[  144.809643] sysrq: This sysrq operation is disabled.
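
To compare several such reports, the key fields of the stall message
can be pulled out with a few regular expressions. This is a minimal
sketch, not a general parser: the patterns are written against the
lines in this log only, the function name summarize_rcu_stall and the
dictionary keys are hypothetical, and wrapped log lines are assumed to
have been joined first.

```python
import re

# Sample lines copied from the log in this message (wrapped lines joined).
LOG = """\
[  134.671302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  134.677247] rcu:     2-...0: (1 GPs behind) idle=3c74/1/0x40000000 softirq=1197/1201 fqs=421
[  134.685445] rcu:     (detected by 0, t=2106 jiffies, g=1449, q=175 ncpus=4)
[  144.696530] rcu: rcu_sched kthread starved for 995 jiffies! g1449 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
"""

# Per-CPU stall line, e.g. "rcu:     2-...0: (1 GPs behind) ..."
STALLED_CPU = re.compile(r"rcu:\s+(\d+)-\S+: \(")
# Summary line: detecting CPU, jiffies since grace-period start, GP number
DETECTED_BY = re.compile(r"\(detected by (\d+), t=(\d+) jiffies, g=(\d+)")
# Starvation line: jiffies starved and the CPU the GP kthread last ran on
STARVED = re.compile(r"kthread starved for (\d+) jiffies!.*->cpu=(\d+)")


def summarize_rcu_stall(log):
    """Collect the stalled CPUs and grace-period kthread details from a log."""
    summary = {
        "stalled_cpus": [],
        "detected_by": None,
        "gp_age_jiffies": None,
        "gp_number": None,
        "starved_jiffies": None,
        "gp_kthread_cpu": None,
    }
    for line in log.splitlines():
        m = STALLED_CPU.search(line)
        if m:
            summary["stalled_cpus"].append(int(m.group(1)))
        m = DETECTED_BY.search(line)
        if m:
            summary["detected_by"] = int(m.group(1))
            summary["gp_age_jiffies"] = int(m.group(2))
            summary["gp_number"] = int(m.group(3))
        m = STARVED.search(line)
        if m:
            summary["starved_jiffies"] = int(m.group(1))
            summary["gp_kthread_cpu"] = int(m.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_rcu_stall(LOG))
```

For this log the summary reports CPU 2 as stalled, CPU 0 as the
detector, and CPU 1 as the CPU where the grace-period kthread last ran.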

I'm trying to figure out what the problem could be, but I don't have
a similar case for reference.

Michael

--
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael@...rulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info@...rulasolutions.com
www.amarulasolutions.com