linux-kernel - Re: imx6q random crashing using 4 cpus

Message-ID: <CAOf5uwkG5oj-GXbwbe1MK2x9UnURtQHN6UrgQWTTiUxcA7h9WA@mail.gmail.com>
Date: Wed, 14 Aug 2024 18:19:17 +0200
From: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>
To: Peng Fan <peng.fan@....com>
Cc: LKML <linux-kernel@...r.kernel.org>, dl-linux-imx <linux-imx@....com>, 
	Fabio Estevam <festevam@...il.com>, Shawn Guo <shawnguo@...nel.org>
Subject: Re: imx6q random crashing using 4 cpus

Hi Peng

I have a follow-up.

On Mon, Aug 12, 2024 at 10:45 AM Michael Nazzareno Trimarchi
<michael@...rulasolutions.com> wrote:
>
> Hi Peng
>
> On Mon, Aug 12, 2024 at 10:33 AM Peng Fan <peng.fan@....com> wrote:
> >
> > Hi,
> > > Subject: imx6q random crashing using 4 cpus
> > >
> > > Hi all
> > >
> > > I'm getting random crashes, including segmentation faults of services,
> > > if I boot a custom imx6q design with all the cpus (nr_cpus=3 works). I
> > > did not find anyone who raised this problem in the past, but I would
> > > like to know if you have seen this in your experience. The silicon
> > > revision is 1.6 for imx6q.
> > >
> > > I have tested
> > >
> > > 6.10.3
> >
> > Upstream kernel?
> >
>
> This is upstream kernel
>

I have increased the voltage of the internal LDO of the imx6q. It seems
that bypass mode cannot be activated in mainline and, moreover, that it
reduces the lifetime of the device according to some application note.
Anyway, I moved the voltage to higher values and now the cores seem more
stable. So these are the minor issues in mainline:

1) If we start with the performance governor, voltage changes are
not applied to the core if the boot frequency is the same
as the performance one. This means that a voltage set by the
bootloader cannot be fixed by the kernel.
2) The bypass mode of the regulator does not activate the anatop bypass mode.
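
To check both points on a running board, here is a minimal sketch in
Python that dumps the cpufreq and regulator state from sysfs. It assumes
the standard attributes (scaling_governor, scaling_cur_freq,
scaling_max_freq under cpufreq, and name, microvolts, bypass under
/sys/class/regulator). The helper names are only illustrative, and
whether the anatop regulators expose every attribute depends on the
kernel and device tree, so missing files are reported as None.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cpufreq_state(root="/sys", cpu=0):
    """Collect governor and frequency attributes for one CPU (hypothetical helper)."""
    base = Path(root) / "devices/system/cpu" / f"cpu{cpu}" / "cpufreq"
    names = ("scaling_governor", "scaling_cur_freq", "scaling_max_freq")
    return {name: read_attr(base / name) for name in names}


def regulator_states(root="/sys"):
    """Collect name, voltage and bypass state of every regulator (hypothetical helper)."""
    reg_dir = Path(root) / "class/regulator"
    if not reg_dir.is_dir():
        return []
    states = []
    for reg in sorted(reg_dir.glob("regulator.*")):
        states.append({
            "name": read_attr(reg / "name"),
            "microvolts": read_attr(reg / "microvolts"),
            "bypass": read_attr(reg / "bypass"),
        })
    return states


if __name__ == "__main__":
    cpu = cpufreq_state()
    print("cpufreq:", cpu)
    # Case 1) above: performance governor with the CPU already at max frequency
    if (cpu["scaling_governor"] == "performance"
            and cpu["scaling_cur_freq"] is not None
            and cpu["scaling_cur_freq"] == cpu["scaling_max_freq"]):
        print("performance governor at max frequency: compare core voltage with the OPP table")
    for reg in regulator_states():
        print("regulator:", reg)
```

Comparing the microvolts value of the core regulator right after boot
with the value expected for the current operating point would show
whether the bootloader setting is still in place.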

Michael



> > > 6.6
> >
>
> 6.6-fslc but I have tested on 6.6 lts too, same instability
>
> > This is upstream kernel or NXP released 6.6 kernel?
> >
> > Does an older kernel version work well?
> >
>
> What revision do you suggest? I can easily test them all.
>
> > >
> > > I have tested to remove idle state, increase the voltage core etc.
> >
> > cpuidle.off=1 does not help, right?
> >
>
> I have got rid of the cpuidle init in mach-imx6q and tested cpuidle.off=1 too.
>
> > I cannot recall the LDO details clearly; I remember there are
> > LDO-enabled and LDO-disabled modes. Have you checked the LDO?
>
> I can try to not use LDO from pmic and use the internal one
>
> >
> > > Those cpus are industrial
> > > grade and they can run up to 800MHz
> > >
> > > All kernels look ok if I reduce the number of cpus. Some of the
> > > backtrace for instance
> > >
> > > [  OK  ] Stopped target Preparation for Network.
> > > [  134.671302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > [  134.677247] rcu:     2-...0: (1 GPs behind) idle=3c74/1/0x40000000 softirq=1197/1201 fqs=421
> >
> > CPU 2 seems stuck.
>
> I have seen that, but I don't get stuck with 3 cpus. I have seen that the
> power supply is split into a 0-1 group and a 2-3 group. Is it possible
> that it's something connected to the power supply, or anything else that
> makes the core unstable?
>
> >
> > > [  134.685445] rcu:     (detected by 0, t=2106 jiffies, g=1449, q=175 ncpus=4)
> > > [  134.692158] Sending NMI from CPU 0 to CPUs 2:
> > > [  144.696530] rcu: rcu_sched kthread starved for 995 jiffies! g1449 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > [  144.706543] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> > > [  144.715506] rcu: RCU grace-period kthread stack dump:
> > > [  144.720563] task:rcu_sched       state:I stack:0     pid:14 tgid:14    ppid:2      flags:0x00000000
> > > [  144.729890] Call trace:
> > > [  144.729902]  __schedule from schedule+0x24/0x90
> > > [  144.737008]  schedule from schedule_timeout+0x88/0x100
> > > [  144.742175]  schedule_timeout from rcu_gp_fqs_loop+0xec/0x4c4
> > > [  144.747955]  rcu_gp_fqs_loop from rcu_gp_kthread+0xc4/0x154
> > > [  144.753556]  rcu_gp_kthread from kthread+0xdc/0xfc
> > > [  144.758381]  kthread from ret_from_fork+0x14/0x20
> > > [  144.763108] Exception stack(0xf0875fb0 to 0xf0875ff8)
> > > [  144.768172] 5fa0:                                     00000000 00000000 00000000 00000000
> > > [  144.776360] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > [  144.784546] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [  144.791169] rcu: Stack dump where RCU GP kthread last ran:
> > > [  144.796659] Sending NMI from CPU 0 to CPUs 1:
> > > [  144.801027] NMI backtrace for cpu 1 skipped: idling at default_idle_call+0x28/0x3c
> > > [  144.809643] sysrq: This sysrq operation is disabled.
> >
> > Have you ever tried use jtag to see cpu status?
> > cpu in idle loop?
> > cpu runs in invalid address and hang?
>
> Need to check
>
> Michael
>
> >
> > Regards,
> > Peng.
> >
> > >
> > > I'm trying to figure out what the problem could be, but I don't
> > > have a similar reference.
> > >
> > > Michael
> > >



-- 
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael@...rulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info@...rulasolutions.com
www.amarulasolutions.com