linux-kernel - cpufreq//voltage at boot (WAS Re: imx6q random crashing using 4 cpus)

Message-ID: <CAOf5uwn5pTaZW7DY0DmcKrN1rJqhhDHOjdWyp8OHBET=ompUZA@mail.gmail.com>
Date: Thu, 15 Aug 2024 23:56:11 +0200
From: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>
To: Peng Fan <peng.fan@....com>
Cc: LKML <linux-kernel@...r.kernel.org>, dl-linux-imx <linux-imx@....com>, 
	Fabio Estevam <festevam@...il.com>, Shawn Guo <shawnguo@...nel.org>
Subject: cpufreq//voltage at boot (WAS Re: imx6q random crashing using 4 cpus)

Hi Peng

I have more information now, apart from my board problems. See my comments inline.

On Wed, Aug 14, 2024 at 6:19 PM Michael Nazzareno Trimarchi
<michael@...rulasolutions.com> wrote:
>
> Hi Peng
>
> I have a follow up
>
> On Mon, Aug 12, 2024 at 10:45 AM Michael Nazzareno Trimarchi
> <michael@...rulasolutions.com> wrote:
> >
> > Hi Peng
> >
> > On Mon, Aug 12, 2024 at 10:33 AM Peng Fan <peng.fan@....com> wrote:
> > >
> > > Hi,
> > > > Subject: imx6q random crashing using 4 cpus
> > > >
> > > > Hi all
> > > >
> > > > I'm getting random crashes, including segmentation faults of services, if I
> > > > boot a custom imx6q design with all the cpus (nr_cpus=3 works). I did
> > > > not find anyone who raised this problem in the past, but I would
> > > > like to know if you have seen this in your experience. The silicon revision is
> > > > 1.6 for imx6q
> > > >
> > > > I have tested
> > > >
> > > > 6.10.3
> > >
> > > Upstream kernel?
> > >
> >
> > This is upstream kernel
> >
>
> I have increased the voltage of the internal LDO of the imx6q. It seems that
> bypass mode cannot be activated in mainline and, moreover, it seems to reduce
> the lifetime of the device according to some application note.
> Anyway, I moved the voltage to higher values and now the core seems more
> stable. So those are the minor issues on mainline:
>

If we start with U-Boot and a boot target frequency is set, U-Boot
sets only the core and SoC voltage values (anatop), which means that
the PMIC stays at its boot voltage. As far as I understand, if we boot
with the performance governor the cpufreq driver does not change the
regulator voltage, so it does not recalculate the PMIC value according
to the frequency. If we have an industrial CPU where the max frequency
is 800 MHz, this means that the PMIC (pfuze100) starts at 1375 mV and
not at reg_arm + 125 mV as it should, and the kernel is not able to fix
it up unless you change the governor and then go back to performance.
I don't know whether the problem sits in the bootloader, in the kernel,
or in both, because the kernel must not depend on the bootloader
anyway. U-Boot implements the anatop regulator, but in the end the PMIC
voltage is not calculated according to the booting frequency.
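
To make the arithmetic above concrete, here is a minimal sketch of the
check I have in mind. The 125 mV headroom and the 1375 mV boot value
come from this thread; the function names are hypothetical, and the
1150 mV reg_arm value in the usage comment is only an illustrative
input, not a number taken from the device tree or the datasheet.

```python
# Headroom between the anatop ARM LDO output (reg_arm) and the PMIC rail,
# as described in this thread: the PMIC should sit at reg_arm + 125 mV.
PMIC_HEADROOM_MV = 125


def expected_pmic_mv(reg_arm_mv, headroom_mv=PMIC_HEADROOM_MV):
    """Return the PMIC voltage (mV) that matches a given reg_arm voltage."""
    return reg_arm_mv + headroom_mv


def pmic_excess_mv(pmic_boot_mv, reg_arm_mv):
    """Return how far (mV) the boot PMIC voltage is above the expected one.

    A positive result means the PMIC was left higher than reg_arm needs.
    """
    return pmic_boot_mv - expected_pmic_mv(reg_arm_mv)


# Usage with an ASSUMED reg_arm of 1150 mV (illustrative only):
#   expected_pmic_mv(1150)      -> 1275
#   pmic_excess_mv(1375, 1150)  -> 100
```

With the real reg_arm value for the 800 MHz operating point plugged in,
the second function gives the amount by which the PMIC stays above its
intended setting after boot.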

Michael

> 1) if we start with the performance governor, voltage changes are
> not applied to the core if the booting frequency is the same
> as the performance one. This means that if the bootloader sets a
> voltage, it cannot be fixed by the kernel
> 2) bypass-mode of the regulator does not activate the anatop bypass mode
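
For point 1, the only workaround described in this thread is to change
the governor and then go back to performance. A minimal sketch of that
step follows. It assumes the standard cpufreq sysfs file
scaling_governor inside a policy directory such as
/sys/devices/system/cpu/cpu0/cpufreq, and it assumes the "powersave"
governor is built into the kernel; the function name is hypothetical.
Writing to sysfs needs root.

```python
from pathlib import Path


def retrigger_performance(policy_dir, detour="powersave"):
    """Switch one cpufreq policy to `detour` and back to "performance".

    policy_dir: directory holding the scaling_governor file, for example
                /sys/devices/system/cpu/cpu0/cpufreq (assumed location).
    detour:     any other available governor (assumption: "powersave").
    Returns the list of governors written, in order.
    """
    governor_file = Path(policy_dir) / "scaling_governor"
    written = []
    for name in (detour, "performance"):
        # Each write asks the cpufreq core to switch governor.
        governor_file.write_text(name + "\n")
        written.append(name)
    return written


# Usage on the target, as root:
#   retrigger_performance("/sys/devices/system/cpu/cpu0/cpufreq")
```

This only reproduces the manual step; it does not fix the underlying
issue that the kernel leaves the bootloader voltage untouched.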
>
> Michael
>
>
>
> > > > 6.6
> > >
> >
> > 6.6-fslc but I have tested on 6.6 lts too, same instability
> >
> > > This is upstream kernel or NXP released 6.6 kernel?
> > >
> > > Does older version kernel works well?
> > >
> >
> > What revision do you suggest? I can easily test them all
> >
> > > >
> > > > I have tried removing the idle states, increasing the core voltage, etc.
> > >
> > > cpuidle.off=1 does not help, right?
> > >
> >
> > I have got rid of cpuidle init in mach-imx6q and tested cpuidle.off=1 too.
> >
> > > I cannot recall the LDO details clearly; I remember there is an LDO-enabled
> > > and an LDO-disabled mode. Have you checked the LDO?
> >
> > I can try to not use LDO from pmic and use the internal one
> >
> > >
> > > > Those cpus are industrial
> > > > grade and they can run up to 800 MHz
> > > >
> > > > All kernels look ok if I reduce the number of cpus. Some of the
> > > > backtrace for instance
> > > >
> > > > [  OK  ] Stopped target Preparation for Network.
> > > > [  134.671302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > [  134.677247] rcu:     2-...0: (1 GPs behind) idle=3c74/1/0x40000000 softirq=1197/1201 fqs=421
> > >
> > > CPU 2 seems stuck.
> >
> > I have seen that, but I don't get stalls with 3 cpus. I have seen that the
> > power supply is split into a 0-1 group and a 2-3 group. Is it possible that
> > it's something connected to the power supply
> > or anything else that makes the core unstable?
> >
> > >
> > > > [  134.685445] rcu:     (detected by 0, t=2106 jiffies, g=1449, q=175 ncpus=4)
> > > > [  134.692158] Sending NMI from CPU 0 to CPUs 2:
> > > > [  144.696530] rcu: rcu_sched kthread starved for 995 jiffies! g1449 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > [  144.706543] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> > > > [  144.715506] rcu: RCU grace-period kthread stack dump:
> > > > [  144.720563] task:rcu_sched       state:I stack:0     pid:14 tgid:14    ppid:2      flags:0x00000000
> > > > [  144.729890] Call trace:
> > > > [  144.729902]  __schedule from schedule+0x24/0x90
> > > > [  144.737008]  schedule from schedule_timeout+0x88/0x100
> > > > [  144.742175]  schedule_timeout from rcu_gp_fqs_loop+0xec/0x4c4
> > > > [  144.747955]  rcu_gp_fqs_loop from rcu_gp_kthread+0xc4/0x154
> > > > [  144.753556]  rcu_gp_kthread from kthread+0xdc/0xfc
> > > > [  144.758381]  kthread from ret_from_fork+0x14/0x20
> > > > [  144.763108] Exception stack(0xf0875fb0 to 0xf0875ff8)
> > > > [  144.768172] 5fa0:                                     00000000 00000000 00000000 00000000
> > > > [  144.776360] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > > [  144.784546] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > > [  144.791169] rcu: Stack dump where RCU GP kthread last ran:
> > > > [  144.796659] Sending NMI from CPU 0 to CPUs 1:
> > > > [  144.801027] NMI backtrace for cpu 1 skipped: idling at default_idle_call+0x28/0x3c
> > > > [  144.809643] sysrq: This sysrq operation is disabled.
> > >
> > > Have you ever tried use jtag to see cpu status?
> > > cpu in idle loop?
> > > cpu runs in invalid address and hang?
> >
> > Need to check
> >
> > Michael
> >
> > >
> > > Regards,
> > > Peng.
> > >
> > > >
> > > > I'm trying to figure out what the problem could be, but I don't
> > > > have a similar reference
> > > >
> > > > Michael
> > > >
> > > > --
> > > > Michael Nazzareno Trimarchi
> > > > Co-Founder & Chief Executive Officer
> > > > M. +39 347 913 2170
> > > > michael@...rulasolutions.com
> > > > __________________________________
> > > >
> > > > Amarula Solutions BV
> > > > Joop Geesinkweg 125, 1114 AB, Amsterdam, NL T. +31 (0)85 111 9172
> > > > info@...rulasolutions.com
> > > > http://www.amarulasolutions.com/



-- 
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael@...rulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info@...rulasolutions.com
www.amarulasolutions.com