[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211014084437.47a04183@sal.lan>
Date: Thu, 14 Oct 2021 08:44:37 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: Michael Turquette <mturquette@...libre.com>,
Stephen Boyd <sboyd@...nel.org>, linuxarm@...wei.com,
mauro.chehab@...wei.com, linux-clk@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true
on embedded devs
Em Mon, 11 Oct 2021 11:47:18 +0530
Manivannan Sadhasivam <mani@...nel.org> escreveu:
> Hi Mauro,
>
> On Thu, Oct 07, 2021 at 02:06:53PM +0200, Mauro Carvalho Chehab wrote:
> > Currently, the only way to boot a Kernel with drivers built as modules on embedded
> > devices like HiKey 970 is to pass clk_ignore_unused=true as a modprobe parameter.
> >
> > There are two separate issues:
> >
> > 1. the clk's core calls clk_disable_unused() too early. By the time this
> > function is called, only the builtin drivers were already probed/initialized.
> > Drivers built as modules will only be probed afterwards.
> >
> > This cause a race condition and boot instability, as the clk core will try
> > to disable clocks while the drivers built as modules are still being
> > probed and initialized.
>
> So you are mentioning a "race" condition here but it is not mentioned in the
> actual patch.
Patch 1 explains it...
> If the issue you are seeing is because the clocks used by the
> modules are disabled before they are probed, why can't they just enable the
> clocks during the probe time?
>
> Am I missing something?
What happens is that such clocks are enabled when the system boots,
and, when those are disabled, very bad things happen, as those
interrupt clocks used by several parts of the system.
Most of the problems happen because the ARM SoC produce SError NMI
interrupts when some such clocks are disabled, which calls panic().
Other clocks disable some key components of the system that aren't
directly related with a driver, but, instead, controls some core
part of the device, making the SoC to wait forever for an I/O event
that will never happen.
A small set of clocks make the system unreliable, causing drivers
to fail probing. Those can either lead to panic() or break support
for a peripheral, like WiFi, USB and/or PCI.
The core issue is that clk_disable_unused() happens too early.
This is called at late_initcall_sync() time, which is triggered
before the probe/init code of the drivers compiled as modules
to be called. So, what happens is:
BIOS enables clocks that are needed for the device to boot
|
+-> Linux start booting
|
+-> builtin drivers are probed
|
+--------------------------------\
| |
+-> late_initcall_sync() calls +-> Modules start probing
| clk_disable_unused) |
| +-> Some drivers are probed
| | before their needed clks
| | got disabled
| |
+-> Clocks are disabled |
| |
+-> SError -> panic() |
\ (several drivers weren't
probed/initialized)
The only fix for that is to postpone clk_disable_unused() to happen
after all driver probe/init are called, or to completely disable
it.
The current distributions recommended at:
https://www.96boards.org/product/hikey970/
pass clk_ignore_unused as a boot parameter, which disables the call
to clk_disable_unused().
The only sane way to get rid of that is to fix the core to let the
drivers to finish probe/init before disabling clocks.
See, the regulators logic that disables unused power lines also
do the same: it waits for 30 seconds after late_initcall_sync()
before calling Runtime PM suspend logic.
Regards,
Mauro
Powered by blists - more mailing lists