[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef696db8-20ca-4710-b642-8d22263d4ffe@riscstar.com>
Date: Sun, 8 Jun 2025 13:31:42 -0500
From: Alex Elder <elder@...cstar.com>
To: Haylen Chu <heylenay@....org>, Yixun Lan <dlan@...too.org>
Cc: mturquette@...libre.com, sboyd@...nel.org, inochiama@...look.com,
linux-clk@...r.kernel.org, spacemit@...ts.linux.dev,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
Guodong Xu <guodong@...cstar.com>
Subject: Re: [PATCH] clk: spacemit: mark K1 pll1_d8 as critical
On 6/7/25 11:46 PM, Haylen Chu wrote:
> On Sat, Jun 07, 2025 at 09:46:03PM -0500, Alex Elder wrote:
I respond below. I'm leaving all the context for now.
>> On 6/7/25 7:24 PM, Yixun Lan wrote:
>>> Hi Alex,
>>>
>>> On 15:27 Sat 07 Jun , Alex Elder wrote:
>>>> The pll1_d8 clock is enabled by the boot loader, and is ultimately a
>>>> parent for numerous clocks. Guodong Xu was recently testing DMA,
>>> ~~~~~~~~~this is still vague, numerous isn't equal to critical
>>
>> I will give you a full explanation of what I observed, below.
>>
>>>> adding a reset property, and discovered that the needed reset was
>>>> not yet ready during initial probe. It dropped its clock reference,
>>>> which dropped parent references, and along the way it dropped the sole
>>>> reference to pll1_d8 (from its prior clk_get()). Clock pll1_d8 got
>>>> disabled, which resulted in a non-functioning system.
>>>>
>>> So, I'm trying to understand the problem, and would like to evaluate if
>>> the "critical" flag is necessary..
>>>
>>> It occurs to me, the DMA driver should request and enable clock first,
>>> then request and issue a reset, it probably could be solved by proper
>>
>> No, that is not the issue. The reset is never deasserted.
>>
>>> order? so what's the real problem here? is DMA or reset? dropped the
>>> clock? or does driver fail to request a reset before clock is ready?
>>
>> The problem is with the pll1_d8 clock. That clock is enabled
>> successfully. However the reset isn't ready, so the clock
>> gets disabled, and its parent (and other ancestors) also get
>> disabled while recovering from that.
>>
>> I'll give you a high-level summary, then will lay out a ton of
>> detail.
>>
>> In the DMA driver probe function, several initial things happen
>> and then, a clock is requested (enabled):
>> <&syscon_apmu CLK_DMA>
>> That succeeds.
>>
>> Next, a reset is requested:
>> <&syscon_apmu RESET_DMA>
>> But that fails, because the reset driver probe function hasn't
>> been called yet. The request gets -EPROBE_DEFER as its result,
>> and the DMA driver starts unwinding everything so that it can
>> be probed again later. Dropping the clock reference results
>> in parent clocks dropping references. And because pll1_div8
>> initially had a reference count of 0 (despite being on),
>> dropping this single reference means it gets disabled. Then
>> we're stuck.
>>
>>
>> Here is how the DMA clock is supplied (at boot time):
>>
>> pll1 -> pll1_d8 -> pll1_d8_307p2 -> pmua_aclk -> dma_clk
>>
>> pll1 and pll1_d8 are enabled by the boot loader, but at this
>> time the drivers for various hardware that require them aren't
>> "getting" and enabling them (yet).
>>
>> devm_clk_get_optional_enabled() causes clk_prepare_enable()
>> to be called on the target clock (pll1_d8). That simply calls
>> clk_prepare() and clk_enable(). Let's focus on the latter.
>> clk_enable(dma_clk)
>> clk_core_enable_lock()
>>
>> So now the clock enable lock is held. The target clock's
>> enable_count for pll1_d8 is 0.
>>
>> clk_core_enable(dma_clk)
>> clk_core_enable(parent = pmua_aclk)
>> ...
>> enable_count++ (on dma_clk)
>>
>> The parent gets enabled (I'm fairly certain pmua_clk's
>> enable_count is also 0).
>>
>> clk_core_enable(pmua_aclk)
>> clk_core_enable(parent = pll1_d8_307p2)
>> ...
>> enable_count++ (on pmua_clk)
>>
>> And so on. When the clk_enable(dma_clk) completes, we have
>> these enable counts:
>> dma_clk: 1
>> pmua_clk: 1
>> pll1_d8_307p2: 1
>> pll1_d8: 1
>> pll1: 1? (I don't recall)
>>
>> The -EPROBE_DEFER causes the devm_clk_get_optional_enabled()
>> for dma_clk to get undone. That means clk_disable_unprepare()
>> gets called on dma_clk. Let's just focus on clk_disable().
>>
>> clk_disable(dma_clk)
>> clk_core_disable_lock(dma_clk)
>> (takes clk_enable lock)
>> clk_core_disable()
>> --enable_count becomes 0 (on dma_clk)
>> (disables dma_clk)
>> clk_core_disable(core->parent = pmua_aclk)
>>
>> clk_core_disable(pmua_aclk)
>> --enable_count becomes 0 (on pmua_clk)
>> (disables pmua_clk)
>> clk_core_dissable(core->parent = pll1_d8_307p2)
>>
>> clk_core_disable(pll1_d8_307p2)
>> --enable_count becomes 0 (on pll1_d8_307p2)
>> (disables pll1_d8_307p2)
>> clk_core_dissable(core->parent = pll1_d8)
>>
>> clk_core_disable(pll1_d8\)
>> --enable_count becomes 0 (on pll1)
>> (disables pll1_d8)
>> BOOM
>
> Yeah, I got the reason that pll1_d8 is disabled, but I don't still
> understand why it matters: pll1_d8 is a simple factor gate, though being
> parent of various peripheral clocks, it's okay to enable it again later
> if some peripherals decide to use it or one of its child, isn't it?
When it gets disabled, the system becomes non-functional. What that
means is that everything stops, no more output, and even when I
enabled KDB it did not drop into KDB. *Something* depends on it,
but there is no driver that properly enables the clock (yet).
This means--for now--it seems to be a critical clock.
> I could come up with several scenarios where disabling the clock really
> causes problems,
>
> 1. Some essential SoC components are actually supplied by pll1_d8 or one
> of its children, but isn't described in devicetree or the driver,
> thus disabling pll1_d8 leads to an SoC hang. We should mark the
This is what I believe is happening. So my fix marks the one clock
as critical in the driver. I would not want to mark it critical
in DT.
> precise clock that being essential as critcal, instead of setting
> pll1_d8 as critical to work around the problem.
I provided the chain of clocks leading to the dma_clk. Disabling
the dma_clk did not cause harm; disabling pmua_aclk did not cause
harm; disabling pll1_d8_307p2 did not cause harm. Only when pll1_d8
was disabled did the machine become unusable.
> 2. There're bugs in our clock driver, thus it fails to bring pll1_d8
> (or maybe also its children) back to a sound state. If so we should
> fix the driver.
No, I found that pll1_d8 had an enable count of 1 after dma_clk
was enabled. And then the set of disables that resulted from the
-EPROBE_DEFER discovered its enable count was zero, and therefore
disabled pll1_d8 (after dma_clk, pmua_aclk, and pll1_d8_307p2 were
disabled). I do not suspect the clock *driver* is at fault.
> Unless you could confirm pll1_d8 (not its child) really supplies some
> essential SoC components, I don't think simply marking pll1_d8 as
> critical is the correct solution.
I did confirm this. The child gets disabled before the parent in
clk_core_disable().
> And, I don't even know what "non-functioning system" behaves like. Could
> you please provide more information, like soC behaviours or dmesg when
> disabling pll1_d8 causes problems? Thanks.
Maybe you could, as an experiment, pretend <&syscon_apmu CLK_DMA>
is a clock for something and get it during probe, using
devm_clk_get_optional_enabled(). Then just return -EPROBE_DEFER
after that, and I think you'll reproduce the issue. In fact,
you might hit the issue more quickly if you used CLK_PLL1_D8.
-Alex
>> I hope this is clear.
>>
>> -Alex
>
> Regards,
> Haylen Chu
>
>>
>>>> Mark that clock critical so it doesn't get turned off in this case.
>>>> We might be able to turn this flag off someday, but for now it
>>>> resolves the problem Guodong encountered.
>>>>
>>>> Define a new macro CCU_FACTOR_GATE_DEFINE() to allow clock flags to
>>>> be supplied for a CCU_FACTOR_GATE clock.
>>>>
>>>> Fixes: 1b72c59db0add ("clk: spacemit: Add clock support for SpacemiT K1 SoC")
>>>> Signed-off-by: Alex Elder <elder@...cstar.com>
>>>> Tested-by: Guodong Xu <guodong@...cstar.com>
>>>> ---
>>>> drivers/clk/spacemit/ccu-k1.c | 3 ++-
>>>> drivers/clk/spacemit/ccu_mix.h | 21 +++++++++++++--------
>>>> 2 files changed, 15 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/clk/spacemit/ccu-k1.c b/drivers/clk/spacemit/ccu-k1.c
>>>> index cdde37a052353..df65009a07bb1 100644
>>>> --- a/drivers/clk/spacemit/ccu-k1.c
>>>> +++ b/drivers/clk/spacemit/ccu-k1.c
>>>> @@ -170,7 +170,8 @@ CCU_FACTOR_GATE_DEFINE(pll1_d4, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(3), 4,
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d5, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(4), 5, 1);
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d6, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(5), 6, 1);
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d7, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(6), 7, 1);
>>>> -CCU_FACTOR_GATE_DEFINE(pll1_d8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(7), 8, 1);
>>>> +CCU_FACTOR_GATE_FLAGS_DEFINE(pll1_d8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(7), 8, 1,
>>>> + CLK_IS_CRITICAL);
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d11_223p4, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(15), 11, 1);
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d13_189, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(16), 13, 1);
>>>> CCU_FACTOR_GATE_DEFINE(pll1_d23_106p8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(20), 23, 1);
>>>> diff --git a/drivers/clk/spacemit/ccu_mix.h b/drivers/clk/spacemit/ccu_mix.h
>>>> index 51d19f5d6aacb..668c8139339e1 100644
>>>> --- a/drivers/clk/spacemit/ccu_mix.h
>>>> +++ b/drivers/clk/spacemit/ccu_mix.h
>>>> @@ -101,16 +101,21 @@ static struct ccu_mix _name = { \
>>>> } \
>>>> }
>>>> +#define CCU_FACTOR_GATE_FLAGS_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div, \
>>>> + _mul, _flags) \
>>>> +struct ccu_mix _name = { \
>>>> + .gate = CCU_GATE_INIT(_mask_gate), \
>>>> + .factor = CCU_FACTOR_INIT(_div, _mul), \
>>>> + .common = { \
>>>> + .reg_ctrl = _reg_ctrl, \
>>>> + CCU_MIX_INITHW(_name, _parent, spacemit_ccu_factor_gate_ops, _flags) \
>>>> + } \
>>>> +}
>>>> +
>>>> #define CCU_FACTOR_GATE_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div, \
>>>> _mul) \
>>>> -static struct ccu_mix _name = { \
>>>> - .gate = CCU_GATE_INIT(_mask_gate), \
>>>> - .factor = CCU_FACTOR_INIT(_div, _mul), \
>>>> - .common = { \
>>>> - .reg_ctrl = _reg_ctrl, \
>>>> - CCU_MIX_INITHW(_name, _parent, spacemit_ccu_factor_gate_ops, 0) \
>>>> - } \
>>>> -}
>>>> + CCU_FACTOR_GATE_FLAGS_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div, \
>>>> + _mul, 0)
>>>> #define CCU_MUX_GATE_DEFINE(_name, _parents, _reg_ctrl, _shift, _width, \
>>>> _mask_gate, _flags) \
>>>> --
>>>> 2.45.2
>>>>
>>>
>>
Powered by blists - more mailing lists