lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef696db8-20ca-4710-b642-8d22263d4ffe@riscstar.com>
Date: Sun, 8 Jun 2025 13:31:42 -0500
From: Alex Elder <elder@...cstar.com>
To: Haylen Chu <heylenay@....org>, Yixun Lan <dlan@...too.org>
Cc: mturquette@...libre.com, sboyd@...nel.org, inochiama@...look.com,
 linux-clk@...r.kernel.org, spacemit@...ts.linux.dev,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Guodong Xu <guodong@...cstar.com>
Subject: Re: [PATCH] clk: spacemit: mark K1 pll1_d8 as critical

On 6/7/25 11:46 PM, Haylen Chu wrote:
> On Sat, Jun 07, 2025 at 09:46:03PM -0500, Alex Elder wrote:

I respond below.  I'm leaving all the context for now.

>> On 6/7/25 7:24 PM, Yixun Lan wrote:
>>> Hi Alex,
>>>
>>> On 15:27 Sat 07 Jun     , Alex Elder wrote:
>>>> The pll1_d8 clock is enabled by the boot loader, and is ultimately a
>>>> parent for numerous clocks.  Guodong Xu was recently testing DMA,
>>>               ~~~~~~~~~this is still vague, numerous isn't equal to critical
>>
>> I will give you a full explanation of what I observed, below.
>>
>>>> adding a reset property, and discovered that the needed reset was
>>>> not yet ready during initial probe.  It dropped its clock reference,
>>>> which dropped parent references, and along the way it dropped the sole
>>>> reference to pll1_d8 (from its prior clk_get()).  Clock pll1_d8 got
>>>> disabled, which resulted in a non-functioning system.
>>>>
>>> So, I'm trying to understand the problem, and would like to evaluate if
>>> the "critical" flag is necessary..
>>>
>>> It occurs to me, the DMA driver should request and enable clock first,
>>> then request and issue a reset, it probably could be solved by proper
>>
>> No, that is not the issue.  The reset is never deasserted.
>>
>>> order? so what's the real problem here? is DMA or reset? dropped the
>>> clock? or does driver fail to request a reset before clock is ready?
>>
>> The problem is with the pll1_d8 clock.  That clock is enabled
>> successfully.  However the reset isn't ready, so the clock
>> gets disabled, and its parent (and other ancestors) also get
>> disabled while recovering from that.
>>
>> I'll give you a high-level summary, then will lay out a ton of
>> detail.
>>
>> In the DMA driver probe function, several initial things happen
>> and then, a clock is requested (enabled):
>>    <&syscon_apmu CLK_DMA>
>> That succeeds.
>>
>> Next, a reset is requested:
>>    <&syscon_apmu RESET_DMA>
>> But that fails, because the reset driver probe function hasn't
>> been called yet.  The request gets -EPROBE_DEFER as its result,
>> and the DMA driver starts unwinding everything so that it can
>> be probed again later.  Dropping the clock reference results
>> in parent clocks dropping references.  And because pll1_div8
>> initially had a reference count of 0 (despite being on),
>> dropping this single reference means it gets disabled.  Then
>> we're stuck.
>>
>>
>> Here is how the DMA clock is supplied (at boot time):
>>
>> pll1 -> pll1_d8 -> pll1_d8_307p2 -> pmua_aclk -> dma_clk
>>
>> pll1 and pll1_d8 are enabled by the boot loader, but at this
>> time the drivers for various hardware that require them aren't
>> "getting" and enabling them (yet).
>>
>> devm_clk_get_optional_enabled() causes clk_prepare_enable()
>> to be called on the target clock (pll1_d8).  That simply calls
>> clk_prepare() and clk_enable().  Let's focus on the latter.
>>      clk_enable(dma_clk)
>>        clk_core_enable_lock()
>>
>> So now the clock enable lock is held.  The target clock's
>> enable_count for pll1_d8 is 0.
>>
>>    clk_core_enable(dma_clk)
>>      clk_core_enable(parent = pmua_aclk)
>>      ...
>>      enable_count++ (on dma_clk)
>>
>> The parent gets enabled (I'm fairly certain pmua_clk's
>> enable_count is also 0).
>>
>>    clk_core_enable(pmua_aclk)
>>      clk_core_enable(parent = pll1_d8_307p2)
>>      ...
>>      enable_count++ (on pmua_clk)
>>
>> And so on.  When the clk_enable(dma_clk) completes, we have
>> these enable counts:
>>    dma_clk:		1
>>    pmua_clk:		1
>>    pll1_d8_307p2:	1
>>    pll1_d8:		1
>>    pll1:			1? (I don't recall)
>>
>> The -EPROBE_DEFER causes the  devm_clk_get_optional_enabled()
>> for dma_clk to get undone.  That means clk_disable_unprepare()
>> gets called on dma_clk.  Let's just focus on clk_disable().
>>
>>    clk_disable(dma_clk)
>>      clk_core_disable_lock(dma_clk)
>>        (takes clk_enable lock)
>>        clk_core_disable()
>>          --enable_count becomes 0 (on dma_clk)
>>          (disables dma_clk)
>>          clk_core_disable(core->parent = pmua_aclk)
>>
>>    clk_core_disable(pmua_aclk)
>>      --enable_count becomes 0 (on pmua_clk)
>>      (disables pmua_clk)
>>      clk_core_dissable(core->parent = pll1_d8_307p2)
>>
>>    clk_core_disable(pll1_d8_307p2)
>>      --enable_count becomes 0 (on pll1_d8_307p2)
>>      (disables pll1_d8_307p2)
>>      clk_core_dissable(core->parent = pll1_d8)
>>
>>    clk_core_disable(pll1_d8\)
>>      --enable_count becomes 0 (on pll1)
>>      (disables pll1_d8)
>>      BOOM
> 
> Yeah, I got the reason that pll1_d8 is disabled, but I don't still
> understand why it matters: pll1_d8 is a simple factor gate, though being
> parent of various peripheral clocks, it's okay to enable it again later
> if some peripherals decide to use it or one of its child, isn't it?

When it gets disabled, the system becomes non-functional.  What that
means is that everything stops, no more output, and even when I
enabled KDB it did not drop into KDB.  *Something* depends on it,
but there is no driver that properly enables the clock (yet).
This means--for now--it seems to be a critical clock.

> I could come up with several scenarios where disabling the clock really
> causes problems,
> 
> 1. Some essential SoC components are actually supplied by pll1_d8 or one
>     of its children, but isn't described in devicetree or the driver,
>     thus disabling pll1_d8 leads to an SoC hang. We should mark the

This is what I believe is happening.  So my fix marks the one clock
as critical in the driver.  I would not want to mark it critical
in DT.

>     precise clock that being essential as critcal, instead of setting
>     pll1_d8 as critical to work around the problem.

I provided the chain of clocks leading to the dma_clk.  Disabling
the dma_clk did not cause harm; disabling pmua_aclk did not cause
harm; disabling pll1_d8_307p2 did not cause harm.  Only when pll1_d8
was disabled did the machine become unusable.

> 2. There're bugs in our clock driver, thus it fails to bring pll1_d8
>     (or maybe also its children) back to a sound state. If so we should
>     fix the driver.

No, I found that pll1_d8 had an enable count of 1 after dma_clk
was enabled.  And then the set of disables that resulted from the
-EPROBE_DEFER discovered its enable count was zero, and therefore
disabled pll1_d8 (after dma_clk, pmua_aclk, and pll1_d8_307p2 were
disabled).  I do not suspect the clock *driver* is at fault.

> Unless you could confirm pll1_d8 (not its child) really supplies some
> essential SoC components, I don't think simply marking pll1_d8 as
> critical is the correct solution.

I did confirm this.  The child gets disabled before the parent in
clk_core_disable().

> And, I don't even know what "non-functioning system" behaves like. Could
> you please provide more information, like soC behaviours or dmesg when
> disabling pll1_d8 causes problems? Thanks.

Maybe you could, as an experiment, pretend <&syscon_apmu CLK_DMA>
is a clock for something and get it during probe, using
devm_clk_get_optional_enabled().  Then just return -EPROBE_DEFER
after that, and I think you'll reproduce the issue.  In fact,
you might hit the issue more quickly if you used CLK_PLL1_D8.

					-Alex

>> I hope this is clear.
>>
>> 					-Alex
> 
> Regards,
> Haylen Chu
> 
>>
>>>> Mark that clock critical so it doesn't get turned off in this case.
>>>> We might be able to turn this flag off someday, but for now it
>>>> resolves the problem Guodong encountered.
>>>>
>>>> Define a new macro CCU_FACTOR_GATE_DEFINE() to allow clock flags to
>>>> be supplied for a CCU_FACTOR_GATE clock.
>>>>
>>>> Fixes: 1b72c59db0add ("clk: spacemit: Add clock support for SpacemiT K1 SoC")
>>>> Signed-off-by: Alex Elder <elder@...cstar.com>
>>>> Tested-by: Guodong Xu <guodong@...cstar.com>
>>>> ---
>>>>    drivers/clk/spacemit/ccu-k1.c  |  3 ++-
>>>>    drivers/clk/spacemit/ccu_mix.h | 21 +++++++++++++--------
>>>>    2 files changed, 15 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/clk/spacemit/ccu-k1.c b/drivers/clk/spacemit/ccu-k1.c
>>>> index cdde37a052353..df65009a07bb1 100644
>>>> --- a/drivers/clk/spacemit/ccu-k1.c
>>>> +++ b/drivers/clk/spacemit/ccu-k1.c
>>>> @@ -170,7 +170,8 @@ CCU_FACTOR_GATE_DEFINE(pll1_d4, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(3), 4,
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d5, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(4), 5, 1);
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d6, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(5), 6, 1);
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d7, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(6), 7, 1);
>>>> -CCU_FACTOR_GATE_DEFINE(pll1_d8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(7), 8, 1);
>>>> +CCU_FACTOR_GATE_FLAGS_DEFINE(pll1_d8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(7), 8, 1,
>>>> +		CLK_IS_CRITICAL);
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d11_223p4, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(15), 11, 1);
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d13_189, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(16), 13, 1);
>>>>    CCU_FACTOR_GATE_DEFINE(pll1_d23_106p8, CCU_PARENT_HW(pll1), APBS_PLL1_SWCR2, BIT(20), 23, 1);
>>>> diff --git a/drivers/clk/spacemit/ccu_mix.h b/drivers/clk/spacemit/ccu_mix.h
>>>> index 51d19f5d6aacb..668c8139339e1 100644
>>>> --- a/drivers/clk/spacemit/ccu_mix.h
>>>> +++ b/drivers/clk/spacemit/ccu_mix.h
>>>> @@ -101,16 +101,21 @@ static struct ccu_mix _name = {							\
>>>>    	}									\
>>>>    }
>>>> +#define CCU_FACTOR_GATE_FLAGS_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div,	\
>>>> +			       _mul, _flags)					\
>>>> +struct ccu_mix _name = {							\
>>>> +	.gate	= CCU_GATE_INIT(_mask_gate),					\
>>>> +	.factor	= CCU_FACTOR_INIT(_div, _mul),					\
>>>> +	.common = {								\
>>>> +		.reg_ctrl	= _reg_ctrl,					\
>>>> +		CCU_MIX_INITHW(_name, _parent, spacemit_ccu_factor_gate_ops, _flags)	\
>>>> +	}									\
>>>> +}
>>>> +
>>>>    #define CCU_FACTOR_GATE_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div,	\
>>>>    			       _mul)						\
>>>> -static struct ccu_mix _name = {							\
>>>> -	.gate	= CCU_GATE_INIT(_mask_gate),					\
>>>> -	.factor	= CCU_FACTOR_INIT(_div, _mul),					\
>>>> -	.common = {								\
>>>> -		.reg_ctrl	= _reg_ctrl,					\
>>>> -		CCU_MIX_INITHW(_name, _parent, spacemit_ccu_factor_gate_ops, 0)	\
>>>> -	}									\
>>>> -}
>>>> +	CCU_FACTOR_GATE_FLAGS_DEFINE(_name, _parent, _reg_ctrl, _mask_gate, _div,	\
>>>> +			       _mul, 0)
>>>>    #define CCU_MUX_GATE_DEFINE(_name, _parents, _reg_ctrl, _shift, _width,		\
>>>>    			    _mask_gate, _flags)					\
>>>> -- 
>>>> 2.45.2
>>>>
>>>
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ