linux-kernel - Re: [PATCH v1 4/4] clk: thead: Add GPU clock gate control with CLKGEN reset support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c035603-4c11-4e71-8ef3-b857a81bf5ef@samsung.com>
Date: Thu, 6 Mar 2025 17:43:57 +0100
From: Michal Wilczynski <m.wilczynski@...sung.com>
To: Stephen Boyd <sboyd@...nel.org>, alex@...ti.fr, aou@...s.berkeley.edu,
	conor+dt@...nel.org, drew@...7.com, guoren@...nel.org, jszhang@...nel.org,
	krzk+dt@...nel.org, m.szyprowski@...sung.com, mturquette@...libre.com,
	p.zabel@...gutronix.de, palmer@...belt.com, paul.walmsley@...ive.com,
	robh@...nel.org, wefu@...hat.com
Cc: linux-clk@...r.kernel.org, devicetree@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org
Subject: Re: [PATCH v1 4/4] clk: thead: Add GPU clock gate control with
 CLKGEN reset support



On 3/6/25 00:47, Stephen Boyd wrote:
> Quoting Michal Wilczynski (2025-03-03 06:36:29)
>> The T-HEAD TH1520 has three GPU clocks: core, cfg, and mem. The mem
>> clock gate is marked as "Reserved" in hardware, while core and cfg are
>> configurable. In order for these clock gates to work properly, the
>> CLKGEN reset must be managed in a specific sequence.
>>
>> Move the CLKGEN reset handling to the clock driver since it's
>> fundamentally a clock-related workaround [1]. This ensures that clk_enabled
>> GPU clocks stay physically enabled without external interference from
>> the reset driver.  The reset is now deasserted only when both core and
>> cfg clocks are enabled, and asserted when either of them is disabled.
>>
>> The mem clock is configured to use nop operations since it cannot be
>> controlled.
>>
>> Link: https://lore.kernel.org/all/945fb7e913a9c3dcb40697328b7e9842b75fea5c.camel@pengutronix.de [1]
>>
>> Signed-off-by: Michal Wilczynski <m.wilczynski@...sung.com>
> [...]
>> diff --git a/drivers/clk/thead/clk-th1520-ap.c b/drivers/clk/thead/clk-th1520-ap.c
>> index ea96d007aecd..1dfcde867233 100644
>> --- a/drivers/clk/thead/clk-th1520-ap.c
>> +++ b/drivers/clk/thead/clk-th1520-ap.c
>> @@ -862,17 +863,70 @@ static CCU_GATE(CLK_SRAM1, sram1_clk, "sram1", axi_aclk_pd, 0x20c, BIT(3), 0);
> [...]
>>  
>>  static CCU_GATE_CLK_OPS(CLK_GPU_MEM, gpu_mem_clk, "gpu-mem-clk",
>>                         video_pll_clk_pd, 0x0, BIT(2), 0, clk_nop_ops);
>> +static CCU_GATE_CLK_OPS(CLK_GPU_CORE, gpu_core_clk, "gpu-core-clk",
>> +                       video_pll_clk_pd, 0x0, BIT(3), 0, ccu_gate_gpu_ops);
>> +static CCU_GATE_CLK_OPS(CLK_GPU_CFG_ACLK, gpu_cfg_aclk, "gpu-cfg-aclk",
>> +                       video_pll_clk_pd, 0x0, BIT(4), 0, ccu_gate_gpu_ops);
>> +
>> +static void ccu_gpu_clk_disable(struct clk_hw *hw)
>> +{
>> +       struct ccu_gate *cg = hw_to_ccu_gate(hw);
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&gpu_reset_lock, flags);
>> +
>> +       ccu_disable_helper(&cg->common, cg->enable);
>> +
>> +       if ((cg == &gpu_core_clk &&
>> +            !clk_hw_is_enabled(&gpu_cfg_aclk.common.hw)) ||
>> +           (cg == &gpu_cfg_aclk &&
>> +            !clk_hw_is_enabled(&gpu_core_clk.common.hw)))
>> +               reset_control_assert(gpu_reset);
> 
> Why can't the clk consumer control the reset itself? Doing this here is
> not ideal because we hold the clk lock when we try to grab the reset
> lock. These are all spinlocks that should be small in lines of code
> where the lock is held, but we're calling into an entire other framework
> under a spinlock. If an (unrelated) reset driver tries to grab the clk
> lock it will deadlock.

So in our case the clk consumer is the drm/imagination driver. Here is
the comment from the maintainer for my previous attempt to use a reset
driver to abstract the GPU init sequence [1]:

"Do you know what this resets? From our side, the GPU only has a single
reset line (which I assume to be GPU_RESET)."

"I don't love that this procedure appears in the platform reset driver.
I appreciate it may not be clear from the SoC TRM, but this is the
standard reset procedure for all IMG Rogue GPUs. The currently
supported TI SoC handles this in silicon, when power up/down requests
are sent so we never needed to encode it in the driver before.

Strictly speaking, the 32 cycle delay is required between power and
clocks being enabled and the reset line being deasserted. If nothing
here touches power or clocks (which I don't think it should), the delay
could potentially be lifted to the GPU driver." 

>From the drm/imagination maintainers point of view their hardware has
only one reset, the extra CLKGEN reset is SoC specific. 

Also the reset driver maintainer didn't like my way of abstracting two
resets ("GPU" and and SoC specific"CLKGEN") into one reset to make it
seem to the consumer driver drm/imagination like there is only one
reset and suggested and attempt to code the re-setting in the clock
driver [2]. Even though he suggested a different way of achieving that: 

"In my mind it shouldn't be much: the GPU clocks could all share the
same refcounted implementation. The first clock to get enabled would
ungate both GPU_CORE and GPU_CFG_ACLK gates and deassert
GPU_SW_CLKGEN_RST, all in one place. The remaining enable(s) would be
no-ops. Would that work?"

The above would have similar effect, but I felt like enabling both
clocks in single .enable callback could be confusing so I ended up with
the current approach. This could be easily re-done if you feel this
would be better.

I agree that using spinlocks here is dangerous, but looking at the code
of the reset_control_deassert and reset_control_assert, it doesn't seem
like any spinlocks are acquired/relased in that code flow, unless the
driver ops would introduce that. So in this specific case deadlock
shouldn't happen ?

[1] - https://lore.kernel.org/all/816db99d-7088-4c1a-af03-b9a825ac09dc@imgtec.com/
[2] - https://lore.kernel.org/all/945fb7e913a9c3dcb40697328b7e9842b75fea5c.camel@pengutronix.de/

> 
> I see the commit text talks about this being a workaround. I'm not sure
> what the workaround is though. I've seen designs where the reset doesn't
> work unless the clk is enabled because the flops have to be clocking for
> the reset to propagate a few cycles, or the clk has to be disabled so
> that the reset controller can do the clocking, or vice versa for the clk
> not working unless the reset is deasserted. Long story short, it's
> different between SoCs.

OK, so this is how GPU initialization needs to happen for this specific
SoC:

drm/imagination consumer driver:

pvr_power_device_resume():

clk_prepare_enable(pvr_dev->core_clk);
clk_prepare_enable(pvr_dev->sys_clk);
clk_prepare_enable(pvr_dev->mem_clk);

// Then deassert reset introduced in [3]
// [3] - https://lore.kernel.org/all/20250128194816.2185326-11-m.wilczynski@samsung.com/

// Eventually this would get called in the SoC specific reset driver
static void th1520_rst_gpu_enable(struct regmap *reg,
				  struct mutex *gpu_seq_lock)
{
	int val;

	mutex_lock(gpu_seq_lock);

	/* if the GPU is not in a reset state it, put it into one */
	regmap_read(reg, TH1520_GPU_RST_CFG, &val);
	if (val)
		regmap_update_bits(reg, TH1520_GPU_RST_CFG,
				   TH1520_GPU_RST_CFG_MASK, 0x0);

	/* rst gpu clkgen */
	regmap_set_bits(reg, TH1520_GPU_RST_CFG, TH1520_GPU_SW_CLKGEN_RST);

	/*
	 * According to the hardware manual, a delay of at least 32 clock
	 * cycles is required between de-asserting the clkgen reset and
	 * de-asserting the GPU reset. Assuming a worst-case scenario with
	 * a very high GPU clock frequency, a delay of 1 microsecond is
	 * sufficient to ensure this requirement is met across all
	 * feasible GPU clock speeds.
	 */
	udelay(1);

	/* rst gpu */
	regmap_set_bits(reg, TH1520_GPU_RST_CFG, TH1520_GPU_SW_GPU_RST);

	mutex_unlock(gpu_seq_lock);
}


Based on my testing this is exactly how the resets should happen, else
the GPU fails to initialize, and the drm/imagination driver hangs. To
reiterate: first power ON the GPU using power-domain driver. Then
drm/imagination enable all three clocks, then deassert reset of the GPU
CLKGEN (SoC specific), delay for 32 cycles, and then deassert the GPU reset.

> 
> Likely the reset and clk control should be coordinated in a PM domain
> for the device so that when the device is active, the clks are enabled
> and the reset is deasserted in the correct order for the SoC. Can you do
> that?

Obviously this would work, but I'm worried, the drm/imagination driver
maintainers would reject it, as for them this is a special case, from
their perspective there is only one reset line to their hardware, and
there are three clocks which they manage in driver already. And the PM
maintainers probably wouldn't be happy to take this as well.

At the very end maybe we could go back to abstracting the resets in the
reset driver, as the reset maintainer seemed to be open to it, if the
alternatives proves to be too problematic [4].

"I won't object to carry this in the reset driver if the clock
implementation turns out to be unreasonably complex, but I currently
don't expect that to be the case. Please give it a shot."

[4] - https://lore.kernel.org/all/945fb7e913a9c3dcb40697328b7e9842b75fea5c.camel@pengutronix.de/

> 
>> +
>> +       spin_unlock_irqrestore(&gpu_reset_lock, flags);
>> +}
>