lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <360b8654-01be-4f47-90eb-4fdb2055c653@arm.com>
Date: Fri, 24 Oct 2025 14:02:52 +0100
From: Steven Price <steven.price@....com>
To: Karunika Choo <karunika.choo@....com>, dri-devel@...ts.freedesktop.org
Cc: nd@....com, Boris Brezillon <boris.brezillon@...labora.com>,
 Liviu Dudau <liviu.dudau@....com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 06/10] drm/panthor: Implement L2 power on/off via
 PWR_CONTROL

On 24/10/2025 12:51, Karunika Choo wrote:
> On 24/10/2025 10:43, Steven Price wrote:
>> On 23/10/2025 23:16, Karunika Choo wrote:
>>> On 20/10/2025 11:50, Steven Price wrote:
>>>> On 14/10/2025 10:43, Karunika Choo wrote:
>>>>> This patch adds common helpers to issue power commands, poll
>>>>> transitions, and validate domain state, then wires them into the L2
>>>>> on/off paths.
>>>>>
>>>>> The L2 power-on sequence now delegates control of the SHADER and TILER
>>>>> domains to the MCU when allowed, while the L2 itself is never delegated.
>>>>> On power-off, dependent domains beneath the L2 are checked, and if
>>>>> necessary, retracted and powered down to maintain proper domain
>>>>> ordering.
>>>>>
>>>>> Signed-off-by: Karunika Choo <karunika.choo@....com>
>>>>> ---
>> [...]
>>>>> +		u64 domain_ready = gpu_read64(ptdev, get_domain_ready_reg(child_domain));
>>>>> +
>>>>> +		if (domain_ready && (pwr_status & PWR_STATUS_DOMAIN_DELEGATED(child_domain))) {
>>>>> +			drm_warn(&ptdev->base,
>>>>> +				 "L2 power off: Delegated %s domain not powered down by MCU",
>>>>> +				 get_domain_name(child_domain));
>>>>> +			ret = retract_domain(ptdev, child_domain);
>>>>> +			if (ret) {
>>>>> +				drm_err(&ptdev->base, "Failed to retract %s domain",
>>>>> +					get_domain_name(child_domain));
>>>>> +				panthor_pwr_info_show(ptdev);
>>>>> +				return ret;
>>>>> +			}
>>>>> +		}
>>>>> +
>>>>> +		ret = panthor_pwr_domain_power_off(ptdev, child_domain, domain_ready,
>>>>> +						   PWR_TRANSITION_TIMEOUT_US);
>>>>> +		if (ret)
>>>>> +			return ret;
>>>>> +	}
>>>>> +
>>>>> +	return panthor_pwr_domain_power_off(ptdev, PWR_COMMAND_DOMAIN_L2,
>>>>> +					    ptdev->gpu_info.l2_present,
>>>>> +					    PWR_TRANSITION_TIMEOUT_US);
>>>>
>>>> Does this implicitly 'retract' the shader/tiler power domains? If so I
>>>> think it's worth a comment. Otherwise it looks like we don't actually
>>>> know the status of whether the shader/tiler power domains are retracted
>>>> or not.
>>>>
>>>
>>> panthor_pwr_l2_power_off() will only retract the shader/tiler domains if
>>> they have not been powered down by the MCU. In cases where the MCU did
>>> power down these child domains, delegate_domain() will exit early as
>>> they would already be delegated. I understand the ambiguity here,
>>> hopefully it is somewhat acceptable.
>>
>> So my question was really how does the driver know whether the domains
>> are delegated or not when this function returns?
>>
>> I couldn't quite get my head around whether turning the L2 power domain
>> off would implicitly 'retract' the shader/tiler power domains -
>> obviously it forces them off which means the MCU doesn't have control.
>> So retracting would make sense, but I couldn't see anything in the spec.
>>
>> It would be good to have a comment explaining what the expected state is
>> after this function (panthor_pwr_l2_power_off) returns. Is it unknown
>> whether the shader/tiler are retracted, or is there something in the
>> hardware which does this automatically so we know but don't have to
>> manually retract? Presumably if we end up fully powering down the GPU
>> that must effectively retract all domains (the GPU hardware is reset so
>> it goes back to reset conditions).
>>
>> Sorry, it's a bit of a basic question but the spec is somewhat unhelpful
>> on this point! (Or at least I haven't found a relevant statement).
>>
> 
> Powering off the L2 does not automatically retract its child domains.
> The above case is for handling the edge case where the MCU is hung and
> is not able to power off the delegated domains, therefore the host needs
> to take over and power them down before turning off the L2. Additionally,
> like you have alluded to, powering off the GPU will inevitably reset all
> of these states (retracting the child domains), necessitating a
> re-delegation on L2 power on.
> 
> Therefore, the typical operation loop will be as follows:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU (should power down Tiler/Shader)
>  4. L2 power off (no retract of Tiler/Shader)
>  <resume>
>  5. L2 power on (next resume)
>  6. Delegate Tiler/Shader (skipped as already delegated)
> 
> If the MCU is hung:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU fails
>  4. L2 power off (Retract and power off Shader/Tiler)
>  <resume>
>  5. L2 power on
>  6. Delegate Tiler/Shader
> 
> If the GPU is turned off between suspend and resume:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU (should power down Tiler/Shader)
>  4. L2 power off (no retract of Tiler/Shader)
>  <GPU turned off>
>  <resume>
>  6. L2 power on
>  7. Delegate Tiler/Shader

Thanks for the explanation!

> 
> With the current implementation, we cannot expect it to be always
> retracted on return of the function, but it does provide the
> additional benefit that on resume we don't need to go through the
> whole delegate cycle after powering up the L2, allowing us to
> save some time there.
> 
> On the other hand, if we want to explicitly enforce that we retract on 
> suspend, then we have to accept the additional cost to delegate the
> domains on resume.

No, there's no need to change it. But I think it's worth a comment that
in the usual case (the MCU isn't hung) the shader/tiler are left
delegated and the attempt to delegate them again will detect this and
skip it.

I think what mostly confused me is that delegate_domain() has the following:

> +	if (pwr_status_reg & delegated_mask) {
> +		drm_dbg(&ptdev->base, "%s domain already delegated",
> +			get_domain_name(domain));
> +		return 0;
> +	}

Although it's "drm_dbg" that message makes it seem like this is an
unexpected situation. Whereas actually we would normally expect that to
happen during a resume (as long as the GPU remains powered). With that
in my head I then started to think that there might be something in the
hardware causing an automatic "retract".

Thanks,
Steve


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ