lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d327fc1a-0db4-4fcc-aed6-ded53fa28b62@amd.com>
Date: Sat, 24 Jan 2026 12:43:02 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Hamza Mahfooz <someguy@...ective-light.com>
Cc: Timur Kristóf <timur.kristof@...il.com>,
 dri-devel@...ts.freedesktop.org, Christian König
 <christian.koenig@....com>, Alex Deucher <alexander.deucher@....com>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Harry Wentland <harry.wentland@....com>, Leo Li <sunpeng.li@....com>,
 Rodrigo Siqueira <siqueira@...lia.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 Sunil Khatri <sunil.khatri@....com>, Ce Sun <cesun102@....com>,
 Lijo Lazar <lijo.lazar@....com>, Kenneth Feng <kenneth.feng@....com>,
 Ivan Lipski <ivan.lipski@....com>, Alex Hung <alex.hung@....com>,
 Tom Chung <chiahsuan.chung@....com>, Melissa Wen <mwen@...lia.com>,
 Michel Dänzer <mdaenzer@...hat.com>,
 Fangzhi Zuo <Jerry.Zuo@....com>, amd-gfx@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout()



On 1/24/2026 12:32 PM, Hamza Mahfooz wrote:
> On Fri, Jan 23, 2026 at 04:30:12PM -0600, Mario Limonciello wrote:
>> Hamza - since you seem to have a "workload" that can run overnight and this
>> series recovers, can you try what Alex said and do a dc_suspend() and
>> dc_resume() for failure?
>>
>> Make sure you log a message so you can know it worked.
> 
> Sure, I'll try something along the lines of:

Generally speaking that looks good, but I'll leave a few comments.

> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index 492457c86393..bc7abd00f5f4 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -579,11 +579,28 @@ amdgpu_dm_atomic_crtc_get_property(struct drm_crtc *crtc,
>   }
>   #endif
> 
> -static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc)
> +static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc,
> +					  struct drm_crtc_commit *commit)
>   {
>   	struct amdgpu_device *adev = drm_to_adev(crtc->dev);
>   	struct amdgpu_reset_context reset_ctx;
> +	struct amdgpu_ip_block *ip_block;
> 
> +	ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_DCE);
> +	if (!ip_block)
> +		goto full_reset;
> +
> +	ip_block->version->funcs->suspend(ip_block);
> +	ip_block->version->funcs->resume(ip_block);

Both of these can fail.  Especially considering the page flip timeout 
could be a DCN hang, I think you should check return code for both of 
them sequentially and jump to the full reset condition if either fails.

> +
> +	if (drm_crtc_commit_wait(commit)) {
> +		drm_err(crtc->dev, "suspend-resume failed!\n");
> +		goto full_reset;
> +	}
> +

At least to prove "this worked" you should log a message "right here" 
that the reset occurred and you recovered.  That "might not" be in the 
final version, but I think it's worth having for now.

> +	return;
> +
> +full_reset:
>   	if (amdgpu_device_should_recover_gpu(adev)) {
>   		memset(&reset_ctx, 0, sizeof(reset_ctx));
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 7175294ccb57..b38c4ee2fc95 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1961,7 +1961,7 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
>   				crtc->base.id, crtc->name);
> 
>   			if (crtc->funcs->page_flip_timeout)
> -				crtc->funcs->page_flip_timeout(crtc);
> +				crtc->funcs->page_flip_timeout(crtc, commit);
>   		}
>   	}
> 
> diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
> index 45dc5a76e915..47a34a05f6de 100644
> --- a/include/drm/drm_crtc.h
> +++ b/include/drm/drm_crtc.h
> @@ -616,7 +616,8 @@ struct drm_crtc_funcs {
>   	 * and can be used by drivers to attempt to recover from a page flip
>   	 * timeout.
>   	 */
> -	void (*page_flip_timeout)(struct drm_crtc *crtc);
> +	void (*page_flip_timeout)(struct drm_crtc *crtc,
> +				  struct drm_crtc_commit *commit);
> 
>   	/**
>   	 * @set_property:


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ