[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aXUiO9EyYgy8dcW8@hal-station>
Date: Sat, 24 Jan 2026 14:49:15 -0500
From: Hamza Mahfooz <someguy@...ective-light.com>
To: Mario Limonciello <mario.limonciello@....com>
Cc: Timur Kristóf <timur.kristof@...il.com>,
dri-devel@...ts.freedesktop.org,
Christian König <christian.koenig@....com>,
Alex Deucher <alexander.deucher@....com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Harry Wentland <harry.wentland@....com>,
Leo Li <sunpeng.li@....com>, Rodrigo Siqueira <siqueira@...lia.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
Sunil Khatri <sunil.khatri@....com>, Ce Sun <cesun102@....com>,
Lijo Lazar <lijo.lazar@....com>,
Kenneth Feng <kenneth.feng@....com>,
Ivan Lipski <ivan.lipski@....com>, Alex Hung <alex.hung@....com>,
Tom Chung <chiahsuan.chung@....com>, Melissa Wen <mwen@...lia.com>,
Michel Dänzer <mdaenzer@...hat.com>,
Fangzhi Zuo <Jerry.Zuo@....com>, amd-gfx@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout()
On Sat, Jan 24, 2026 at 12:43:02PM -0600, Mario Limonciello wrote:
>
>
> On 1/24/2026 12:32 PM, Hamza Mahfooz wrote:
> > On Fri, Jan 23, 2026 at 04:30:12PM -0600, Mario Limonciello wrote:
> > > Hamza - since you seem to have a "workload" that can run overnight and this
> > > series recovers, can you try what Alex said and do a dc_suspend() and
> > > dc_resume() for failure?
> > >
> > > Make sure you log a message so you can know it worked.
> >
> > Sure, I'll try something along the lines of:
>
> Generally speaking that looks good, but I'll leave a few comments.
>
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> > index 492457c86393..bc7abd00f5f4 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> > @@ -579,11 +579,28 @@ amdgpu_dm_atomic_crtc_get_property(struct drm_crtc *crtc,
> > }
> > #endif
> >
> > -static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc)
> > +static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc,
> > + struct drm_crtc_commit *commit)
> > {
> > struct amdgpu_device *adev = drm_to_adev(crtc->dev);
> > struct amdgpu_reset_context reset_ctx;
> > + struct amdgpu_ip_block *ip_block;
> >
> > + ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_DCE);
> > + if (!ip_block)
> > + goto full_reset;
> > +
> > + ip_block->version->funcs->suspend(ip_block);
> > + ip_block->version->funcs->resume(ip_block);
>
> Both of these can fail. Especially considering the page flip timeout could
> be a DCN hang, I think you should check return code for both of them
> sequentially and jump to the full reset condition if either fails.
>
> > +
> > + if (drm_crtc_commit_wait(commit)) {
> > + drm_err(crtc->dev, "suspend-resume failed!\n");
> > + goto full_reset;
> > + }
> > +
>
> At least to prove "this worked" you should log a message "right here" that
> the reset occurred and you recovered. That "might not" be in the final
> version, but I think it's worth having for now.
I have included all of the suggestions in my test run, fingers crossed
that I don't have to wait too long for a repro though.
>
> > + return;
> > +
> > +full_reset:
> > if (amdgpu_device_should_recover_gpu(adev)) {
> > memset(&reset_ctx, 0, sizeof(reset_ctx));
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 7175294ccb57..b38c4ee2fc95 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1961,7 +1961,7 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
> > crtc->base.id, crtc->name);
> >
> > if (crtc->funcs->page_flip_timeout)
> > - crtc->funcs->page_flip_timeout(crtc);
> > + crtc->funcs->page_flip_timeout(crtc, commit);
> > }
> > }
> >
> > diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
> > index 45dc5a76e915..47a34a05f6de 100644
> > --- a/include/drm/drm_crtc.h
> > +++ b/include/drm/drm_crtc.h
> > @@ -616,7 +616,8 @@ struct drm_crtc_funcs {
> > * and can be used by drivers to attempt to recover from a page flip
> > * timeout.
> > */
> > - void (*page_flip_timeout)(struct drm_crtc *crtc);
> > + void (*page_flip_timeout)(struct drm_crtc *crtc,
> > + struct drm_crtc_commit *commit);
> >
> > /**
> > * @set_property:
>
Powered by blists - more mailing lists