[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2fe8187-9271-4cbf-8b7e-37ffda0799de@amd.com>
Date: Wed, 28 Jan 2026 12:25:31 +0100
From: Christian König <christian.koenig@....com>
To: Timur Kristóf <timur.kristof@...il.com>,
Michel Dänzer <michel.daenzer@...lbox.org>,
Hamza Mahfooz <someguy@...ective-light.com>, dri-devel@...ts.freedesktop.org
Cc: Alex Deucher <alexander.deucher@....com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Harry Wentland <harry.wentland@....com>, Leo Li <sunpeng.li@....com>,
Rodrigo Siqueira <siqueira@...lia.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Sunil Khatri <sunil.khatri@....com>, Ce Sun <cesun102@....com>,
Lijo Lazar <lijo.lazar@....com>, Kenneth Feng <kenneth.feng@....com>,
Ivan Lipski <ivan.lipski@....com>, Alex Hung <alex.hung@....com>,
Tom Chung <chiahsuan.chung@....com>, Melissa Wen <mwen@...lia.com>,
Michel Dänzer <mdaenzer@...hat.com>,
Fangzhi Zuo <Jerry.Zuo@....com>, amd-gfx@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, Mario Limonciello <mario.limonciello@....com>
Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout()
On 1/28/26 10:19, Timur Kristóf wrote:
> On 2026. január 26., hétfő 14:00:59 közép-európai téli idő Christian König
> wrote:
>> On 1/26/26 11:27, Michel Dänzer wrote:
>>> On 1/26/26 11:14, Christian König wrote:
>>>> On 1/23/26 15:44, Timur Kristóf wrote:
>>>>> On Friday, January 23, 2026 2:52:44 PM Central European Standard Time
>>>>>
>>>>> Christian König wrote:
>>>>>> So as far as I can see the whole approach doesn't make any sense at
>>>>>> all.
>>>>>
>>>>> Actually this approach was proposed as a solution at XDC 2025 in Harry's
>>>>> presentation, "DRM calls driver callback to attempt recovery", see page
>>>>> 9 in this slide deck:
>>>>>
>>>>> https://indico.freedesktop.org/event/10/contributions/431/attachments/
>>>>> 267/355/2025%20XDC%20Hackfest%20Update%20v1.2.pdf
>>>>>
>>>>> If you disagree with Harry, please make a counter-proposal.
>>>>
>>>> Well I must have missed that detail otherwise I would have objected.
>>>>
>>>> But looking at the slide Harry actually pointed out what immediately came
>>>> to my mind as well, e.g. that the Compositor needs to issue a full
>>>> modeset to re-program the CRTC.>
>>> In principle, the kernel driver has all the information it needs to
>>> reprogram the HW by itself. Not sure why the compositor would need to be
>>> actively involved.
>> Well first of all I'm not sure if we can reprogram the HW even if all
>> information are available.
>>
>> Please keep in mind that we are in a dma_fence timeout handler here with the
>> usual rat tail of consequences. So no allocation of memory or taking locks
>> under which memory is allocated or are part of preparing the page flip
>> etc... I'm not so deep in the atomic code, so Alex, Sima and probably you
>> as well can answer that much better than I do, but of hand it sounds
>> questionable.
>>
>> On the other hand we could of course postpone reprogramming the CRTC into an
>> async work item, but that might created more problems then it solves.
>>
>> Then second even if the kernel can do it I'm not sure if it should do it.
>>
>> I mean userspace asked for a quick page flip and not some expensive CRTC/PLL
>> reprogramming. Stuff like that usually takes some time and by then the
>> frame which should be displayed by the page flip might already be stale and
>> it would be better to tell userspace that we couldn't display it on time
>> and wait for a new frame to be generated.
>
> I agree with Michel here. It's a kernel bug, it should be solved by the
> kernel. I don't like the tendency of pushing userspace to handle kernel bugs,
> especially if this is just needed for one vendor's buggy driver. (No offence.)
Well I strongly disagree. The kernel is not here to serve userspace, but to give userspace access to the HW in a generalized manner.
If this is caused by a HW failure then reporting back to userspace is the most reasonable thing to do.
>>
>> And third, there must be a root cause of the page flip not completing.
>>
>> My educated guess is that we have some atomic property change or even
>> turning the CRTC off in parallel with the page flip. I mean HW rarely turns
>> off its reoccurring vblank interrupt on its own.
>
> I think a page flip timeout can many root causes, so it's unlikely that a
> single solution would solve all of it.
That's actually what I'm not sure about. I mean what seems to happen consistently is that somehow the CRTC is turned off, who is doing that and why?
As far as I can see drm_atomic_helper_wait_for_flip_done() is called by amdgpu_dm_atomic_commit_tail() to wait for the page flip to finish. So the message comes from our DM/DC code calling into the DRM code.
That in turn makes the callback completely superfluous, we could just signal an error back from drm_atomic_helper_wait_for_flip_done().
Additional to that I don't see much locking here. Who is preventing multiple atomic changes from happening at the same time?
Regards,
Christian.
>
> See:
> https://gitlab.freedesktop.org/drm/amd/-/issues?
> label_name[]=page%20flip%20timeout
>
> Best regards,
> Timur
>
>
>
Powered by blists - more mailing lists