Message-ID: <eef1982a-ddff-4aea-8ece-5aa1995cc2ec@amd.com>
Date: Wed, 28 Jan 2026 13:48:08 +0100
From: Christian König <christian.koenig@....com>
To: Timur Kristóf <timur.kristof@...il.com>,
Alex Deucher <alexdeucher@...il.com>,
Hamza Mahfooz <someguy@...ective-light.com>,
Michel Dänzer <michel.daenzer@...lbox.org>
Cc: Mario Limonciello <mario.limonciello@....com>,
dri-devel@...ts.freedesktop.org, Alex Deucher <alexander.deucher@....com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Harry Wentland <harry.wentland@....com>, Leo Li <sunpeng.li@....com>,
Rodrigo Siqueira <siqueira@...lia.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Sunil Khatri <sunil.khatri@....com>, Ce Sun <cesun102@....com>,
Lijo Lazar <lijo.lazar@....com>, Kenneth Feng <kenneth.feng@....com>,
Ivan Lipski <ivan.lipski@....com>, Alex Hung <alex.hung@....com>,
Tom Chung <chiahsuan.chung@....com>, Melissa Wen <mwen@...lia.com>,
Michel Dänzer <mdaenzer@...hat.com>,
Fangzhi Zuo <Jerry.Zuo@....com>, amd-gfx@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout()
On 1/28/26 13:14, Timur Kristóf wrote:
> On Wednesday, January 28, 2026 12:26:20 PM Central European Standard Time
> Michel Dänzer wrote:
>> On 1/28/26 11:39, Christian König wrote:
>>> On 1/27/26 23:57, Alex Deucher wrote:
>>>> On Tue, Jan 27, 2026 at 5:53 PM Hamza Mahfooz
>>>>
>>>> <someguy@...ective-light.com> wrote:
>>>>> On Mon, Jan 26, 2026 at 09:20:55AM -0500, Alex Deucher wrote:
>>>>>> I suspect just calling drm_crtc_send_vblank_event() here on the
>>>>>> relevant crtcs would be enough.
>>>>>
>>>>> Seems like an interesting idea, though I would imagine we would still
>>>>> want to attempt a reset (of some kind) assuming that the subsequent page
>>>>> flip also experiences a timeout.
>>>>
>>>> Is it actually a timeout or just missed interrupts? I'm wondering if
>>>> some power feature races with the modeset and causes the interrupt to
>>>> get missed from time to time.
>>>
>>> That is my strong suspicion as well.
>>>
>>> Even if we missed a vblank interrupt that thing is reoccurring, so the
>>> worst thing that can happen is that we delayed reporting back success by
>>> one frame.
>>>
>>> So something must have turned the CRTC fully off.
>>
>> Not sure that's a generally valid conclusion (do the gitlab issues talk
>> about the display going black, or about it staying on but freezing?).
>
> In all the bug reports I've seen about page flip timeouts, and in all the
> timeouts I've seen on my machine, the screen remains on, but frozen.
> It doesn't go black and doesn't turn off.
>
> Christian, why would the CRTC be turned off?
That's exactly the question we need to answer.
But from what you describe, the CRTC stays on; it just doesn't send any more vblank events.
>> AFAIR
>> at least in some cases amdgpu uses a dedicated "page flip" interrupt
>> instead of the vblank interrupt,
Oh, really good point! I hadn't thought about the dedicated page flip interrupt.
But IIRC we already had problems with that one on radeon, so we stopped using it a long time ago.
> That matches what I saw when I was digging in the code.
>
>> in which case missing a single interrupt
>> could cause a timeout.
>>
>>
>> P.S. Completing the atomic commit and sending the completion event must work
>> even if user space turns off any CRTCs as part of the commit[0].
Wait a second. What happens if we never complete that, i.e. if the completion event is never signaled?
Does the kernel then reject any new atomic commit as well?
If yes, then I think that is not defensive at all. In other words, if you are right and the page flip interrupt is used and then missed, we are stuck forever.
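To make the concern concrete, here is a toy user-space model of it. This is not kernel code and not what amdgpu or the DRM core does; the struct, the function names and the timeout handling are all hypothetical. It assumes the answer to the question above is "yes": a new flip is refused while the previous one is still pending, so a single missed interrupt blocks everything unless something else completes the flip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: one pending page flip on one CRTC. */
struct flip_model {
	bool pending;          /* a flip was queued and its event not yet sent */
	uint64_t deadline_ms;  /* after this time we stop waiting for the IRQ */
	unsigned int forced;   /* completions forced by the timeout path */
};

/* Queue a flip. Refused while the previous flip has not completed
 * (assumption: this mirrors a new commit being rejected). */
static bool flip_queue(struct flip_model *f, uint64_t now_ms,
		       uint64_t timeout_ms)
{
	assert(timeout_ms > 0);

	if (f->pending)
		return false;

	f->pending = true;
	f->deadline_ms = now_ms + timeout_ms;
	return true;
}

/* Normal path: the page flip interrupt arrived, the flip is done. */
static void flip_irq(struct flip_model *f)
{
	f->pending = false;
}

/* Defensive path: polled periodically. If the interrupt was missed and
 * the deadline has passed, complete the flip anyway so that later flips
 * are not blocked forever. Returns true if a completion was forced. */
static bool flip_check_timeout(struct flip_model *f, uint64_t now_ms)
{
	if (!f->pending || now_ms < f->deadline_ms)
		return false;

	f->pending = false;
	f->forced++;
	return true;
}
```

Without flip_check_timeout(), a flip whose interrupt is lost leaves pending set forever and every later flip_queue() call fails, which is the "stuck forever" case. With it, the cost of a missed interrupt is bounded by the timeout.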
>> So your
>> hypothesis would be a kernel bug, accidentally turning off the CRTC and/or
>> not handling a CRTC getting turned off correctly.
I'm not arguing that it isn't a kernel bug, but the question is: what is triggering it?
In other words, could it be that userspace does something illegal which the kernel fails to reject?
Regards,
Christian.
>> [0] If any CRTC for which the commit has state is off both before and after
>> the commit though, the commit fails with an error before it could result in
>> a timeout.