Message-ID: <ff1b1825-c2ad-42f3-8910-d919fe627cc6@mailbox.org>
Date: Mon, 26 Jan 2026 15:31:12 +0100
From: Michel Dänzer <michel.daenzer@...lbox.org>
To: Christian König <christian.koenig@....com>,
 Timur Kristóf <timur.kristof@...il.com>,
 Hamza Mahfooz <someguy@...ective-light.com>, dri-devel@...ts.freedesktop.org
Cc: Alex Deucher <alexander.deucher@....com>, David Airlie
 <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Harry Wentland <harry.wentland@....com>, Leo Li <sunpeng.li@....com>,
 Rodrigo Siqueira <siqueira@...lia.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 Sunil Khatri <sunil.khatri@....com>, Ce Sun <cesun102@....com>,
 Lijo Lazar <lijo.lazar@....com>, Kenneth Feng <kenneth.feng@....com>,
 Ivan Lipski <ivan.lipski@....com>, Alex Hung <alex.hung@....com>,
 Tom Chung <chiahsuan.chung@....com>, Melissa Wen <mwen@...lia.com>,
 Fangzhi Zuo <Jerry.Zuo@....com>, amd-gfx@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, Mario Limonciello <mario.limonciello@....com>
Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout()

On 1/26/26 14:00, Christian König wrote:
> On 1/26/26 11:27, Michel Dänzer wrote:
>> On 1/26/26 11:14, Christian König wrote:
>>> On 1/23/26 15:44, Timur Kristóf wrote:
>>>> On Friday, January 23, 2026 2:52:44 PM Central European Standard Time 
>>>> Christian König wrote:
>>>>
>>>>> So as far as I can see the whole approach doesn't make any sense at all.
>>>>
>>>> Actually this approach was proposed as a solution at XDC 2025 in Harry's 
>>>> presentation, "DRM calls driver callback to attempt recovery", see page 9 in 
>>>> this slide deck:
>>>>
>>>> https://indico.freedesktop.org/event/10/contributions/431/attachments/267/355/2025%20XDC%20Hackfest%20Update%20v1.2.pdf
>>>>
>>>> If you disagree with Harry, please make a counter-proposal.
>>>
>>> Well I must have missed that detail otherwise I would have objected.
>>>
>>> But looking at the slide, Harry actually pointed out what immediately came to my mind as well, namely that the compositor needs to issue a full modeset to re-program the CRTC.
>>
>> In principle, the kernel driver has all the information it needs to reprogram the HW by itself. Not sure why the compositor would need to be actively involved.
> 
> Well, first of all I'm not sure we can reprogram the HW even if all the information is available.
> 
> Please keep in mind that we are in a dma_fence timeout handler here, with the usual long tail of consequences: no allocating memory, no taking locks under which memory is allocated or which are part of preparing the page flip, etc. I'm not that deep into the atomic code, so Alex, Sima and probably you as well can answer that much better than I can, but offhand it sounds questionable.
> 
> On the other hand, we could of course postpone reprogramming the CRTC into an async work item, but that might create more problems than it solves.

Seems doable offhand from a KMS UAPI PoV. The reprogramming just needs to be done before sending the atomic commit completion event(s) to user space.

Not sure about the DMA fence angle though. (I consider OUT_FENCE_PTR problematic for other reasons. In particular, using it to get a release fence for clients is kind of laying a trap for them. And in the compositor I see no benefit over completion events.)


> Then second even if the kernel can do it I'm not sure if it should do it.
> 
> I mean userspace asked for a quick page flip and not some expensive CRTC/PLL reprogramming.

More complex atomic commits can also hang, FWIW. In fact, they might be more likely to hang.


> Stuff like that usually takes some time and by then the frame which should be displayed by the page flip might already be stale and it would be better to tell userspace that we couldn't display it on time and wait for a new frame to be generated.

With my compositor developer hat on, I'd rather not spend effort generating a new frame if there is doubt that the kernel will actually be able to display it. The worst case of that would be constantly generating new frames, none of which are displayed.

I'd rather try again with the same frame, which boils down to an "empty" (no actual state changes) commit with the DRM_MODE_ATOMIC_ALLOW_MODESET flag.
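
For illustration, here is a rough sketch of what such a retry could look like from user space, using the raw DRM_IOCTL_MODE_ATOMIC ioctl from the kernel UAPI headers. The helper names build_empty_modeset_commit() and retry_with_modeset() are made up for this example. Passing zero objects is just one reading of "empty"; a real compositor might instead re-submit the current property values for the affected CRTC. Whether a given driver actually reprograms the HW in response is not something this sketch can promise.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>

/*
 * Hypothetical helper: fill in an atomic request which carries no
 * property changes, but allows the kernel to perform a full modeset.
 */
static void build_empty_modeset_commit(struct drm_mode_atomic *req)
{
	memset(req, 0, sizeof(*req));
	/* count_objs stays 0, i.e. no objects or properties are touched. */
	req->flags = DRM_MODE_ATOMIC_ALLOW_MODESET;
}

/*
 * Hypothetical helper: submit the "empty" commit on an open DRM fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int retry_with_modeset(int drm_fd)
{
	struct drm_mode_atomic req;

	build_empty_modeset_commit(&req);
	if (ioctl(drm_fd, DRM_IOCTL_MODE_ATOMIC, &req) < 0)
		return -errno;
	return 0;
}
```

The commit is blocking here since neither DRM_MODE_ATOMIC_NONBLOCK nor DRM_MODE_PAGE_FLIP_EVENT is set; a compositor would pick those flags according to its own event loop.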

Relying on user space for this can also be problematic, e.g. if user space dies and drops back to fbcon.


> And third, there must be a root cause of the page flip not completing.
> 
> My educated guess is that we have some atomic property change, or even the CRTC being turned off, in parallel with the page flip. I mean, HW rarely turns off its recurring vblank interrupt on its own.
> 
> Returning an error to userspace might actually help identify the root cause.

It seems pretty clear that the hangs plaguing KWin are amdgpu DC bugs.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
