linux-kernel - Re: [PATCH v5 1/2] drm/panthor: Reset queue slots if termination fails


Message-ID: <1973a637a7f.b61987aa482053.3031227813632792112@collabora.com>
Date: Wed, 04 Jun 2025 11:01:27 +0100
From: Ashley Smith <ashley.smith@...labora.com>
To: "Liviu Dudau" <liviu.dudau@....com>
Cc: "Boris Brezillon" <boris.brezillon@...labora.com>,
	"Steven Price" <steven.price@....com>,
	"Maarten Lankhorst" <maarten.lankhorst@...ux.intel.com>,
	"Maxime Ripard" <mripard@...nel.org>,
	"Thomas Zimmermann" <tzimmermann@...e.de>,
	"David Airlie" <airlied@...il.com>,
	"Simona Vetter" <simona@...ll.ch>, "kernel" <kernel@...labora.com>,
	"open list:ARM MALI PANTHOR DRM DRIVER" <dri-devel@...ts.freedesktop.org>,
	"open list" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 1/2] drm/panthor: Reset queue slots if termination
 fails

On Tue, 03 Jun 2025 12:09:44 +0100 Liviu Dudau <liviu.dudau@....com> wrote:
 > On Tue, Jun 03, 2025 at 10:49:31AM +0100, Ashley Smith wrote: 
 > > This fixes a bug where if we timeout after a suspend and the termination 
 > > fails, due to waiting on a fence that will never be signalled for 
 > > example, we do not resume the group correctly. The fix forces a reset 
 > > for groups that are not terminated correctly. 
 >  
 > I have a question on the commit message: you're describing a situation where 
 > a fence will *never* be signalled. Is that a real example? I thought this is 
 > not supposed to ever happen! Or are you trying to say that the fence signalling 
 > happens after the timeout?

This covers cases where a fence is never signalled. It shouldn't happen, but we have seen it in some situations involving a FW hang. Since queue_suspend_timeout() is only called on a state update, a suspend/terminate that fails (due to a FW hang, for example) leaves the delayed work pending, which can lead to queue_timeout_work() running incorrectly. Maybe I should not have used the word "bug"; it's more a case of choosing a failsafe path.