Message-ID: <20250820104353.5cc8035d@fedora>
Date: Wed, 20 Aug 2025 10:43:53 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Chia-I Wu <olvaffe@...il.com>
Cc: Steven Price <steven.price@....com>, Liviu Dudau <liviu.dudau@....com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
<mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/panthor: always set fence errors on CS_FAULT
On Tue, 8 Jul 2025 14:40:06 -0700
Chia-I Wu <olvaffe@...il.com> wrote:
> On Sun, Jun 22, 2025 at 11:32 PM Boris Brezillon
> <boris.brezillon@...labora.com> wrote:
> >
> > On Wed, 18 Jun 2025 07:55:49 -0700
> > Chia-I Wu <olvaffe@...il.com> wrote:
> >
> > > It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> > > Downstream driver also does not treat CS_INHERIT_FAULT specially.
> > > Remove the check.
> > >
> > > Signed-off-by: Chia-I Wu <olvaffe@...il.com>
> > > ---
> > > drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > > index a2248f692a030..1a3b1c49f7d7b 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > > @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> > >  	fault = cs_iface->output->fault;
> > >  	info = cs_iface->output->fault_info;
> > >
> > > -	if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> > > +	if (queue) {
> > >  		u64 cs_extract = queue->iface.output->extract;
> > >  		struct panthor_job *job;
> > >
> >
> > Now that I look at the code, I think we should record the error when
> > the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
> > faulty. One option would be to re-use the profiling buffer by adding an
> > error field to panthor_job_profiling_data, but we're going to lose 4
> > bytes per slot because of the 64-bit alignment we want for timestamps,
> > so maybe just create a separate buffer with N entries of:
> >
> > struct panthor_job_status {
> > 	u32 error;
> > };
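
(To put numbers on the 4 bytes mentioned above: the timestamps want
8-byte alignment, so sticking the error into the profiling slot would
end up looking roughly like this; field names are from memory and only
meant as an illustration, not the actual layout.)

struct panthor_job_profiling_data {
	struct {
		u64 before;
		u64 after;
	} cycles;	/* existing 64-bit counters */
	struct {
		u64 before;
		u64 after;
	} time;		/* existing 64-bit timestamps */
	u32 error;	/* new field */
	/* 4 bytes of implicit padding to preserve 8-byte alignment */
};

A separate per-slot status buffer, on the other hand, stays tightly
packed.
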
> The current error path uses cs_extract to mark only the offending
> job as faulty. Innocent in-flight jobs do not seem to be affected.
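Right, for reference that path boils down to something like this
(paraphrased from memory, not the exact code; the walk is done under
queue->fence_ctx.lock):

	u64 cs_extract = queue->iface.output->extract;
	struct panthor_job *job;

	list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
		/* Job already fully extracted -> it completed fine. */
		if (cs_extract >= job->ringbuf.end)
			continue;

		/* Jobs past the extract point have not run yet. */
		if (cs_extract < job->ringbuf.start)
			break;

		/* Only the job containing the extract point is flagged;
		 * the errno here is just for illustration. */
		dma_fence_set_error(job->done_fence, -EINVAL);
	}
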
My bad, I thought the faulty CS was automatically entering the recovery
substate (fetching all instructions and ignoring RUN_xxx ones), but it
turns out CS instruction fetching is stalled until the fault is
acknowledged, so we're good.
>
> I looked into emitting LOAD/STORE after SYNC_ADD64 to copy the error
> to panthor_job_status. Besides the extra instructions and storage,
> group_sync_upd_work can run before the LOAD/STORE completes, so it
> would have to check both panthor_job_status and panthor_syncobj_64b.
> That would be a bit ugly as well.
Nah, I think you're right; I just had a wrong recollection of how
recovery mode works. The patch is
Reviewed-by: Boris Brezillon <boris.brezillon@...labora.com>