linux-kernel - Re: [PATCH 3/5] drm/panfrost: Introduce JM context for manging job resources

Message-ID: <20250901095453.19a1aead@fedora>
Date: Mon, 1 Sep 2025 09:54:53 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Daniel Stone <daniel@...ishbar.org>
Cc: Adrián Larumbe <adrian.larumbe@...labora.com>,
 linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 kernel@...labora.com, Rob Herring <robh@...nel.org>, Steven Price
 <steven.price@....com>, Maarten Lankhorst
 <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
 Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH 3/5] drm/panfrost: Introduce JM context for manging job
 resources

On Sat, 30 Aug 2025 10:12:32 +0200
Daniel Stone <daniel@...ishbar.org> wrote:

> Hi Adrian,
> 
> On Thu, 28 Aug 2025 at 04:35, Adrián Larumbe
> <adrian.larumbe@...labora.com> wrote:
> > -void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > +int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
> >  {
> > -       struct panfrost_device *pfdev = panfrost_priv->pfdev;
> > -       int i;
> > +       struct panfrost_file_priv *priv = file->driver_priv;
> > +       struct panfrost_device *pfdev = priv->pfdev;
> > +       struct panfrost_jm_ctx *jm_ctx;
> >
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++)
> > -               drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
> > +       jm_ctx = xa_erase(&priv->jm_ctxs, handle);
> > +       if (!jm_ctx)
> > +               return -EINVAL;
> > +
> > +       for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > +               if (jm_ctx->slots[i].enabled)
> > +                       drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
> > +       }
> >
> >         /* Kill in-flight jobs */
> >         spin_lock(&pfdev->js->job_lock);
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > -               struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
> > -               int j;
> > +       for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > +               struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
> > +
> > +               if (!jm_ctx->slots[i].enabled)
> > +                       continue;
> >
> > -               for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > +               for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> >                         struct panfrost_job *job = pfdev->jobs[i][j];
> >                         u32 cmd;
> >
> > @@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> >                 }
> >         }
> >         spin_unlock(&pfdev->js->job_lock);
> > -}
> > -
> > -int panfrost_job_is_idle(struct panfrost_device *pfdev)
> > -{
> > -       struct panfrost_job_slot *js = pfdev->js;
> > -       int i;
> > -
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > -               /* If there are any jobs in the HW queue, we're not idle */
> > -               if (atomic_read(&js->queue[i].sched.credit_count))
> > -                       return false;
> > -       }
> >
> > -       return true;
> > +       panfrost_jm_ctx_put(jm_ctx);
> > +       return 0;
> >  }  
> 
> It seems odd that both panfrost_jm_ctx_destroy() and
> panfrost_jm_ctx_release() share lifetime responsibilities. I'd expect
> calling panfrost_jm_ctx_destroy() to just release the xarray handle
> and drop the refcount.

I guess you're referring to the drm_sched_entity_destroy() calls. If so,
I agree that they should be removed from panfrost_jm_ctx_release(),
because panfrost_jm_ctx_destroy() must always be called before the JM
ctx refcount can drop to zero.
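To make that ownership split concrete, here is a minimal userspace sketch under loud assumptions: the ctx type, NUM_SLOTS, and the ctx_* helpers are hypothetical stand-ins, not the panfrost code, and killing in-flight jobs is modeled synchronously. The handle-destroy path is the only place the entities are torn down and in-flight jobs (each holding a context reference) are killed, which breaks the job -> context reference cycle before the handle's own reference is dropped. The release path only frees memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARRAY_SIZE(jm_ctx->slots). */
#define NUM_SLOTS 3

/* Hypothetical model of a JM context; not the real panfrost_jm_ctx. */
struct ctx {
	int refcount;                /* handle ref + one ref per in-flight job */
	int inflight_jobs;
	bool slot_active[NUM_SLOTS]; /* models "entity still seen by the scheduler" */
};

static int ctx_frees; /* how many times ctx_release() ran */

/* Object release: runs when the last ref goes away and only frees memory. */
static void ctx_release(struct ctx *ctx)
{
	for (int i = 0; i < NUM_SLOTS; i++)
		assert(!ctx->slot_active[i]); /* teardown already done in ctx_destroy() */
	ctx_frees++;
	free(ctx);
}

static void ctx_put(struct ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0)
		ctx_release(ctx);
}

static struct ctx *ctx_create(void)
{
	struct ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1; /* reference owned by the handle */
	for (int i = 0; i < NUM_SLOTS; i++)
		ctx->slot_active[i] = true;
	return ctx;
}

/* Each submitted job pins the context until it completes or is killed. */
static void ctx_submit_job(struct ctx *ctx)
{
	ctx->refcount++;
	ctx->inflight_jobs++;
}

/* Handle destroy: the only place where entity teardown happens. */
static void ctx_destroy(struct ctx *ctx)
{
	for (int i = 0; i < NUM_SLOTS; i++)
		ctx->slot_active[i] = false; /* stop the scheduler from using the slot */

	/* Killing in-flight jobs drops the refs they hold, breaking the cycle. */
	while (ctx->inflight_jobs > 0) {
		ctx->inflight_jobs--;
		ctx_put(ctx); /* handle ref still held, so ctx stays alive here */
	}

	ctx_put(ctx); /* drop the handle's reference; may free ctx */
}
```

With two jobs in flight the refcount is 3; ctx_destroy() drops the two job references and then the handle reference, so ctx_release() runs exactly once and never has to touch the entities.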

> 
> I can see why panfrost_jm_ctx_destroy() is the one that goes and tries
> to cancel the jobs: the jobs keep a refcount on the context, so we
> need to break that cycle somehow. But having both the handle-release
> and object-release functions drop a ref on the sched entity seems
> odd?

Note that drm_sched_entity_destroy() doesn't really drop a ref; it just
flushes/cancels the jobs and makes sure the entity is no longer
considered by the scheduler. After the first drm_sched_entity_destroy()
call (in jm_ctx_destroy()), I'd expect entity->rq to be NULL, making the
subsequent call to drm_sched_entity_destroy() (in jm_ctx_release()) a
NOP (both drm_sched_entity_{flush,fini}() bail out early if entity->rq
is NULL). Still, there might be other things in drm_sched_entity that
are not safe to clean up twice, and I agree that
drm_sched_entity_destroy() shouldn't be called in both places anyway.
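The early bail-out argument can be illustrated with a minimal sketch. The runqueue/entity types and the entity_* helpers below are hypothetical models, not the drm_sched API: they only mirror the pattern described above, where teardown clears entity->rq and both halves return early when it is NULL, so a second destroy does nothing. As noted, the real drm_sched_entity carries more state, so this does not show that a double destroy is safe there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a scheduler run queue; not the drm_sched one. */
struct runqueue {
	int nr_entities;
};

/* Hypothetical model of a scheduler entity; not struct drm_sched_entity. */
struct entity {
	struct runqueue *rq; /* NULL once the entity has been torn down */
	int pending_jobs;
};

/* Flush half: nothing to do once the entity left its run queue. */
static void entity_flush(struct entity *e)
{
	if (!e->rq)
		return;
	e->pending_jobs = 0; /* flush/cancel the queued jobs */
}

/* Fini half: detach from the run queue exactly once. */
static void entity_fini(struct entity *e)
{
	if (!e->rq)
		return;
	e->rq->nr_entities--; /* scheduler no longer considers this entity */
	e->rq = NULL;
}

static void entity_destroy(struct entity *e)
{
	entity_flush(e);
	entity_fini(e);
}
```

Calling entity_destroy() twice on the same entity leaves the run queue count unchanged after the first call, which is the NOP behavior expected from the second call in jm_ctx_release().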

> 
> It doesn't help much that panfrost_job is used both for actual jobs
> (as the type) and the capability for a device to have multiple
> job-manager contexts (as a function prefix). Would be great to clean
> that up, so you don't have to think about whether e.g.
> panfrost_job_close() is actually operating on a panfrost_job, or
> operating on multiple panfrost_jm_ctx which operate on multiple
> panfrost_job.

Yep, we should definitely change the prefix to panfrost_jm_ when the
function manipulates the JM scheduler context.