[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250901095453.19a1aead@fedora>
Date: Mon, 1 Sep 2025 09:54:53 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Daniel Stone <daniel@...ishbar.org>
Cc: Adrián Larumbe <adrian.larumbe@...labora.com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
kernel@...labora.com, Rob Herring <robh@...nel.org>, Steven Price
<steven.price@....com>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH 3/5] drm/panfrost: Introduce JM context for manging job
resources
On Sat, 30 Aug 2025 10:12:32 +0200
Daniel Stone <daniel@...ishbar.org> wrote:
> Hi Adrian,
>
> On Thu, 28 Aug 2025 at 04:35, Adrián Larumbe
> <adrian.larumbe@...labora.com> wrote:
> > -void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > +int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
> > {
> > - struct panfrost_device *pfdev = panfrost_priv->pfdev;
> > - int i;
> > + struct panfrost_file_priv *priv = file->driver_priv;
> > + struct panfrost_device *pfdev = priv->pfdev;
> > + struct panfrost_jm_ctx *jm_ctx;
> >
> > - for (i = 0; i < NUM_JOB_SLOTS; i++)
> > - drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
> > + jm_ctx = xa_erase(&priv->jm_ctxs, handle);
> > + if (!jm_ctx)
> > + return -EINVAL;
> > +
> > + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > + if (jm_ctx->slots[i].enabled)
> > + drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
> > + }
> >
> > /* Kill in-flight jobs */
> > spin_lock(&pfdev->js->job_lock);
> > - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > - struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
> > - int j;
> > + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > + struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
> > +
> > + if (!jm_ctx->slots[i].enabled)
> > + continue;
> >
> > - for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > + for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > struct panfrost_job *job = pfdev->jobs[i][j];
> > u32 cmd;
> >
> > @@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > }
> > }
> > spin_unlock(&pfdev->js->job_lock);
> > -}
> > -
> > -int panfrost_job_is_idle(struct panfrost_device *pfdev)
> > -{
> > - struct panfrost_job_slot *js = pfdev->js;
> > - int i;
> > -
> > - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > - /* If there are any jobs in the HW queue, we're not idle */
> > - if (atomic_read(&js->queue[i].sched.credit_count))
> > - return false;
> > - }
> >
> > - return true;
> > + panfrost_jm_ctx_put(jm_ctx);
> > + return 0;
> > }
>
> It seems odd that both panfrost_jm_ctx_destroy() and
> panfrost_jm_ctx_release() share lifetime responsibilities. I'd expect
> calling panfrost_jm_ctx_destroy() to just release the xarray handle
> and drop the refcount.
I guess you refer to the drm_sched_entity_destroy() calls. If so, I
agree that they should be removed from panfrost_jm_ctx_release() because
panfrost_jm_ctx_destroy() should always be called for the JM ctx
refcount to drop to zero.
>
> I can see why calling panfrost_jm_ctx_destroy() is the one to go try
> to cancel the jobs - because the jobs keep a refcount on the context,
> so we need to break that cycle somehow. But having both the
> handle-release and object-release function drop a ref on the sched
> entity seems odd?
Note that drm_sched_entity_destroy() doesn't really drop a ref, it just
flushes/cancels the jobs, and makes sure the entity is no longer
considered by the scheduler. After the first drm_sched_entity_destroy()
happens (in jm_ctx_destroy()), I'd expect entity->rq to be NULL, making
the subsequent call to drm_sched_entity_destroy() (in jm_ctx_release())
a NOP (both drm_sched_entity_{flush,fini}() bail out early if
entity->rq is NULL). Now, there might be other things in
drm_sched_entity that are not safe to cleanup twice, and I agree that
drm_sched_entity_destroy() shouldn't be called in both places anyway.
>
> It doesn't help much that panfrost_job is used both for actual jobs
> (as the type) and the capability for a device to have multiple
> job-manager contexts (as a function prefix). Would be great to clean
> that up, so you don't have to think about whether e.g.
> panfrost_job_close() is actually operating on a panfrost_job, or
> operating on multiple panfrost_jm_ctx which operate on multiple
> panfrost_job.
Yep, we should definitely change the prefix to panthor_jm_ when the
function manipulates the JM scheduler context.
Powered by blists - more mailing lists