Message-ID: <20250910185058.5239ada4@fedora>
Date: Wed, 10 Sep 2025 18:50:58 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Steven Price <steven.price@....com>
Cc: Adrián Larumbe <adrian.larumbe@...labora.com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
kernel@...labora.com, Rob Herring <robh@...nel.org>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v2 2/4] drm/panfrost: Introduce JM contexts for managing
job resources
On Wed, 10 Sep 2025 16:56:43 +0100
Steven Price <steven.price@....com> wrote:
> On 10/09/2025 16:52, Boris Brezillon wrote:
> > On Wed, 10 Sep 2025 16:42:32 +0100
> > Steven Price <steven.price@....com> wrote:
> >
> >>> +int panfrost_jm_ctx_create(struct drm_file *file,
> >>> + struct drm_panfrost_jm_ctx_create *args)
> >>> +{
> >>> + struct panfrost_file_priv *priv = file->driver_priv;
> >>> + struct panfrost_device *pfdev = priv->pfdev;
> >>> + enum drm_sched_priority sched_prio;
> >>> + struct panfrost_jm_ctx *jm_ctx;
> >>> +
> >>> + int ret;
> >>> +
> >>> + jm_ctx = kzalloc(sizeof(*jm_ctx), GFP_KERNEL);
> >>> + if (!jm_ctx)
> >>> + return -ENOMEM;
> >>> +
> >>> + kref_init(&jm_ctx->refcnt);
> >>> +
> >>> + /* Same priority for all JS within a single context */
> >>> + jm_ctx->config = JS_CONFIG_THREAD_PRI(args->priority);
> >>> +
> >>> + ret = jm_ctx_prio_to_drm_sched_prio(file, args->priority, &sched_prio);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>> +
> >>> + for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) {
> >>> + struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
> >>> + struct panfrost_js_ctx *js_ctx = &jm_ctx->slots[i];
> >>> +
> >>> + ret = drm_sched_entity_init(&js_ctx->sched_entity, sched_prio,
> >>> + &sched, 1, NULL);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>> +
> >>> + js_ctx->enabled = true;
> >>> + }
> >>> +
> >>> + ret = xa_alloc(&priv->jm_ctxs, &args->handle, jm_ctx,
> >>> + XA_LIMIT(0, MAX_JM_CTX_PER_FILE), GFP_KERNEL);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>
> >> On error here we just jump down and call panfrost_jm_ctx_put() which
> >> will free jm_ctx but won't destroy any of the drm_sched_entities. There
> >> seems to be something a bit off with the lifetime management here.
> >>
> >> Should panfrost_jm_ctx_release() be responsible for tearing down the
> >> context, and panfrost_jm_ctx_destroy() be nothing more than dropping the
> >> reference?
> >
> > The idea was to kill/cancel any pending jobs as soon as userspace
> > releases the context, like we were doing previously when the FD was
> > closed. If we defer this ctx teardown to the release() function, we're
> > basically waiting for all jobs to complete, which:
> >
> > 1. doesn't encourage userspace to have proper control over the contexts'
> > lifetime
> > 2. might use GPU/mem resources to execute jobs no one cares about
> > anymore
>
> Ah, good point - yes killing the jobs in panfrost_jm_ctx_destroy() makes
> sense. But we still need to ensure the clean-up happens in the other
> paths ;)
>
> So panfrost_jm_ctx_destroy() should keep the job-killing part, but the
> drm scheduler entity cleanup should be moved.
The thing is, we need to call drm_sched_entity_fini() if we want all
pending jobs that were not yet queued to the HW to be cancelled
(_fini() calls _flush() + _kill()). This has to happen before we cancel
the jobs at the JM level, otherwise drm_sched might pass us new jobs
while we're trying to get rid of the ones currently running. Once that's
done, there's basically nothing left to defer except the kfree().
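
To make that concrete, here's a rough, untested sketch of the split I'm
describing. panfrost_jm_kill_ctx_jobs() is just a placeholder for whatever
ends up cancelling the jobs that already reached the HW, and the destroy()
signature is simplified:

/* Rough, untested sketch -- signature simplified, and
 * panfrost_jm_kill_ctx_jobs() is a placeholder for the JM-level job
 * cancellation, not an existing helper.
 */
void panfrost_jm_ctx_destroy(struct panfrost_jm_ctx *jm_ctx)
{
        /* Tear the entities down first: drm_sched_entity_fini() flushes/kills
         * anything that never made it to the HW and guarantees drm_sched
         * won't hand us new jobs from these entities afterwards.
         */
        for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) {
                if (jm_ctx->slots[i].enabled)
                        drm_sched_entity_fini(&jm_ctx->slots[i].sched_entity);
        }

        /* Only now is it safe to cancel what's still running on the HW. */
        panfrost_jm_kill_ctx_jobs(jm_ctx);

        panfrost_jm_ctx_put(jm_ctx);
}

static void panfrost_jm_ctx_release(struct kref *kref)
{
        struct panfrost_jm_ctx *jm_ctx =
                container_of(kref, struct panfrost_jm_ctx, refcnt);

        /* Everything was torn down in destroy(); only the kfree() is left. */
        kfree(jm_ctx);
}

The create() error path would then need the same drm_sched_entity_fini()
calls on the entities that were already initialized before dropping the last
ref, which I think also covers the leak you pointed out.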