Message-ID: <20250910185058.5239ada4@fedora>
Date: Wed, 10 Sep 2025 18:50:58 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Steven Price <steven.price@....com>
Cc: Adrián Larumbe <adrian.larumbe@...labora.com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
kernel@...labora.com, Rob Herring <robh@...nel.org>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v2 2/4] drm/panfrost: Introduce JM contexts for managing
job resources
On Wed, 10 Sep 2025 16:56:43 +0100
Steven Price <steven.price@....com> wrote:
> On 10/09/2025 16:52, Boris Brezillon wrote:
> > On Wed, 10 Sep 2025 16:42:32 +0100
> > Steven Price <steven.price@....com> wrote:
> >
> >>> +int panfrost_jm_ctx_create(struct drm_file *file,
> >>> + struct drm_panfrost_jm_ctx_create *args)
> >>> +{
> >>> + struct panfrost_file_priv *priv = file->driver_priv;
> >>> + struct panfrost_device *pfdev = priv->pfdev;
> >>> + enum drm_sched_priority sched_prio;
> >>> + struct panfrost_jm_ctx *jm_ctx;
> >>> +
> >>> + int ret;
> >>> +
> >>> + jm_ctx = kzalloc(sizeof(*jm_ctx), GFP_KERNEL);
> >>> + if (!jm_ctx)
> >>> + return -ENOMEM;
> >>> +
> >>> + kref_init(&jm_ctx->refcnt);
> >>> +
> >>> + /* Same priority for all JS within a single context */
> >>> + jm_ctx->config = JS_CONFIG_THREAD_PRI(args->priority);
> >>> +
> >>> + ret = jm_ctx_prio_to_drm_sched_prio(file, args->priority, &sched_prio);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>> +
> >>> + for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) {
> >>> + struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
> >>> + struct panfrost_js_ctx *js_ctx = &jm_ctx->slots[i];
> >>> +
> >>> + ret = drm_sched_entity_init(&js_ctx->sched_entity, sched_prio,
> >>> + &sched, 1, NULL);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>> +
> >>> + js_ctx->enabled = true;
> >>> + }
> >>> +
> >>> + ret = xa_alloc(&priv->jm_ctxs, &args->handle, jm_ctx,
> >>> + XA_LIMIT(0, MAX_JM_CTX_PER_FILE), GFP_KERNEL);
> >>> + if (ret)
> >>> + goto err_put_jm_ctx;
> >>
> >> On error here we just jump down and call panfrost_jm_ctx_put() which
> >> will free jm_ctx but won't destroy any of the drm_sched_entities. There
> >> seems to be something a bit off with the lifetime management here.
> >>
> >> Should panfrost_jm_ctx_release() be responsible for tearing down the
> >> context, and panfrost_jm_ctx_destroy() be nothing more than dropping the
> >> reference?
> >
> > The idea was to kill/cancel any pending jobs as soon as userspace
> > releases the context, like we were doing previously when the FD was
> > closed. If we defer this ctx teardown to the release() function, we're
> > basically waiting for all jobs to complete, which:
> >
> > 1. doesn't encourage userspace to have proper control over the contexts'
> > lifetime
> > 2. might use GPU/mem resources to execute jobs no one cares about
> > anymore
>
> Ah, good point - yes killing the jobs in panfrost_jm_ctx_destroy() makes
> sense. But we still need to ensure the clean-up happens in the other
> paths ;)
>
> So panfrost_jm_ctx_destroy() should keep the job-killing part, but the
> drm scheduler entity cleanup should be moved.
The thing is, we need to call drm_sched_entity_fini() if we want all
pending jobs that were not yet queued to the HW to be cancelled
(_fini() calls _flush() + _kill()). This has to happen before we cancel
the jobs at the JM level, otherwise drm_sched might pass us new jobs
while we're trying to get rid of the ones currently running. Once that's
done, there's basically nothing left to defer except the kfree().
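
To make that concrete, here's a rough, untested sketch of the split I'm
describing. panfrost_jm_kill_ctx_jobs() is just a placeholder for whatever
ends up cancelling the jobs that already reached the HW, and the destroy()
signature is simplified:

/* Rough, untested sketch -- signature simplified, and
 * panfrost_jm_kill_ctx_jobs() is a placeholder for the JM-level job
 * cancellation, not an existing helper.
 */
void panfrost_jm_ctx_destroy(struct panfrost_jm_ctx *jm_ctx)
{
        /* Tear the entities down first: drm_sched_entity_fini() flushes/kills
         * anything that never made it to the HW and guarantees drm_sched
         * won't hand us new jobs from these entities afterwards.
         */
        for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) {
                if (jm_ctx->slots[i].enabled)
                        drm_sched_entity_fini(&jm_ctx->slots[i].sched_entity);
        }

        /* Only now is it safe to cancel what's still running on the HW. */
        panfrost_jm_kill_ctx_jobs(jm_ctx);

        panfrost_jm_ctx_put(jm_ctx);
}

static void panfrost_jm_ctx_release(struct kref *kref)
{
        struct panfrost_jm_ctx *jm_ctx =
                container_of(kref, struct panfrost_jm_ctx, refcnt);

        /* Everything was torn down in destroy(); only the kfree() is left. */
        kfree(jm_ctx);
}

The create() error path would then need the same drm_sched_entity_fini()
calls on the entities that were already initialized before dropping the last
ref, which I think also covers the leak you pointed out.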