[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5jnpsmjef5ibegbsbelkfmudv4wagpcfb25nptqs5z4ccitq4c@3bdtrbrrmtil>
Date: Wed, 28 May 2025 14:07:34 -0500
From: Lucas De Marchi <lucas.demarchi@...el.com>
To: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
CC: Min Ma <min.ma@....com>, Lizhi Hou <lizhi.hou@....com>, Oded Gabbay
<ogabbay@...nel.org>, Felix Kuehling <Felix.Kuehling@....com>, Alex Deucher
<alexander.deucher@....com>, Christian König
<christian.koenig@....com>, David Airlie <airlied@...il.com>, Simona Vetter
<simona@...ll.ch>, Lucas Stach <l.stach@...gutronix.de>, Russell King
<linux+etnaviv@...linux.org.uk>, Christian Gmeiner
<christian.gmeiner@...il.com>, Frank Binns <frank.binns@...tec.com>, "Matt
Coster" <matt.coster@...tec.com>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, Qiang Yu <yuq825@...il.com>, "Rob
Clark" <robdclark@...il.com>, Abhinav Kumar <quic_abhinavk@...cinc.com>,
Dmitry Baryshkov <lumag@...nel.org>, Sean Paul <sean@...rly.run>, "Marijn
Suijten" <marijn.suijten@...ainline.org>, Lyude Paul <lyude@...hat.com>,
Danilo Krummrich <dakr@...nel.org>, Boris Brezillon
<boris.brezillon@...labora.com>, Rob Herring <robh@...nel.org>, Steven Price
<steven.price@....com>, Liviu Dudau <liviu.dudau@....com>, Matthew Brost
<matthew.brost@...el.com>, Philipp Stanner <phasta@...nel.org>, Melissa Wen
<mwen@...lia.com>, Maíra Canal <mcanal@...lia.com>, Thomas
Hellström <thomas.hellstrom@...ux.intel.com>, Rodrigo Vivi
<rodrigo.vivi@...el.com>, Christian König
<ckoenig.leichtzumerken@...il.com>, <dri-devel@...ts.freedesktop.org>,
<linux-kernel@...r.kernel.org>, <amd-gfx@...ts.freedesktop.org>,
<etnaviv@...ts.freedesktop.org>, <lima@...ts.freedesktop.org>,
<linux-arm-msm@...r.kernel.org>, <freedreno@...ts.freedesktop.org>,
<nouveau@...ts.freedesktop.org>, <intel-xe@...ts.freedesktop.org>
Subject: Re: [PATCH v11 02/10] drm/sched: Store the drm client_id in
drm_sched_fence
On Mon, May 26, 2025 at 02:54:44PM +0200, Pierre-Eric Pelloux-Prayer wrote:
> drivers/gpu/drm/xe/xe_sched_job.c | 3 ++-
>diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>index f0a6ce610948..5921293b25db 100644
>--- a/drivers/gpu/drm/xe/xe_sched_job.c
>+++ b/drivers/gpu/drm/xe/xe_sched_job.c
>@@ -113,7 +113,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> kref_init(&job->refcount);
> xe_exec_queue_get(job->q);
>
>- err = drm_sched_job_init(&job->drm, q->entity, 1, NULL);
>+ err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
>+ q->xef->drm->client_id);
you can't do this here. xef is only !NULL if it's a job from userspace.
For in-kernel jobs, xef is NULL and this explodes. Right now this
completely breaks xe since one of the very first things we do is
to submit a job to save the default context. Example:
https://intel-gfx-ci.01.org/tree/intel-xe/xe-3151-56d2b14961751a677ff1f7ff8b93a6c814ce2be3/bat-bmg-1/igt@xe_module_load@load.html
<4> [] RIP: 0010:xe_sched_job_create+0xbd/0x390 [xe]
<4> [] Code: c1 43 18 85 c0 0f 84 6f 02 00 00 8d 50 01 09 c2 0f 88 3e 02 00 00 48 8b 03 48 8b b3 d8 00 00 00 31 c9 4c 89 ef ba 01 00 00 00 <48> 8b 40 08 4c 8b 40 60 e8 86 64 7c ff 41 89 c4 85 c0 0f 85 9b 01
<4> [] RSP: 0018:ffffc900031972d8 EFLAGS: 00010246
<4> [] RAX: 0000000000000000 RBX: ffff88815fc40d00 RCX: 0000000000000000
<4> [] RDX: 0000000000000001 RSI: ffff88812e6552a8 RDI: ffff88815f939c40
<4> [] RBP: ffffc90003197318 R08: 0000000000000000 R09: 0000000000000000
<4> [] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003197428
<4> [] R13: ffff88815f939c40 R14: ffff88811f054000 R15: ffff88815fc40d00
<4> [] FS: 00007681f2948940(0000) GS:ffff8888daf14000(0000) knlGS:0000000000000000
<4> [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [] CR2: 0000000000000008 CR3: 0000000118315004 CR4: 0000000000f72ef0
<4> [] PKRU: 55555554
<4> [] Call Trace:
<4> [] <TASK>
<4> [] __xe_bb_create_job+0xa2/0x240 [xe]
<4> [] ? find_held_lock+0x31/0x90
<4> [] ? xa_find_after+0x12c/0x250
<4> [] xe_bb_create_job+0x6e/0x380 [xe]
<4> [] ? xa_find_after+0x136/0x250
<4> [] ? __drm_dev_dbg+0x7d/0xb0
<4> [] xe_gt_record_default_lrcs+0x542/0xb00 [xe]
Can we use 0 for in-kernel client since drm_file starts them from 1?
Like this:
| diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
| index 5921293b25db3..d21bf8f269640 100644
| --- a/drivers/gpu/drm/xe/xe_sched_job.c
| +++ b/drivers/gpu/drm/xe/xe_sched_job.c
| @@ -114,7 +114,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
| xe_exec_queue_get(job->q);
|
| err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
| - q->xef->drm->client_id);
| + q->xef ? q->xef->drm->client_id : 0);
| if (err)
| goto err_free;
I tested with the above diff and it at least loads...
Also, I see this in intel-xe mailing list, but I'm not sure why we
didn't have any CI results... I will check that.
Lucas De Marchi
Powered by blists - more mailing lists