[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADGDV=U_7CdkdEiLX9kj9yHsXhwb5zP_eGXpwmrj20cmgzMAtA@mail.gmail.com>
Date: Tue, 7 Jan 2025 16:21:57 +0100
From: Philipp Reisner <philipp.reisner@...bit.com>
To: Christian König <christian.koenig@....com>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH] drm/sched: Fix amdgpu crash upon suspend/resume
[...]
> > The OOPS happens because the rq member of entity is NULL in
> > drm_sched_job_arm() after the call to drm_sched_entity_select_rq().
> >
> > In drm_sched_entity_select_rq(), the code considers that
> > drb_sched_pick_best() might return a NULL value. When NULL, it assigns
> > NULL to entity->rq even if it had a non-NULL value before.
> >
> > drm_sched_job_arm() does not deal with entities having a rq of NULL.
> >
> > Fix this by leaving the entity on the engine it was instead of
> > assigning a NULL to its run queue member.
>
> Well that is clearly not the correct approach to fixing this. So clearly
> a NAK from my side.
>
> The real question is why is amdgpu_cs_ioctl() called when all of
> userspace should be frozen?
>
> Regards,
> Christian.
>
Could the OOPS happen at resume time? Might it be that the kernel
activates user-space
before all the components of the GPU finished their wakeup?
Maybe drm_sched_pick_best() returns NULL since no scheduler is ready yet?
Apart from whether amdgpu_cs_ioctl() should run at this point, I still think the
suggested change improves the code. drm_sched_pick_best() can return NULL.
drm_sched_entity_select_rq() can handle the NULL (partially).
drm_sched_job_arm() crashes on an entity that has rq set to NULL.
The handling of NULL values is half-baked.
In my opinion, you should define if drm_sched_pick_best() may put a NULL into
rq. If your answer is yes, it might put a NULL there; then, there should be a
BUG_ON(!entity->rq) after the invocation of drm_sched_entity_select_rq().
If your answer is no, the BUG_ON() should be in drm_sched_pick_best().
That helps guys with zero domain knowledge, like me, to figure out how
this is all
supposed to work.
best regards,
Philipp
Powered by blists - more mailing lists