[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <284c0e65-6204-7747-b294-1f9fa23b1f91@amd.com>
Date: Fri, 18 Mar 2022 16:14:47 -0400
From: Andrey Grodzovsky <andrey.grodzovsky@....com>
To: Rob Clark <robdclark@...il.com>
Cc: Christian König <christian.koenig@....com>,
dri-devel <dri-devel@...ts.freedesktop.org>,
freedreno <freedreno@...ts.freedesktop.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
Rob Clark <robdclark@...omium.org>,
Sean Paul <sean@...rly.run>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
David Airlie <airlied@...ux.ie>,
Akhil P Oommen <quic_akhilpo@...cinc.com>,
Jonathan Marek <jonathan@...ek.ca>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Vladimir Lypak <vladimir.lypak@...il.com>,
open list <linux-kernel@...r.kernel.org>,
"Alexander.Deucher@....com" <Alexander.Deucher@....com>
Subject: Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system
suspend
On 2022-03-18 13:22, Rob Clark wrote:
> On Fri, Mar 18, 2022 at 9:27 AM Andrey Grodzovsky
> <andrey.grodzovsky@....com> wrote:
>>
>> On 2022-03-18 12:20, Rob Clark wrote:
>>> On Fri, Mar 18, 2022 at 9:04 AM Andrey Grodzovsky
>>> <andrey.grodzovsky@....com> wrote:
>>>> On 2022-03-17 16:35, Rob Clark wrote:
>>>>> On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky
>>>>> <andrey.grodzovsky@....com> wrote:
>>>>>> On 2022-03-17 14:25, Rob Clark wrote:
>>>>>>> On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky
>>>>>>> <andrey.grodzovsky@....com> wrote:
>>>>>>>> On 2022-03-17 13:35, Rob Clark wrote:
>>>>>>>>> On Thu, Mar 17, 2022 at 9:45 AM Christian König
>>>>>>>>> <christian.koenig@....com> wrote:
>>>>>>>>>> Am 17.03.22 um 17:18 schrieb Rob Clark:
>>>>>>>>>>> On Thu, Mar 17, 2022 at 9:04 AM Christian König
>>>>>>>>>>> <christian.koenig@....com> wrote:
>>>>>>>>>>>> Am 17.03.22 um 16:10 schrieb Rob Clark:
>>>>>>>>>>>>> [SNIP]
>>>>>>>>>>>>> userspace frozen != kthread frozen .. that is what this patch is
>>>>>>>>>>>>> trying to address, so we aren't racing between shutting down the hw
>>>>>>>>>>>>> and the scheduler shoveling more jobs at us.
>>>>>>>>>>>> Well exactly that's the problem. The scheduler is supposed to shoveling
>>>>>>>>>>>> more jobs at us until it is empty.
>>>>>>>>>>>>
>>>>>>>>>>>> Thinking more about it we will then keep some dma_fence instance
>>>>>>>>>>>> unsignaled and that is and extremely bad idea since it can lead to
>>>>>>>>>>>> deadlocks during suspend.
>>>>>>>>>>> Hmm, perhaps that is true if you need to migrate things out of vram?
>>>>>>>>>>> It is at least not a problem when vram is not involved.
>>>>>>>>>> No, it's much wider than that.
>>>>>>>>>>
>>>>>>>>>> See what can happen is that the memory management shrinkers want to wait
>>>>>>>>>> for a dma_fence during suspend.
>>>>>>>>> we don't wait on fences in shrinker, only purging or evicting things
>>>>>>>>> that are already ready. Actually, waiting on fences in shrinker path
>>>>>>>>> sounds like a pretty bad idea.
>>>>>>>>>
>>>>>>>>>> And if you stop the scheduler they will just wait forever.
>>>>>>>>>>
>>>>>>>>>> What you need to do instead is to drain the scheduler, e.g. call
>>>>>>>>>> drm_sched_entity_flush() with a proper timeout for each entity you have
>>>>>>>>>> created.
>>>>>>>>> yeah, it would work to drain the scheduler.. I guess that might be the
>>>>>>>>> more portable approach as far as generic solution for suspend.
>>>>>>>>>
>>>>>>>>> BR,
>>>>>>>>> -R
>>>>>>>> I am not sure how this drains the scheduler ? Suppose we done the
>>>>>>>> waiting in drm_sched_entity_flush,
>>>>>>>> what prevents someone to push right away another job into the same
>>>>>>>> entity's queue right after that ?
>>>>>>>> Shouldn't we first disable further pushing of jobs into entity before we
>>>>>>>> wait for sched->job_scheduled ?
>>>>>>>>
>>>>>>> In the system suspend path, userspace processes will have already been
>>>>>>> frozen, so there should be no way to push more jobs to the scheduler,
>>>>>>> unless they are pushed from the kernel itself.
>>>>>>> amdgpu_device_suspend
>>>>>> It was my suspicion but I wasn't sure about it.
>>>>>>
>>>>>>
>>>>>>> We don't do that in
>>>>>>> drm/msm, but maybe you need to to move things btwn vram and system
>>>>>>> memory?
>>>>>> Exactly, that was my main concern - if we use this method we have to use
>>>>>> it in a point in
>>>>>> suspend sequence when all the in kernel job submissions activity already
>>>>>> suspended
>>>>>>
>>>>>>> But even in that case, if the # of jobs you push is bounded I
>>>>>>> guess that is ok?
>>>>>> Submissions to scheduler entities are using unbounded queue, the bounded
>>>>>> part is when
>>>>>> you extract next job from entity to submit to HW ring and it rejects if
>>>>>> submission limit reached (drm_sched_ready)
>>>>>>
>>>>>> In general - It looks to me at least that what we what we want her is
>>>>>> more of a drain operation then flush (i.e.
>>>>>> we first want to disable any further job submission to entity's queue
>>>>>> and then flush all in flight ones). As example
>>>>>> for this i was looking at flush_workqueue vs. drain_workqueue
>>>>> Would it be possible for amdgpu to, in the system suspend task,
>>>>>
>>>>> 1) first queue up all the jobs needed to migrate bos out of vram, and
>>>>> whatever other housekeeping jobs are needed
>>>>> 2) then drain gpu scheduler's queues
>>>>> 3) and then finally wait for jobs executing on GPU to complete
>>>> We already do most of it in amdgpu_device_suspend,
>>>> amdgpu_device_ip_suspend_phase1
>>>> followed by amdgpu_device_evict_resources followed by
>>>> amdgpu_fence_driver_hw_fini is
>>>> exactly steps 1 + 3. What we are missing is step 2). For this step I
>>>> suggest adding a function
>>>> called drm_sched_entity_drain which basically sets entity->stopped =
>>>> true and then calls drm_sched_entity_flush.
>>>> This will both reject any new insertions into entity's job queue and
>>>> will flush all pending job submissions to HW from that entity.
>>>> One point is we need to make make drm_sched_entity_push_job return value
>>>> so the caller knows about job enqueue
>>>> rejection.
>>> Hmm, seems like job enqueue that is rejected because we are in the
>>> process of suspending should be more of a WARN_ON() sort of thing?
>>> Not sure if there is something sensible to do for the caller at that
>>> point?
>>
>> What about the job's fence the caller is waiting on ? If we rejected
>> job submission the caller must know about it to not get stuck waiting
>> on that fence.
>>
> Hmm, perhaps I'm not being imaginative enough, but this sort of
> scenario seems like it should only arise from a bug in the driver's
> suspend path, Ie. not doing all the job submission before shutting
> down the scheduler. I don't think anything good is going to result
> either way, which is why I was thinking you'd want a WARN_ON() to help
> debug/fix that case.
Yes, I just wanted the code to not allow such bugs to go through
unnoticed. I guess
WARN_ON should give laud enough warning anyway
Andrey
>>>> What about runtime suspend ? I guess same issue with scheduler racing
>>>> against HW susppend is relevant there ?
>>> Runtime suspend should be ok, as long as the driver holds a runpm
>>> reference whenever the hw needs to be awake. The problem with system
>>> suspend (at least if you are using pm_runtime_force_suspend() or doing
>>> something equivalent) is that it bypasses the runpm reference.
>>> (Which, IMO, seems like a bad design..)
>>
>> I am not totally clear yet - can you expand a bit why one case is ok
>> but the other
>> problematic ?
>>
> Sure, normally pm_runtime_get/put increment a reference count, as long
> as there have been more get's than puts, the device won't runtime
> suspend. So, for ex, msm's run_job fxn does a pm_runtime_get_sync().
> And retire_submit() which runs after job completes on GPU does a
> pm_runtime_put_autosuspend().
>
> System suspend, OTOH, bypasses this reference counting. Which is why
> extra care is needed.
>
> BR,
> -R
>
>
>> Andrey
>>
>>
>>>> Also, could you point to a particular buggy scenario where the race
>>>> between SW shceduler and suspend is causing a problem ?
>>> I wrote a piglit test[1] to try to trigger this scenario.. it isn't
>>> really that easy to hit
>>>
>>> BR,
>>> -R
>>>
>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fpiglit%2F-%2Fmerge_requests%2F643&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C35f0d7d9282044651c9708da0903d4f4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637832209324217553%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=dwjPEVAYgCI%2BtEyzBirfAQkJjZax2NdiLQfNeFfImtU%3D&reserved=0
>>>
>>>> Andrey
>>>>
>>>>
>>>>> BR,
>>>>> -R
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> BR,
>>>>>>> -R
Powered by blists - more mailing lists