linux-kernel - Re: [PATCH v4] drm: Don't free jobs in wait_event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d13b3648-c7a5-1ef9-20c4-c327b15f1e4a@arm.com>
Date:   Thu, 26 Sep 2019 16:59:55 +0100
From:   Steven Price <steven.price@....com>
To:     "Grodzovsky, Andrey" <Andrey.Grodzovsky@....com>,
        Daniel Vetter <daniel@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        "Koenig, Christian" <Christian.Koenig@....com>
Cc:     "Deucher, Alexander" <Alexander.Deucher@....com>,
        Nayan Deshmukh <nayan26deshmukh@...il.com>,
        Sharat Masetty <smasetty@...eaurora.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()

On 26/09/2019 16:48, Grodzovsky, Andrey wrote:
> 
> On 9/26/19 11:23 AM, Steven Price wrote:
>> On 26/09/2019 16:14, Grodzovsky, Andrey wrote:
>>> On 9/26/19 10:16 AM, Steven Price wrote:
>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>>> it is called as the condition of wait_event_interruptible() it must not
>>>> sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep.
>>>>
>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>> there is one. The caller can then call the free_job() callback outside
>>>> the wait_event_interruptible() where sleeping is possible before
>>>> re-checking and returning to sleep if necessary.
>>>>
>>>> Signed-off-by: Steven Price <steven.price@....com>
>>>> ---
>>>> Changes from v3:
>>>>    * drm_sched_main() re-arms the timeout for the next job after calling
>>>>      free_job()
>>>>
>>>>    drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>>>    1 file changed, 26 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 9a0ee74d82dc..148468447ba9 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>    }
>>>>    
>>>>    /**
>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>>     *
>>>>     * @sched: scheduler instance
>>>>     *
>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>> + * Returns the next finished job from the mirror list (if there is one)
>>>> + * ready for it to be destroyed.
>>>>     */
>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>> +static struct drm_sched_job *
>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>    {
>>>> +	struct drm_sched_job *job = NULL;
>>>>    	unsigned long flags;
>>>>    
>>>>    	/* Don't destroy jobs while the timeout worker is running */
>>>>    	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>    	    !cancel_delayed_work(&sched->work_tdr))
>>>> -		return;
>>>> -
>>>> +		return NULL;
>>>>    
>>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>>> -		struct drm_sched_job *job;
>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>    
>>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>    				       struct drm_sched_job, node);
>>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>> -			break;
>>>>    
>>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>    		/* remove job from ring_mirror_list */
>>>>    		list_del_init(&job->node);
>>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>> -
>>>> -		sched->ops->free_job(job);
>>>> +	} else {
>>>> +		job = NULL;
>>>> +		/* queue timeout for next job */
>>>> +		drm_sched_start_timeout(sched);
>>>>    	}
>>>>    
>>>> -	/* queue timeout for next job */
>>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> -	drm_sched_start_timeout(sched);
>>>>    	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>    
>>>> +	return job;
>>>>    }
>>>>    
>>>>    /**
>>>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>>>    		struct drm_sched_fence *s_fence;
>>>>    		struct drm_sched_job *sched_job;
>>>>    		struct dma_fence *fence;
>>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>>    
>>>>    		wait_event_interruptible(sched->wake_up_worker,
>>>> -					 (drm_sched_cleanup_jobs(sched),
>>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>>    					 (!drm_sched_blocked(sched) &&
>>>>    					  (entity = drm_sched_select_entity(sched))) ||
>>>> -					 kthread_should_stop()));
>>>> +					 kthread_should_stop());
>>>> +
>>>> +		while (cleanup_job) {
>>>> +			sched->ops->free_job(cleanup_job);
>>>> +			/* queue timeout for next job */
>>>> +			drm_sched_start_timeout(sched);
>>>> +
>>>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>>>> +		}
>>>
>>> Why drm_sched_start_timeout is called both here and inside
>>> drm_sched_get_cleanup_job ? And also why call it multiple times in the
>>> loop instead of only once after the loop is done  ?
>> Christian pointed out to be that the first thing
>> drm_sched_get_cleanup_job does is call cancel_delayed_work(), and if
>> that returns false then it bails out with a NULL return. So to actually
>> get another job (if one exists) the timeout has to be restarted.
> 
> 
> For this case where timeout work already in progress note that 
> drm_sched_job_timedout restarts the timeout in it's end so it should be 
> ok to restart the timeout unconditionally inside 
> drm_sched_get_cleanup_job as it was done before.

I may have misinterpreted Christian[1], but I interpreted the below as
meaning that he'd prefer the caller (drm_sched_main()) to handle the
drm_sched_start_timeout() in this case:

On 26/09/2019 14:50, Koenig, Christian wrote:
>> Alternatively the caller could manually re-arm the timeout after
>> handling the job free.
> 
> I don't see anything that could go wrong immediately, but that is 
> probably the cleaner approach.

[1]
https://lore.kernel.org/lkml/d9b23ef8-f784-0bce-c34a-fa02002db1ea@amd.com/

I wasn't entirely sure if it was safe to leave the timeout running while
the job free was going on which would be the case if
drm_sched_get_cleanup_job() re-armed unconditionally.

>>
>> It's also necessary to restart the timeout in the case where the return
>> is NULL which is handled in the function itself.
>>
>> TBH I'm not sure whether this while loop is worth it - it may be better
>> to replace it with simply:
>>
>> 	if (cleanup_job) {
>> 		sched->ops->free_job(cleanup_job);
>> 		/* queue timeout for next job */
>> 		drm_sched_start_timeout(sched);
>> 	}
>>
>> The outer loop would then handle the next call to
>> drm_sched_get_cleanup_job() as necessary.
> 
> 
> What outer loop ?

There's a loop round the code in drm_sched_main():

	while (!kthread_should_stop()) {
		[...]
		wait_event_interruptible(...)

		while (cleanup_job) {...}

		if (!entity)
			continue;

		[...]
	}

So after handling the cleanup_job case, the outer loop will loop back
round to the wait_event_interruptible() call. One other concern is
whether it's possible for the thread to be signalled to stop before the
clean up has been completed. The while loop I added ensures that all
jobs are actually cleaned up before kthread_should_stop() is checked.

Steve