Message-ID: <1edd0499-ce5d-45a0-a989-ecb86f726795@igalia.com>
Date: Fri, 23 May 2025 15:58:33 +0100
From: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
To: phasta@...nel.org, Matthew Brost <matthew.brost@...el.com>,
Danilo Krummrich <dakr@...nel.org>,
Christian König <ckoenig.leichtzumerken@...il.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Sumit Semwal <sumit.semwal@...aro.org>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
linux-media@...r.kernel.org
Subject: Re: [PATCH] drm/sched/tests: Use one lock for fence context
On 22/05/2025 15:06, Philipp Stanner wrote:
> On Wed, 2025-05-21 at 11:24 +0100, Tvrtko Ursulin wrote:
>>
>> On 21/05/2025 11:04, Philipp Stanner wrote:
>>> When the unit tests were implemented, each scheduler job got its own,
>>> distinct lock. This is not how dma_fence context locking rules are
>>> meant to be implemented. All jobs belonging to the same fence context
>>> (in this case: the scheduler) should share a lock for their
>>> dma_fences. This is to comply with various dma_fence rules, e.g.,
>>> ensuring that only one fence gets signaled at a time.
>>>
>>> Use the fence context (scheduler) lock for the jobs.
>>
>> I think for the mock scheduler it works to share the lock, but I don't
>> see that the commit message is correct. Where do you see the
>> requirement to share the lock? AFAIK fence->lock is a fence lock,
>> nothing more semantically.
>
> This patch is partly meant to probe a bit with Christian and Danilo, to
> see whether we can get more clarity about it.
>
> In many places, notably Nouveau, it is well-established practice to use
> one lock for the fctx and all the jobs associated with it.
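
Something like this, as a minimal sketch of that shared-lock pattern (all
names below are purely illustrative, taken neither from Nouveau nor from
this patch):

#include <linux/atomic.h>
#include <linux/dma-fence.h>
#include <linux/spinlock.h>

struct example_fence_ctx {
	spinlock_t lock;	/* one lock shared by every fence on this context */
	u64 context;		/* from dma_fence_context_alloc(1) */
	atomic_t next_seqno;
};

static void example_fence_ctx_init(struct example_fence_ctx *ctx)
{
	spin_lock_init(&ctx->lock);
	ctx->context = dma_fence_context_alloc(1);
	atomic_set(&ctx->next_seqno, 0);
}

static void example_fence_attach(struct example_fence_ctx *ctx,
				 struct dma_fence *fence,
				 const struct dma_fence_ops *ops)
{
	/* Every fence created on this context shares &ctx->lock. */
	dma_fence_init(fence, ops, &ctx->lock, ctx->context,
		       atomic_inc_return(&ctx->next_seqno));
}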
>
>
>>
>> And what does "ensuring that only one fence gets signalled at a time"
>> mean? You mean signal in seqno order?
>
> Yes, that's related. If jobs' fences can get signaled independently
> from each other, that might race and screw up the ordering. A common
> lock can prevent that.
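
To sketch what that serialisation could look like, building on the
hypothetical example_fence_ctx above (again illustrative only, not from
the patch): walk the pending fences in seqno order under the one shared
lock and signal them there.

struct example_pending_fence {
	struct dma_fence fence;
	struct list_head link;		/* on a list kept in seqno order */
};

/* Signal every pending fence up to and including @seqno, in order. */
static void example_ctx_signal_up_to(struct example_fence_ctx *ctx,
				     struct list_head *pending, u64 seqno)
{
	struct example_pending_fence *p, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	list_for_each_entry_safe(p, tmp, pending, link) {
		if (p->fence.seqno > seqno)
			break;
		list_del_init(&p->link);
		/* ctx->lock is the fence lock here, hence the _locked variant. */
		dma_fence_signal_locked(&p->fence);
	}
	spin_unlock_irqrestore(&ctx->lock, flags);
}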
>
>> Even that is not guaranteed by the contract, due to opportunistic
>> signalling.
>
> Jobs must be pushed to the hardware in the order they were submitted,
> and their fences must therefore be signaled in order. No?
>
> What do you mean by opportunistic signaling?

Our beloved dma_fence_is_signaled(). An external caller can signal a
fence before the driver which owns it does.
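
For reference, dma_fence_is_signaled() does roughly this (a simplified
paraphrase of include/linux/dma-fence.h, not a verbatim copy):

static inline bool dma_fence_is_signaled(struct dma_fence *fence)
{
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		/*
		 * The external caller signals the fence here, possibly
		 * before the owning driver does.
		 */
		dma_fence_signal(fence);
		return true;
	}

	return false;
}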

If you change the commit message to correctly describe this as a
simplification, since there is no need for separate locks, then I am
good with that. It is a good simplification in that case.
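
To spell out why the first hunk switches to dma_fence_signal_locked()
(my reading of the diff, simplified rather than a verbatim copy of
mock_scheduler.c): once the hw_fence uses &sched->lock as its fence
lock, a completion path that already holds that lock must use the
_locked variant.

static void example_job_complete_locked(struct drm_mock_scheduler *sched,
					struct drm_mock_sched_job *job)
{
	lockdep_assert_held(&sched->lock);

	job->flags |= DRM_MOCK_SCHED_JOB_DONE;
	list_move_tail(&job->link, &sched->done_list);
	/* fence->lock is &sched->lock and is already held here. */
	dma_fence_signal_locked(&job->hw_fence);
	complete(&job->done);
}
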
Regards,
Tvrtko
>
>
> P.
>
>
>
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Philipp Stanner <phasta@...nel.org>
>>> ---
>>> drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++---
>>> drivers/gpu/drm/scheduler/tests/sched_tests.h | 1 -
>>> 2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>>> index f999c8859cf7..17023276f4b0 100644
>>> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>>> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>>> @@ -64,7 +64,7 @@ static void drm_mock_sched_job_complete(struct drm_mock_sched_job *job)
>>>
>>> job->flags |= DRM_MOCK_SCHED_JOB_DONE;
>>> list_move_tail(&job->link, &sched->done_list);
>>> - dma_fence_signal(&job->hw_fence);
>>> + dma_fence_signal_locked(&job->hw_fence);
>>> complete(&job->done);
>>> }
>>>
>>> @@ -123,7 +123,6 @@ drm_mock_sched_job_new(struct kunit *test,
>>> job->test = test;
>>>
>>> init_completion(&job->done);
>>> - spin_lock_init(&job->lock);
>>> INIT_LIST_HEAD(&job->link);
>>> hrtimer_setup(&job->timer,
>>> drm_mock_sched_job_signal_timer,
>>> CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>>> @@ -169,7 +168,7 @@ static struct dma_fence *mock_sched_run_job(struct drm_sched_job *sched_job)
>>>
>>> dma_fence_init(&job->hw_fence,
>>> &drm_mock_sched_hw_fence_ops,
>>> - &job->lock,
>>> + &sched->lock,
>>> sched->hw_timeline.context,
>>> atomic_inc_return(&sched->hw_timeline.next_seqno));
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h b/drivers/gpu/drm/scheduler/tests/sched_tests.h
>>> index 27caf8285fb7..fbba38137f0c 100644
>>> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
>>> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
>>> @@ -106,7 +106,6 @@ struct drm_mock_sched_job {
>>> unsigned int duration_us;
>>> ktime_t finish_at;
>>>
>>> - spinlock_t lock;
>>> struct dma_fence hw_fence;
>>>
>>> struct kunit *test;
>>
>