[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <39263b3b-3574-43ae-aec1-73fe39e29f10@amd.com>
Date: Thu, 27 Nov 2025 10:08:36 -0500
From: "Kuehling, Felix" <felix.kuehling@....com>
To: Christian König <christian.koenig@....com>,
phasta@...nel.org, Sumit Semwal <sumit.semwal@...aro.org>,
Gustavo Padovan <gustavo@...ovan.org>,
Alex Deucher <alexander.deucher@....com>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>, Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>, Tvrtko Ursulin
<tursulin@...ulin.net>, Huang Rui <ray.huang@....com>,
Matthew Auld <matthew.auld@...el.com>,
Matthew Brost <matthew.brost@...el.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>
Cc: linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
On 2025-11-27 04:55, Christian König wrote:
> On 11/27/25 10:48, Philipp Stanner wrote:
>> On Wed, 2025-11-26 at 16:24 -0500, Kuehling, Felix wrote:
>>> On 2025-11-26 08:19, Philipp Stanner wrote:
>>>> The return code of dma_fence_signal() is not really useful as there is
>>>> nothing reasonable to do if a fence was already signaled. That return
>>>> code shall be removed from the kernel.
>>>>
>>>> Ignore dma_fence_signal()'s return code.
>>> I think this is not correct. Looking at the comment in
>>> evict_process_worker, we use the return value to decide a race
>>> conditions where multiple threads are trying to signal the eviction
>>> fence. Only one of them should schedule the restore work. And the other
>>> ones need to increment the reference count to keep evictions balanced.
>> Thank you for pointing that out. Seems then amdkfd is the only user who
>> actually relies on the feature. See below
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>> Suggested-by: Christian König <christian.koenig@....com>
>>>> Signed-off-by: Philipp Stanner <phasta@...nel.org>
>>>> ---
>>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index ddfe30c13e9d..950fafa4b3c3 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>> static int signal_eviction_fence(struct kfd_process *p)
>>>> {
>>>> struct dma_fence *ef;
>>>> - int ret;
>>>>
>>>> rcu_read_lock();
>>>> ef = dma_fence_get_rcu_safe(&p->ef);
>>>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>>> if (!ef)
>>>> return -EINVAL;
>>>>
>>>> - ret = dma_fence_signal(ef);
>>>> + dma_fence_signal(ef);
>> The issue now is that dma_fence_signal()'s return code is actually non-
>> racy, because check + bit-set are protected by lock.
>>
>> Christian's new spinlock series would add a lock function for fences:
>> https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/
>>
>>
>> So I suppose this should work:
>>
>> dma_fence_lock_irqsave(ef, flags);
>> if (dma_fence_test_signaled_flag(ef)) {
>> dma_fence_unlock_irqrestore(ef, flags);
>> return true;
>> }
>> dma_fence_signal_locked(ef);
>> dma_fence_unlock_irqrestore(ef, flags);
>>
>> return false;
>>
>>
>> + some cosmetic adjustments for the boolean of course.
>>
>>
>> Would that fly and be reasonable? @Felix, Christian.
> I was just about to reply with the same idea when your mail arrived.
I agree as well. The important feature is that we need to test and
signal the fence atomically. It may even make sense to add a function
for that "dma_fence_test_and_signal" that preserves the original
behaviour of dma_fence_signal. ;)
Regards,
Felix
>
> So yes looks totally reasonable to me.
>
> Regards,
> Christian.
>
>>
>> P.
Powered by blists - more mailing lists