Message-ID: <39263b3b-3574-43ae-aec1-73fe39e29f10@amd.com>
Date: Thu, 27 Nov 2025 10:08:36 -0500
From: "Kuehling, Felix" <felix.kuehling@....com>
To: Christian König <christian.koenig@....com>,
 phasta@...nel.org, Sumit Semwal <sumit.semwal@...aro.org>,
 Gustavo Padovan <gustavo@...ovan.org>,
 Alex Deucher <alexander.deucher@....com>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Jani Nikula <jani.nikula@...ux.intel.com>,
 Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
 Rodrigo Vivi <rodrigo.vivi@...el.com>, Tvrtko Ursulin
 <tursulin@...ulin.net>, Huang Rui <ray.huang@....com>,
 Matthew Auld <matthew.auld@...el.com>,
 Matthew Brost <matthew.brost@...el.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 Lucas De Marchi <lucas.demarchi@...el.com>,
 Thomas Hellström <thomas.hellstrom@...ux.intel.com>
Cc: linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 linaro-mm-sig@...ts.linaro.org, linux-kernel@...r.kernel.org,
 amd-gfx@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
 intel-xe@...ts.freedesktop.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()

On 2025-11-27 04:55, Christian König wrote:
> On 11/27/25 10:48, Philipp Stanner wrote:
>> On Wed, 2025-11-26 at 16:24 -0500, Kuehling, Felix wrote:
>>> On 2025-11-26 08:19, Philipp Stanner wrote:
>>>> The return code of dma_fence_signal() is not really useful as there is
>>>> nothing reasonable to do if a fence was already signaled. That return
>>>> code shall be removed from the kernel.
>>>>
>>>> Ignore dma_fence_signal()'s return code.
>>> I think this is not correct. Looking at the comment in
>>> evict_process_worker, we use the return value to resolve a race
>>> condition where multiple threads are trying to signal the eviction
>>> fence. Only one of them should schedule the restore work, and the other
>>> ones need to increment the reference count to keep evictions balanced.
>> Thank you for pointing that out. Seems then amdkfd is the only user who
>> actually relies on the feature. See below
>>
>>> Regards,
>>>     Felix
>>>
>>>
>>>> Suggested-by: Christian König <christian.koenig@....com>
>>>> Signed-off-by: Philipp Stanner <phasta@...nel.org>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
>>>>    1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index ddfe30c13e9d..950fafa4b3c3 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>>    static int signal_eviction_fence(struct kfd_process *p)
>>>>    {
>>>>    	struct dma_fence *ef;
>>>> -	int ret;
>>>>    
>>>>    	rcu_read_lock();
>>>>    	ef = dma_fence_get_rcu_safe(&p->ef);
>>>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>>>    	if (!ef)
>>>>    		return -EINVAL;
>>>>    
>>>> -	ret = dma_fence_signal(ef);
>>>> +	dma_fence_signal(ef);
>> The issue now is that dma_fence_signal()'s return code is actually non-
>> racy, because check + bit-set are protected by lock.
>>
>> Christian's new spinlock series would add a lock function for fences:
>> https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/
>>
>>
>> So I suppose this should work:
>>
>> dma_fence_lock_irqsave(ef, flags);
>> if (dma_fence_test_signaled_flag(ef)) {
>> 	dma_fence_unlock_irqrestore(ef, flags);
>> 	return true;
>> }
>> dma_fence_signal_locked(ef);
>> dma_fence_unlock_irqrestore(ef, flags);
>>
>> return false;
>>
>>
>> + some cosmetic adjustments for the boolean of course.
>>
>>
>> Would that fly and be reasonable? @Felix, Christian.
> I was just about to reply with the same idea when your mail arrived.

I agree as well. The important feature is that we need to test and
signal the fence atomically. It may even make sense to add a function
for that, e.g. "dma_fence_test_and_signal", that preserves the original
behaviour of dma_fence_signal. ;)
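
To make the intended semantics concrete, here is a minimal userspace
sketch, not kernel code. Everything named model_* is hypothetical and
only stands in for the real thing: a pthread mutex replaces the fence
spinlock, and a plain bool replaces the signaled flag. The return
convention (true means "was already signaled") is an assumption taken
from the snippet quoted above. The point is only that the check and the
set happen under one lock, so exactly one caller sees "not yet
signaled".

```c
#include <pthread.h>
#include <stdbool.h>

#define MODEL_MAX_THREADS 16

/* Hypothetical stand-in for a fence; this is NOT struct dma_fence. */
struct model_fence {
	pthread_mutex_t lock;
	bool signaled;
};

/*
 * Test and signal under a single lock.
 * Returns true if the fence was already signaled, false if this
 * caller is the one that signaled it.
 */
static bool model_fence_test_and_signal(struct model_fence *f)
{
	bool was_signaled;

	pthread_mutex_lock(&f->lock);
	was_signaled = f->signaled;
	f->signaled = true;
	pthread_mutex_unlock(&f->lock);

	return was_signaled;
}

struct model_worker {
	struct model_fence *fence;
	bool was_first;
};

static void *model_worker_fn(void *arg)
{
	struct model_worker *w = arg;

	/* Only the first signaler would schedule the restore work. */
	w->was_first = !model_fence_test_and_signal(w->fence);
	return NULL;
}

/* Race nthreads callers on one fence; count how many signaled first. */
static int model_count_first_signalers(int nthreads)
{
	struct model_fence fence;
	struct model_worker workers[MODEL_MAX_THREADS];
	pthread_t tids[MODEL_MAX_THREADS];
	int i, started = 0, first = 0;

	if (nthreads > MODEL_MAX_THREADS)
		nthreads = MODEL_MAX_THREADS;

	pthread_mutex_init(&fence.lock, NULL);
	fence.signaled = false;

	for (i = 0; i < nthreads; i++) {
		workers[i].fence = &fence;
		workers[i].was_first = false;
		if (pthread_create(&tids[i], NULL, model_worker_fn,
				   &workers[i]) != 0)
			break;
		started++;
	}

	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		if (workers[i].was_first)
			first++;
	}

	pthread_mutex_destroy(&fence.lock);
	return first;
}
```

In this model, model_count_first_signalers() counts a single first
signaler regardless of how many threads race, which is the property
the eviction path relies on. A separate unlocked check followed by a
signal would not guarantee that.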

Regards,
   Felix


>
> So yes looks totally reasonable to me.
>
> Regards,
> Christian.
>
>>
>> P.
