Message-ID: <a8d2605c-930b-4eeb-8e4a-1aa9bbfbb960@intel.com>
Date: Thu, 19 Jun 2025 11:22:57 +0530
From: "Nilawar, Badal" <badal.nilawar@...el.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@...el.com>,
<intel-xe@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>,
<linux-kernel@...r.kernel.org>
CC: <anshuman.gupta@...el.com>, <rodrigo.vivi@...el.com>,
<alexander.usyskin@...el.com>, <gregkh@...uxfoundation.org>, <jgg@...dia.com>
Subject: Re: [PATCH v3 06/10] drm/xe/xe_late_bind_fw: Reload late binding fw
in rpm resume
On 19-06-2025 02:35, Daniele Ceraolo Spurio wrote:
>
>
> On 6/18/2025 12:00 PM, Badal Nilawar wrote:
>> Reload late binding fw during runtime resume.
>>
>> v2: Flush worker during runtime suspend
>>
>> Signed-off-by: Badal Nilawar <badal.nilawar@...el.com>
>> ---
>> drivers/gpu/drm/xe/xe_late_bind_fw.c | 2 +-
>> drivers/gpu/drm/xe/xe_late_bind_fw.h | 1 +
>> drivers/gpu/drm/xe/xe_pm.c | 6 ++++++
>> 3 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> index 54aa08c6bdfd..c0be9611c73b 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> @@ -58,7 +58,7 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>>          return 0;
>>  }
>> -static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
>> +void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
>>  {
>>          struct xe_device *xe = late_bind_to_xe(late_bind);
>>          struct xe_late_bind_fw *lbfw;
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> index 28d56ed2bfdc..07e437390539 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> @@ -12,5 +12,6 @@ struct xe_late_bind;
>>  int xe_late_bind_init(struct xe_late_bind *late_bind);
>>  int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
>> +void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind);
>>  #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>> index ff749edc005b..91923fd4af80 100644
>> --- a/drivers/gpu/drm/xe/xe_pm.c
>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>> @@ -20,6 +20,7 @@
>>  #include "xe_gt.h"
>>  #include "xe_guc.h"
>>  #include "xe_irq.h"
>> +#include "xe_late_bind_fw.h"
>>  #include "xe_pcode.h"
>>  #include "xe_pxp.h"
>>  #include "xe_trace.h"
>> @@ -460,6 +461,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>>          if (err)
>>                  goto out;
>> +        xe_late_bind_wait_for_worker_completion(&xe->late_bind);
>
> I think this can deadlock, because you do an rpm_put from within the
> worker, and if that's the last put it'll end up here and wait for the
> worker to complete.
> We could probably just skip this wait, because the worker can handle
> rpm itself. What we might want to be careful about is to not re-queue
> it (from xe_late_bind_fw_load below) if it's currently being executed;
> we could also just let the fw be loaded twice if we hit that race
> condition, which shouldn't be an issue apart from doing something that
> isn't needed.
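If I follow the scenario, it is roughly the ordering sketched below.
This is only an illustration to make sure we are talking about the same
thing; the worker function name and its body are made up for the
example, only xe_late_bind_wait_for_worker_completion() and
xe_pm_runtime_put() are the real entry points:

/* Illustration only, not the real code paths. */
static void late_bind_work_fn(struct work_struct *work)  /* hypothetical name */
{
        struct xe_late_bind *late_bind =
                container_of(work, struct xe_late_bind, work);  /* field name assumed */
        struct xe_device *xe = late_bind_to_xe(late_bind);

        /* ... load the late binding firmware ... */

        xe_pm_runtime_put(xe);  /* may drop the last rpm reference */
}

int xe_pm_runtime_suspend(struct xe_device *xe)
{
        /*
         * If suspend were to run synchronously as a consequence of the
         * put above, this wait would be waiting on the very worker that
         * triggered it.
         */
        xe_late_bind_wait_for_worker_completion(&xe->late_bind);
        /* ... rest of suspend ... */
        return 0;
}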
In xe_pm_runtime_get/_put, this kind of deadlock is avoided by checking
whether the caller is the runtime PM callback task, i.e. whether
(xe_pm_read_callback_task(xe) == current) holds.
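For reference, the put path looks roughly like this (paraphrased from
memory, not a verbatim copy of xe_pm.c):

void xe_pm_runtime_put(struct xe_device *xe)
{
        /*
         * The runtime suspend/resume callbacks record their task with
         * xe_pm_write_callback_task(xe, current) while they run (and
         * clear it with NULL on exit, as in the hunk above).  If we are
         * called from that same task, drop the reference without
         * triggering another idle/suspend, so we cannot recurse into
         * the callback.
         */
        if (xe_pm_read_callback_task(xe) == current) {
                pm_runtime_put_noidle(xe->drm.dev);
                return;
        }

        pm_runtime_put(xe->drm.dev);
}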
Badal
>
> Daniele
>
>> +
>>          /*
>>           * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
>>           * also checks and deletes bo entry from user fault list.
>> @@ -550,6 +553,9 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>          xe_pxp_pm_resume(xe->pxp);
>> +        if (xe->d3cold.allowed)
>> +                xe_late_bind_fw_load(&xe->late_bind);
>> +
>>  out:
>>          xe_rpm_lockmap_release(xe);
>>          xe_pm_write_callback_task(xe, NULL);
>