Message-ID: <9565366c-6821-4767-bcfc-079378fb4348@amd.com>
Date: Wed, 29 Oct 2025 10:48:43 +0100
From: Christian König <christian.koenig@....com>
To: Matthew Brost <matthew.brost@...el.com>
Cc: intel-xe@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, jiangshanlai@...il.com, tj@...nel.org,
simona.vetter@...ll.ch, pstanner@...hat.com, dakr@...nel.org
Subject: Re: [RFC PATCH 1/3] workqueue: Add an interface to taint workqueue
lockdep with reclaim
On 10/28/25 21:16, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 10:32:54AM +0100, Christian König wrote:
>> On 10/21/25 23:39, Matthew Brost wrote:
>>> Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> scheduler workqueues). It is useful to teach lockdep that memory cannot
>>> be allocated on these workqueues. Add an interface to taint workqueue
>>> lockdep with reclaim.
>>
>> Oh that is so wonderfully evil. I'm absolutely in favor of doing this.
>>
>> But can't we check for the existing WQ_MEM_RECLAIM flag in the workqueue handling instead?
>>
>
> Tejun suggested tying the lockdep annotation to WQ_MEM_RECLAIM, but the
> entire kernel explodes because many workqueues throughout Linux don’t
> adhere to this rule. Here's a link to my latest reply to Tejun [1].
>
> [1] https://patchwork.freedesktop.org/patch/682494/?series=156284&rev=1#comment_1255380
Sorry, my fault; I hadn't read up on the latest discussion when I wrote the mail.
My educated guess is that a lot of wqs just set WQ_MEM_RECLAIM to be guaranteed to start even under memory pressure.
So yeah, probably best to keep your approach here for now; somebody from core MM should take a look at cleaning it up later on.
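Just for illustration, driver side usage of the new interface would then look something like this (untested sketch, the wq name is made up):

	struct workqueue_struct *wq;

	/* Ordered wq which is expected to make forward progress in reclaim. */
	wq = alloc_ordered_workqueue("gpu-sched", WQ_MEM_RECLAIM);
	if (!wq)
		return -ENOMEM;

	/* Teach lockdep that work items on this wq must not allocate memory. */
	taint_reclaim_workqueue(wq, GFP_KERNEL);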
>> In addition to that, we should also make sure that the same wq is used for timeout and free, and that this wq is single threaded *and* has the WQ_MEM_RECLAIM flag set.
>>
>
> Currently, free runs on the same work queue as run_job. We could look
> into moving it to a separate queue, but that’s a separate issue.
We really need to make sure the free and timeout wq are the same and single threaded.
The hack the scheduler currently does with removing and re-inserting the job on a timeout is something we should really try to fix.
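For bystanders, the dance is roughly the following (paraphrased from memory from drm_sched_job_timedout()/drm_sched_stop(), locking and error handling elided):

	/* drm_sched_job_timedout(): pull the job off the pending list so a
	 * concurrent free can't touch it while the driver handles the timeout.
	 */
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job)
		list_del_init(&job->list);

	/* drm_sched_stop(): re-insert the bad job so the normal free path
	 * picks it up again later.
	 */
	list_add(&bad->list, &sched->pending_list);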
> IIRC the workqueue_struct is private, so we can't fish that out in
> the DRM scheduler without adding helpers to the workqueue layer. Ofc we
> could do that too if you think this would be helpful.
I might be wrong, but IIRC there was a helper to get the flags from the wq.
That should be enough to test if it is single threaded or not.
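And if there is no such helper yet, adding one would be trivial; something like this (hypothetical sketch, name made up, not checked against upstream):

	/* kernel/workqueue.c: expose the flags, wq->flags is private to it. */
	unsigned int workqueue_get_flags(struct workqueue_struct *wq)
	{
		return wq->flags;
	}
	EXPORT_SYMBOL_GPL(workqueue_get_flags);

	/* Scheduler side: reject wqs which aren't ordered or not in reclaim. */
	if (!(workqueue_get_flags(wq) & __WQ_ORDERED) ||
	    !(workqueue_get_flags(wq) & WQ_MEM_RECLAIM))
		return -EINVAL;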
>
>> Otherwise we run into the same lifetime issue with the job and memory reclaim during device reset as well.
>>
>
> My patches in this series taint the submit_wq and timeout_wq in the DRM
> scheduler [2]. I have a solid understanding of reclaim rules, and this
> change helped uncover some convoluted cases in Xe—specifically in our
> device reset code involving power management and reclaim [3]. So I can
> confirm this has been quite helpful.
Yeah, completely agree. We most likely have quite a few issues in our reset code path as well.
Regards,
Christian.
>
> Matt
>
> [2] https://patchwork.freedesktop.org/patch/682496/?series=156284&rev=1
> [3] https://patchwork.freedesktop.org/series/156292/
>
>> Regards,
>> Christian.
>>
>>>
>>> Cc: Tejun Heo <tj@...nel.org>
>>> Cc: Lai Jiangshan <jiangshanlai@...il.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@...el.com>
>>> ---
>>> include/linux/workqueue.h | 19 +++++++++++++++++++
>>> kernel/workqueue.c        |  9 +++++++++
>>> 2 files changed, 28 insertions(+)
>>>
>>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>>> index dabc351cc127..954c7eb7e225 100644
>>> --- a/include/linux/workqueue.h
>>> +++ b/include/linux/workqueue.h
>>> @@ -553,6 +553,25 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags, int max_active,
>>> 1, lockdep_map, ##args))
>>> #endif
>>>
>>> +
>>> +#ifdef CONFIG_LOCKDEP
>>> +/**
>>> + * taint_reclaim_workqueue - taint workqueue lockdep map with reclaim
>>> + * @wq: workqueue to taint with reclaim
>>> + * @gfp: gfp taint
>>> + *
>>> + * Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> + * scheduler workqueues). It is useful to teach lockdep that memory cannot be
>>> + * allocated on these workqueues.
>>> + */
>>> +extern void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp);
>>> +#else
>>> +static inline void taint_reclaim_workqueue(struct workqueue_struct *wq,
>>> +					   gfp_t gfp)
>>> +{
>>> +}
>>> +#endif
>>> +
>>> /**
>>> * alloc_ordered_workqueue - allocate an ordered workqueue
>>> * @fmt: printf format for the name of the workqueue
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index 45320e27a16c..fea410c20b71 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -5846,6 +5846,15 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags,
>>> return wq;
>>> }
>>> EXPORT_SYMBOL_GPL(alloc_workqueue_lockdep_map);
>>> +
>>> +void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp)
>>> +{
>>> +	fs_reclaim_acquire(gfp);
>>> +	lock_map_acquire(wq->lockdep_map);
>>> +	lock_map_release(wq->lockdep_map);
>>> +	fs_reclaim_release(gfp);
>>> +}
>>> +EXPORT_SYMBOL_GPL(taint_reclaim_workqueue);
>>> #endif
>>>
>>> static bool pwq_busy(struct pool_workqueue *pwq)
>>