Message-ID: <aVSJfKkE0SiKrPDz@lstrano-desk.jf.intel.com>
Date: Tue, 30 Dec 2025 18:25:00 -0800
From: Matthew Brost <matthew.brost@...el.com>
To: Marco Crivellari <marco.crivellari@...e.com>
CC: <linux-kernel@...r.kernel.org>, <intel-xe@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, Tejun Heo <tj@...nel.org>, Lai Jiangshan
<jiangshanlai@...il.com>, Frederic Weisbecker <frederic@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Michal Hocko
<mhocko@...e.com>, Lucas De Marchi <lucas.demarchi@...el.com>, "Thomas
Hellstrom" <thomas.hellstrom@...ux.intel.com>, Rodrigo Vivi
<rodrigo.vivi@...el.com>, David Airlie <airlied@...il.com>, Simona Vetter
<simona@...ll.ch>
Subject: Re: [PATCH] drm/xe: Replace use of system_wq with system_percpu_wq
On Tue, Dec 30, 2025 at 04:42:51PM +0100, Marco Crivellari wrote:
> On Wed, Dec 24, 2025 at 7:28 PM Matthew Brost <matthew.brost@...el.com> wrote:
> > [...]
> > > drivers/gpu/drm/xe/xe_tlb_inval.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > index 918a59e686ea..b2cf6e17fbc5 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > @@ -94,7 +94,7 @@ static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> > > xe_tlb_inval_fence_signal(fence);
> > > }
> > > if (!list_empty(&tlb_inval->pending_fences))
> > > - queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > > + queue_delayed_work(system_percpu_wq, &tlb_inval->fence_tdr,
> >
> > Actually, neither system_wq nor system_percpu_wq works here, as this is the
> > fence signaling path. We should use one of Xe's ordered work queues, which
> > are properly set up to be reclaim safe.
>
> Hi,
>
> So for this specific workqueue we should use, for example, this instead:
>
> 462 /** @ordered_wq: used to serialize compute mode resume */
> 463 struct workqueue_struct *ordered_wq;
>
> I noticed this has been allocated using:
>
> 490 xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0);
>
> Using alloc_ordered_workqueue() makes this workqueue unbound:
>
> 569 #define alloc_ordered_workqueue(fmt, flags, args...) \
> 570 alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
>
> So this patch should be split in two:
> - one patch using Xe's ordered workqueue, which "implicitly" changes the
>   behavior to unbound
> - one patch replacing system_wq with the new per-cpu wq (system_percpu_wq).
>
> To keep this workqueue per-cpu, we could use xe->unordered_wq, which is
> allocated with alloc_workqueue() without specifying flags (e.g. WQ_UNBOUND
> or the new WQ_PERCPU), making it per-cpu.
>
> Would this sound reasonable to you?
>
What I'd probably do here is store a WQ pointer in 'struct xe_tlb_inval'
and update all calls in xe_tlb_inval to use that work queue.
Since these are currently tied to a GT - see xe_gt_tlb_inval_init_early -
I'd set the WQ pointer in 'struct xe_tlb_inval' to 'gt->ordered_wq'.
'gt->ordered_wq' is the per-GT WQ used for resets, job timeouts, and VF
migrations (all operations in the reclaim path), so it seems to fit here
as well.
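
Roughly something like this (untested sketch - the 'wq' field name and the
init placement are just illustrations, not final code):

```c
/* xe_tlb_inval.h: add a WQ pointer to the TLB invalidation state */
struct xe_tlb_inval {
	/* ... existing members ... */

	/** @wq: reclaim-safe ordered WQ used for fence timeout work */
	struct workqueue_struct *wq;
};

/* xe_gt_tlb_inval_init_early: point it at the per-GT ordered WQ */
tlb_inval->wq = gt->ordered_wq;

/* xe_tlb_inval_fence_timeout (and friends) then queue on it, keeping
 * whatever delay argument is used today:
 */
if (!list_empty(&tlb_inval->pending_fences))
	queue_delayed_work(tlb_inval->wq, &tlb_inval->fence_tdr,
			   delay /* existing timeout value */);
```

That keeps xe_tlb_inval self-contained: if the invalidation state is ever
decoupled from the GT, only the init-time assignment has to change.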
Matt
> Thanks!
> --
>
> Marco Crivellari
>
> L3 Support Engineer