Message-ID: <aXFukGjN0F7W3Hoa@fb.com>
Date: Wed, 21 Jan 2026 16:31:02 -0800
From: Daniel Hodges <hodgesd@...a.com>
To: Andrea Righi <arighi@...dia.com>
CC: <tj@...nel.org>, <void@...ifault.com>, <changwoo@...lia.com>,
<sched-ext@...ts.linux.dev>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched_ext: Clear direct dispatch state on dequeue when
dsq is NULL
On Wed, Jan 21, 2026 at 10:10:59PM +0100, Andrea Righi wrote:
> Hi Daniel,
>
> On Wed, Jan 21, 2026 at 07:56:02AM -0800, Daniel Hodges wrote:
> > When a task is direct-dispatched from ops.select_cpu() or ops.enqueue(),
> > ddsp_dsq_id is set to indicate the target DSQ. If the task is dequeued
> > before dispatch_enqueue() completes (e.g., task killed or receives a
> > signal), dispatch_dequeue() is called with dsq == NULL.
> >
> > In this case, the task is unlinked from ddsp_deferred_locals and
> > holding_cpu is cleared, but ddsp_dsq_id and ddsp_enq_flags are left
> > stale. On the next wakeup, when ops.select_cpu() calls
> > scx_bpf_dsq_insert(), mark_direct_dispatch() finds ddsp_dsq_id already
> > set and triggers:
> >
> > WARNING: CPU: 56 PID: 2323042 at kernel/sched/ext.c:2157
> > scx_bpf_dsq_insert+0x16b/0x1d0
> >
> > Fix this by clearing ddsp_dsq_id and ddsp_enq_flags in dispatch_dequeue()
> > when dsq is NULL, ensuring clean state for subsequent wakeups.
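
For context, the change boils down to something like this in
dispatch_dequeue() (a sketch only; the unlink from ddsp_deferred_locals,
the holding_cpu handling, and the locking are elided, and the exact
context in kernel/sched/ext.c may differ):

static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
{
	struct scx_dispatch_q *dsq = p->scx.dsq;

	if (!dsq) {
		/*
		 * The task was dequeued after direct dispatch was marked
		 * but before dispatch_enqueue() completed. Reset the
		 * direct-dispatch markers so the next wakeup's
		 * mark_direct_dispatch() doesn't find a stale ddsp_dsq_id
		 * and trip the WARN in scx_bpf_dsq_insert().
		 */
		p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
		p->scx.ddsp_enq_flags = 0;
		return;
	}

	/* ... existing path for tasks already linked to a DSQ ... */
}
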
>
> I tried to fix this a while ago (same issue as this one, right?
> https://github.com/sched-ext/scx/issues/2758). I remember applying
> exactly the same patch, but I was still able to trigger the warning.
>
> IIRC there's also a race between ttwu_queue_wakelist() wakeups and
> sched_setscheduler() that can hit the stale ddsp_dsq_id (and maybe
> other cases too).
I figured there were probably other paths where this could race.
> Long story short, the only thing that worked reliably for me was to
> clear ddsp_dsq_id and ddsp_enq_flags in select_task_rq_scx(), but I
> thought that was a bit overkill, and I never finished investigating
> the real issue...
>
> In conclusion, I think this fixes some of the warnings we're seeing,
> and it's probably good to apply, but it doesn't fix all of them.
>
> Anyway, I'll do some tests with this patch and report back!
>
> Thanks,
> -Andrea
Sounds good. I hit this running cosmos on a moderately loaded machine.
I'll see if I can put together a reproducer and do some more testing.
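
FWIW, the select_task_rq_scx() variant you mention would amount to
roughly the following (sketch only, assuming the usual select_task_rq
signature; the actual flow in kernel/sched/ext.c may differ):

static int select_task_rq_scx(struct task_struct *p, int prev_cpu,
			      int wake_flags)
{
	/*
	 * Unconditionally reset any direct-dispatch state left over from
	 * a previous activation before ops.select_cpu() runs, so a racing
	 * dequeue can't leak a stale ddsp_dsq_id into the next wakeup.
	 */
	p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
	p->scx.ddsp_enq_flags = 0;

	/* ... existing ops.select_cpu() / idle-CPU selection logic ... */
	return prev_cpu;
}

That would paper over all the racing dequeue paths at once, at the cost
of a couple of extra stores on every wakeup.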