linux-kernel - Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode

Message-ID: <aRI7SpAS_CQeS-Ph@slm.duckdns.org>
Date: Mon, 10 Nov 2025 09:21:46 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	Dan Schatzberg <schatzberg.dan@...il.com>,
	Emil Tsalapatis <etsal@...a.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode

Hello,

On Mon, Nov 10, 2025 at 10:38:43AM +0100, Andrea Righi wrote:
> > @@ -965,7 +980,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> >  		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
> >  
> >  	if (!is_local) {
> > -		raw_spin_lock(&dsq->lock);
> > +		raw_spin_lock_nested(&dsq->lock,
> > +			(enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> > +
> >  		if (unlikely(dsq->id == SCX_DSQ_INVALID)) {
> >  			scx_error(sch, "attempting to dispatch to a destroyed dsq");
> >  			/* fall back to the global dsq */
> 
> Outside the context of the patch we're doing:
> 
> 			/* fall back to the global dsq */
> 			raw_spin_unlock(&dsq->lock);
> 			dsq = find_global_dsq(sch, p);
> 			raw_spin_lock(&dsq->lock);
> 
> I think we should preserve the nested lock annotation also when locking
> the global DSQ and do:
> 
> 		raw_spin_lock_nested(&dsq->lock,
> 			(enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> 
> It seems correct either way, but without this I think we could potentially
> trigger false positive lockdep warnings.

That'd be a bug. I'll add an explicit WARN. I don't think quietly falling
back to the global DSQ makes sense, e.g. the global DSQ is not even
consumed in bypass mode anymore.

> > +		/*
> > +		 * Moving $p from one non-local DSQ to another. The source DSQ
> > +		 * is already locked. Do an abbreviated dequeue and then perform
> > +		 * enqueue without unlocking $donor_dsq.
> > +		 *
> > +		 * We don't want to drop and reacquire the lock on each
> > +		 * iteration as @donor_dsq can be very long and potentially
> > +		 * highly contended. Donee DSQs are less likely to be contended.
> > +		 * The nested locking is safe as only this LB moves tasks
> > +		 * between bypass DSQs.
> > +		 */
> > +		task_unlink_from_dsq(p, donor_dsq);
> > +		p->scx.dsq = NULL;
> > +		dispatch_enqueue(sch, donee_dsq, p, SCX_ENQ_NESTED);
> 
> Are we racing with dispatch_dequeue() and the holding_cpu dancing here?
> 
> If I read correctly, dispatch_dequeue() reads p->scx.dsq without holding
> the lock, then acquires the lock on that DSQ, but between the read and lock
> acquisition, the load balancer can move the task to a different DSQ.
> 
> Maybe we should change dispatch_dequeue() as well to verify after locking
> that we locked the correct DSQ, and retry if the task was moved.

Right, this is a bug. The LB should hold the source rq lock too. Let me
update the code and add a lockdep annotation.
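
For illustration only, the recheck-after-locking idea from the quoted
suggestion could look roughly like the userspace sketch below. It uses
pthread mutexes in place of raw spinlocks, and every name in it (struct
dsq, struct task, lock_task_dsq(), dequeue_task(), move_task()) is
hypothetical and not the sched_ext code. It also assumes a single mover,
matching the patch comment that only the LB moves tasks between bypass
DSQs. Note that this is the alternative raised in the review, not the fix
described above, which is to hold the source rq lock in the LB.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dispatch queue: a lock plus a task count. */
struct dsq {
	pthread_mutex_t lock;
	int nr;
};

/* Hypothetical stand-in for a task: only tracks which queue it is on. */
struct task {
	_Atomic(struct dsq *) dsq;	/* NULL when not queued */
};

/*
 * Lock the queue @p is currently on. The pointer is read without the lock,
 * so it is rechecked after the lock is taken and the attempt is retried if
 * @p was moved in between. Returns the locked queue, or NULL if @p is not
 * queued.
 */
static struct dsq *lock_task_dsq(struct task *p)
{
	for (;;) {
		struct dsq *dsq = atomic_load(&p->dsq);

		if (!dsq)
			return NULL;

		pthread_mutex_lock(&dsq->lock);
		if (atomic_load(&p->dsq) == dsq)
			return dsq;	/* still on the queue we locked */
		pthread_mutex_unlock(&dsq->lock);
	}
}

/* Remove @p from whatever queue it is on. Returns 1 if it was queued. */
static int dequeue_task(struct task *p)
{
	struct dsq *dsq = lock_task_dsq(p);

	if (!dsq)
		return 0;

	dsq->nr--;
	atomic_store(&p->dsq, NULL);
	pthread_mutex_unlock(&dsq->lock);
	return 1;
}

/*
 * Move @p from @src to @dst while holding @src's lock across the move.
 * Nesting the two locks is only deadlock-free here because a single
 * mover is assumed.
 */
static void move_task(struct task *p, struct dsq *src, struct dsq *dst)
{
	assert(src != dst);

	pthread_mutex_lock(&src->lock);
	if (atomic_load(&p->dsq) == src) {
		pthread_mutex_lock(&dst->lock);	/* nested acquisition */
		src->nr--;
		dst->nr++;
		atomic_store(&p->dsq, dst);
		pthread_mutex_unlock(&dst->lock);
	}
	pthread_mutex_unlock(&src->lock);
}
```

The recheck works in this sketch because the task's queue pointer only
changes while the source queue's lock is held. Once the dequeue side holds
a lock and sees the pointer still matching, the mover can no longer change
it underneath.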

Thanks.

-- 
tejun