lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aN4pUAfE30rF6-n4@gpd4>
Date: Thu, 2 Oct 2025 09:27:12 +0200
From: Andrea Righi <arighi@...dia.com>
To: Ryan Newton <rrnewton@...il.com>
Cc: linux-kernel@...r.kernel.org, tj@...nel.org, newton@...a.com
Subject: Re: [PATCH 2/3] sched_ext: optimize first_task update logic

On Wed, Oct 01, 2025 at 10:57:20PM -0400, Ryan Newton wrote:
> From: Ryan Newton <newton@...a.com>
> 
> This is a follow-on optimization to the prior commit which added a
> lockless peek operation on DSQs. That implementation is correct and
> simple, but elides several optimizations.
> 
> Previously, we read the first_task using the same slowpath, irrespective
> of where we enqueue the task. With this change, we instead base the
> update on what we know about the calling context. On both insert and
> removal we can break down whether the change (1) definitely, (2) never,
> or (3) sometimes changes first task. In some cases we know what the new
> first task will be, and can set it more directly.
> 
> Signed-off-by: Ryan Newton <newton@...a.com>
> ---
>  kernel/sched/ext.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index fd0121c03311..1cb10aa9913a 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -953,8 +953,11 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
>  				container_of(rbp, struct task_struct,
>  					     scx.dsq_priq);
>  			list_add(&p->scx.dsq_list.node, &prev->scx.dsq_list.node);
> +			/* first task unchanged - no update needed */
>  		} else {
>  			list_add(&p->scx.dsq_list.node, &dsq->list);
> +			/* new task is at head - use fastpath */
> +			rcu_assign_pointer(dsq->first_task, p);
>  		}
>  	} else {
>  		/* a FIFO DSQ shouldn't be using PRIQ enqueuing */
> @@ -962,15 +965,20 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
>  			scx_error(sch, "DSQ ID 0x%016llx already had PRIQ-enqueued tasks",
>  				  dsq->id);
>  
> -		if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT))
> +		if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT)) {
>  			list_add(&p->scx.dsq_list.node, &dsq->list);
> -		else
> +			/* new task inserted at head - use fastpath */
> +			rcu_assign_pointer(dsq->first_task, p);
> +		} else {
> +			bool was_empty;
> +
> +			was_empty = list_empty(&dsq->list);
>  			list_add_tail(&p->scx.dsq_list.node, &dsq->list);
> +			if (was_empty)
> +				rcu_assign_pointer(dsq->first_task, p);
> +		}
>  	}
>  
> -	/* even the add_tail code path may have changed the first element */
> -	dsq_update_first_task(dsq);
> -
>  	/* seq records the order tasks are queued, used by BPF DSQ iterator */
>  	dsq->seq++;
>  	p->scx.dsq_seq = dsq->seq;
> @@ -1023,9 +1031,15 @@ static void task_unlink_from_dsq(struct task_struct *p,
>  		p->scx.dsq_flags &= ~SCX_TASK_DSQ_ON_PRIQ;
>  	}
>  
> +	if (dsq->first_task == p) {
> +		if (dsq->id & SCX_DSQ_FLAG_BUILTIN)
> +			rcu_assign_pointer(dsq->first_task,
> +			  list_next_entry(p, scx.dsq_list.node));

nit: no need to split in two lines, it should fit in the 100 characters per
line limit.

> +		else
> +			dsq_update_first_task(dsq);
> +	}

However, from my comment in PATCH 1/3, if we allow to use
scx_bpf_dsq_peek() only with user DSQs this would become:

if (!(dsq->id & SCX_DSQ_FLAG_BUILTIN) && dsq->first_task == p)
	dsq_update_first_task(dsq);

>  	list_del_init(&p->scx.dsq_list.node);
>  	dsq_mod_nr(dsq, -1);
> -	dsq_update_first_task(dsq);
>  }
>  
>  static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
> -- 
> 2.51.0
> 

Thanks,
-Andrea

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ