linux-kernel - Re: [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0

Message-ID: <Z32aulPE-e9hhjKr@slm.duckdns.org>
Date: Tue, 7 Jan 2025 11:20:58 -1000
From: Tejun Heo <tj@...nel.org>
To: Henry Huang <henry.hj@...group.com>
Cc: void@...ifault.com, 谈鉴锋 <henry.tjf@...group.com>,
	"Yan Yan(cailing)" <yanyan.yan@...group.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0

Hello,

On Tue, Jan 07, 2025 at 12:25:55PM +0800, Henry Huang wrote:
> When %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0,
> @prev will be dispatched into the local DSQ in put_prev_task_scx().
> However, pick_task_scx() is executed before put_prev_task_scx(),
> so it will not pick @prev.
> Set %SCX_RQ_BAL_KEEP in balance_one() to ensure that pick_task_scx()
> can pick @prev.
> 
> Signed-off-by: Henry Huang <henry.hj@...group.com>
> ---
>  kernel/sched/ext.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 81da76a..5f6eb45 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2837,10 +2837,15 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
>  	/*
>  	 * Didn't find another task to run. Keep running @prev unless
>  	 * %SCX_OPS_ENQ_LAST is in effect.
> +	 *
> +	 * If %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0 (configured in ops.dispatch()),
> +	 * @prev would be dispatched into the local DSQ in put_prev_task_scx()
> +	 * (executed after pick_task_scx()). Set %SCX_RQ_BAL_KEEP to ensure that @prev
> +	 * would be picked in pick_task_scx().
>  	 */
>  	if ((prev->scx.flags & SCX_TASK_QUEUED) &&
>  	    (!static_branch_unlikely(&scx_ops_enq_last) ||
> -	     scx_rq_bypassing(rq))) {
> +	     scx_rq_bypassing(rq) || prev->scx.slice)) {

Updating current->scx.slice from ops.dispatch() is the recommended way of
extending the current execution, and the current behavior is just buggy,
especially when scx_ops_enq_last is set.

While the above change fixes the case where ops.dispatch() updates
current->scx.slice without dispatching any task, it's still theoretically
wrong: if ops.dispatch() both updates current->scx.slice and dispatches
tasks, we should keep running current before moving on to other tasks.

To fix this properly, I think what should be done is adding something like
the following (untested, and we should probably cache the result of the
SCX_TASK_QUEUED test). Can you test whether the following fixes the issues
you're seeing and, if so, update the patch accordingly?

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 19d2699cf638..48deb5d5510e 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2813,6 +2813,10 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
 
 		flush_dispatch_buf(rq);
 
+		if ((prev->scx.flags & SCX_TASK_QUEUED) && prev->scx.slice) {
+			rq->scx.flags |= SCX_RQ_BAL_KEEP;
+			goto has_tasks;
+		}
 		if (rq->scx.local_dsq.nr)
 			goto has_tasks;
 		if (consume_global_dsq(rq))

Thanks.

-- 
tejun