Message-ID: <Z6aK2J4F7fnzugxs@gpd3>
Date: Fri, 7 Feb 2025 23:36:08 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
linux-kernel@...r.kernel.org, sched-ext@...a.com
Subject: Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto
local dispatching of migration disabled tasks
Hi Tejun,
On Fri, Feb 07, 2025 at 10:58:23AM -1000, Tejun Heo wrote:
> Migration disabled tasks are special and pinned to their previous CPUs. They
> tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
> not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
> by automatically dispatching them to the pinned local DSQs by default. If a
> BPF scheduler wants to handle migration disabled tasks explicitly, it can
> set SCX_OPS_ENQ_MIGRATION_DISABLED.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> ---
> kernel/sched/ext.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -123,6 +123,19 @@ enum scx_ops_flags {
> SCX_OPS_SWITCH_PARTIAL = 1LLU << 3,
>
> /*
> + * A migration disabled task can only execute on its current CPU. By
> + * default, such tasks are automatically put on the CPU's local DSQ with
> + * the default slice on enqueue. If this ops flag is set, they also go
> + * through ops.enqueue().
> + *
> + * A migration disabled task never invokes ops.select_cpu() as it can
> + * only select the current CPU. Also, p->cpus_ptr will only contain its
> + * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
> + * and thus may disagree with cpumask_weight(p->cpus_ptr).
> + */
> + SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
> +
> + /*
> * CPU cgroup support flags
> */
> SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* cpu.weight */
> @@ -130,6 +143,7 @@ enum scx_ops_flags {
> SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
> SCX_OPS_ENQ_LAST |
> SCX_OPS_ENQ_EXITING |
> + SCX_OPS_ENQ_MIGRATION_DISABLED |
> SCX_OPS_SWITCH_PARTIAL |
> SCX_OPS_HAS_CGROUP_WEIGHT,
> };
> @@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
>
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
> +static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
> static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
>
> @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
> unlikely(p->flags & PF_EXITING))
> goto local;
>
> + /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> + if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> + is_migration_disabled(p))
> + goto local;
Maybe not in this patch set, but it'd be nice to have an event counter for
this, as skipping ops.enqueue() might introduce latency issues. Having that
feedback would help determine whether we need to enable
SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
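Something like the following is roughly what I have in mind, just a sketch:
the scx_enq_skip_migration_disabled counter is a made-up name and I'm not
suggesting yet how to expose it, only where it would be bumped:

	/* hypothetical per-CPU counter, name made up for this sketch */
	static DEFINE_PER_CPU(u64, scx_enq_skip_migration_disabled);

	/* in do_enqueue_task(), next to the new check */
	if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
	    is_migration_disabled(p)) {
		/* count enqueues that bypassed ops.enqueue() */
		this_cpu_inc(scx_enq_skip_migration_disabled);
		goto local;
	}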
I'm also a bit conflicted about whether the default should be on or off: we're
changing the previous behavior, but OTOH this is going to prevent some
potential breakage (due to the nr_cpus_allowed mismatch) and server workloads
are going to benefit from it, so it seems there are more pros than cons to
dispatching migration_disabled tasks directly by default.
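For the schedulers that do want to keep seeing these tasks in ops.enqueue(),
opting in should just be a matter of setting the flag in the ops definition,
something like this (sketch only, the my_* names are made up):

	SEC(".struct_ops.link")
	struct sched_ext_ops my_sched_ops = {
		.enqueue	= (void *)my_enqueue,
		.dispatch	= (void *)my_dispatch,
		/* handle migration disabled tasks in ops.enqueue() */
		.flags		= SCX_OPS_ENQ_MIGRATION_DISABLED,
		.name		= "my_sched",
	};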
And I also did a quick test with this and it seems good, so:
Acked-by: Andrea Righi <arighi@...dia.com>
-Andrea