Message-ID: <Z6aK2J4F7fnzugxs@gpd3>
Date: Fri, 7 Feb 2025 23:36:08 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
linux-kernel@...r.kernel.org, sched-ext@...a.com
Subject: Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto
local dispatching of migration disabled tasks
Hi Tejun,
On Fri, Feb 07, 2025 at 10:58:23AM -1000, Tejun Heo wrote:
> Migration disabled tasks are special and pinned to their previous CPUs. They
> tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
> not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
> by automatically dispatching them to the pinned local DSQs by default. If a
> BPF scheduler wants to handle migration disabled tasks explicitly, it can
> set SCX_OPS_ENQ_MIGRATION_DISABLED.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> ---
> kernel/sched/ext.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -123,6 +123,19 @@ enum scx_ops_flags {
> SCX_OPS_SWITCH_PARTIAL = 1LLU << 3,
>
> /*
> + * A migration disabled task can only execute on its current CPU. By
> + * default, such tasks are automatically put on the CPU's local DSQ with
> + * the default slice on enqueue. If this ops flag is set, they also go
> + * through ops.enqueue().
> + *
> + * A migration disabled task never invokes ops.select_cpu() as it can
> + * only select the current CPU. Also, p->cpus_ptr will only contain its
> + * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
> + * and thus may disagree with cpumask_weight(p->cpus_ptr).
> + */
> + SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
> +
> + /*
> * CPU cgroup support flags
> */
> SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* cpu.weight */
> @@ -130,6 +143,7 @@ enum scx_ops_flags {
> SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
> SCX_OPS_ENQ_LAST |
> SCX_OPS_ENQ_EXITING |
> + SCX_OPS_ENQ_MIGRATION_DISABLED |
> SCX_OPS_SWITCH_PARTIAL |
> SCX_OPS_HAS_CGROUP_WEIGHT,
> };
> @@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
>
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
> +static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
> static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
>
> @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
> unlikely(p->flags & PF_EXITING))
> goto local;
>
> + /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> + if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> + is_migration_disabled(p))
> + goto local;
Maybe not in this patch set, but it'd be nice to have an event counter for
this, as skipping ops.enqueue() might introduce latency issues. Having that
feedback would help determine whether we need to enable
SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
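Something like the following is roughly what I have in mind, just a sketch:
the scx_enq_skip_migration_disabled counter is a made-up name and I'm not
suggesting yet how to expose it, only where it would be bumped:

	/* hypothetical per-CPU counter, name made up for this sketch */
	static DEFINE_PER_CPU(u64, scx_enq_skip_migration_disabled);

	/* in do_enqueue_task(), next to the new check */
	if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
	    is_migration_disabled(p)) {
		/* count enqueues that bypassed ops.enqueue() */
		this_cpu_inc(scx_enq_skip_migration_disabled);
		goto local;
	}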
I'm also a bit conflicted about whether the default should be on or off: we're
changing the previous behavior, but OTOH this is going to prevent some
potential breakage (due to the nr_cpus_allowed mismatch) and server workloads
are going to benefit from it, so it seems there are more pros than cons to
dispatching migration_disabled tasks directly by default.
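For the schedulers that do want to keep seeing these tasks in ops.enqueue(),
opting in should just be a matter of setting the flag in the ops definition,
something like this (sketch only, the my_* names are made up):

	SEC(".struct_ops.link")
	struct sched_ext_ops my_sched_ops = {
		.enqueue	= (void *)my_enqueue,
		.dispatch	= (void *)my_dispatch,
		/* handle migration disabled tasks in ops.enqueue() */
		.flags		= SCX_OPS_ENQ_MIGRATION_DISABLED,
		.name		= "my_sched",
	};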
And I also did a quick test with this and it seems good, so:
Acked-by: Andrea Righi <arighi@...dia.com>
-Andrea