lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKfTPtCUMHjcHk2QZBC0OXY0OwkMzAX4-h94z4k_4f9df7Z_=w@mail.gmail.com>
Date: Tue, 14 Jan 2025 09:07:26 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Hao Jia <jiahao.kernel@...il.com>
Cc: mingo@...hat.com, peterz@...radead.org, mingo@...nel.org, 
	juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, 
	linux-kernel@...r.kernel.org, Hao Jia <jiahao1@...iang.com>
Subject: Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in sched_balance_rq()

On Tue, 14 Jan 2025 at 04:18, Hao Jia <jiahao.kernel@...il.com> wrote:
>
>
>
> On 2025/1/14 00:40, Vincent Guittot wrote:
> > On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@...il.com> wrote:
> >>
> >> Friendly ping...
> >>
> >>
> >> On 2024/12/23 17:14, Hao Jia wrote:
> >>> From: Hao Jia <jiahao1@...iang.com>
> >>>
> >>> When the PLACE_LAG scheduling feature is enabled and
> >>> dst_cfs_rq->nr_queued is greater than 1, if a task is
> >>> ineligible (lag < 0) on the source cpu runqueue, it will
> >>> also be ineligible when it is migrated to the destination
> >>> cpu runqueue. Because we will keep the original equivalent
> >>> lag of the task in place_entity(). So if the task was
> >>> ineligible before, it will still be ineligible after
> >>> migration.
> >>>
> >>> So in sched_balance_rq(), we prioritize migrating eligible
> >>> tasks, and we soft-limit ineligible tasks, allowing them
> >>> to migrate only when nr_balance_failed is non-zero to
> >>> avoid load-balancing trying very hard to balance the load.
> >
> > Could you explain why you think it's better to balance eligible tasks
> > in priority and potentially skip a load balance ?
>
> In place_entity(), we maintain the task's original equivalent lag, even
> if we migrate the task to dst_rq, this does not change its eligibility
> attribute.

Yes, but you don't answer the question why it's better to select an
eligible task vs a non eligible task.

>
> When there are multiple tasks on src_rq, and the dst_cpu has some
> runnable tasks, migrating ineligible tasks to dst_rq will not allow them
> to run. Therefore, such task migration is inefficient. We should

Why is it inefficient ? load balance is about evenly balancing the
number of tasks or the load between CPUs, it never says that the newly
migrated task should run immediately

> prioritize migrating tasks that can run on dst_rq.
>
> In other words, migrating ineligible tasks is merely moving them to
> another runqueue to wait until they become eligible.

But I don't get why it's a problem. Migrating an eligible task might
delay its scheduling because of its deadline vs other tasks already
eligible on the dst_rq. Eligible and non eligible tasks are all
runnable, it's just how much they have already run. In addition,
migrating an eligible task will clear its positive vlag with
DELAY_ZERO which is unfair IMO

>
>
> >
> > I can see an interest for idle and newly_idle load balance in order to
> > favor fairness as tasks will become eligible but I don't see why it
> > would be helpful if dst already has some runnable tasks. Furthermore,
> > when a cpu is idle or newly idle, we really want to migrate a task
> > even an non eligible one instead of possibly skipping this load
> > balance round. With your patch, we might end up not pulling any task,
> > increasing the nr_balance_failed and waiting next load balance
> >
>
> If I understand correctly, when the destination CPU is idle, my patch
> does not change the original behavior. it only prevents the migration of
> ineligible tasks when dst_cfs_rq->nr_queued is greater than 1.

It changes the behavior. My concern is that migrating an eligible task
when dst rq already has runnable tasks, doesn't assure you that it
will give any advantage to this eligible task.

On an idle or newly idle cpu, any runnable tasks that will be pulled
will immediately start to run whatever it was eligible or not on src
cpu. In such a situation, we could consider that selecting an eligible
task which has less running time (positive lag) than others could be
more fair because the eligible task will immediately run. But this is
true as long as we migrate only 1 task. If you migrate several tasks,
an eligible task on src rq could even become ineligible quicker on dst
cpu than on src cpu has it lost its lag

>
> If I missed something, please correct me.
>
>
> Thanks,
> Hao
>
>
> >>>
> >>> Below are some benchmark test results. From my test results,
> >>> this patch shows a slight improvement on hackbench.
> >>>
> >>> Benchmark
> >>> =========
> >>>
> >>> All of the benchmarks are done inside a normal cpu cgroup in a
> >>> clean environment with cpu turbo disabled, and test machine is:
> >>>
> >>> Single NUMA machine model is 13th Gen Intel(R) Core(TM)
> >>> i7-13700, 12 Core/24 HT.
> >>>
> >>> Based on master b86545e02e8c.
> >>>
> >>> Results
> >>> =======
> >>>
> >>> hackbench-process-pipes
> >>>                         vanilla                  patched
> >>> Amean     1       0.5837 (   0.00%)      0.5733 (   1.77%)
> >>> Amean     4       1.4423 (   0.00%)      1.4503 (  -0.55%)
> >>> Amean     7       2.5147 (   0.00%)      2.4773 (   1.48%)
> >>> Amean     12      3.9347 (   0.00%)      3.8880 (   1.19%)
> >>> Amean     21      5.3943 (   0.00%)      5.3873 (   0.13%)
> >>> Amean     30      6.7840 (   0.00%)      6.6660 (   1.74%)
> >>> Amean     48      9.8313 (   0.00%)      9.6100 (   2.25%)
> >>> Amean     79     15.4403 (   0.00%)     14.9580 (   3.12%)
> >>> Amean     96     18.4970 (   0.00%)     17.9533 (   2.94%)
> >>>
> >>> hackbench-process-sockets
> >>>                         vanilla                  patched
> >>> Amean     1       0.6297 (   0.00%)      0.6223 (   1.16%)
> >>> Amean     4       2.1517 (   0.00%)      2.0887 (   2.93%)
> >>> Amean     7       3.6377 (   0.00%)      3.5670 (   1.94%)
> >>> Amean     12      6.1277 (   0.00%)      5.9290 (   3.24%)
> >>> Amean     21     10.0380 (   0.00%)      9.7623 (   2.75%)
> >>> Amean     30     14.1517 (   0.00%)     13.7513 (   2.83%)
> >>> Amean     48     24.7253 (   0.00%)     24.2287 (   2.01%)
> >>> Amean     79     43.9523 (   0.00%)     43.2330 (   1.64%)
> >>> Amean     96     54.5310 (   0.00%)     53.7650 (   1.40%)
> >>>
> >>> tbench4 Throughput
> >>>                         vanilla                  patched
> >>> Hmean     1       255.97 (   0.00%)      275.01 (   7.44%)
> >>> Hmean     2       511.60 (   0.00%)      544.27 (   6.39%)
> >>> Hmean     4       996.70 (   0.00%)     1006.57 (   0.99%)
> >>> Hmean     8      1646.46 (   0.00%)     1649.15 (   0.16%)
> >>> Hmean     16     2259.42 (   0.00%)     2274.35 (   0.66%)
> >>> Hmean     32     4725.48 (   0.00%)     4735.57 (   0.21%)
> >>> Hmean     64     4411.47 (   0.00%)     4400.05 (  -0.26%)
> >>> Hmean     96     4284.31 (   0.00%)     4267.39 (  -0.39%)
> >>>
> >>> Signed-off-by: Hao Jia <jiahao1@...iang.com>
> >>> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> >>> ---
> >>> Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@gmail.com
> >>> Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@gmail.com
> >>>
> >>> v1 to v2:
> >>>    - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
> >>>      resolve conflicts with commit 736c55a02c47 ("sched/fair:
> >>>      Rename cfs_rq.nr_running into nr_queued").
> >>>
> >>>    kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
> >>>    1 file changed, 34 insertions(+)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 5599b0c1ba9b..c884bf631e66 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
> >>>    }
> >>>    #endif
> >>>
> >>> +/*
> >>> + * Check whether the task is ineligible on the destination cpu
> >>> + *
> >>> + * When the PLACE_LAG scheduling feature is enabled and
> >>> + * dst_cfs_rq->nr_queued is greater than 1, if the task
> >>> + * is ineligible, it will also be ineligible when
> >>> + * it is migrated to the destination cpu.
> >>> + */
> >>> +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
> >>> +{
> >>> +     struct cfs_rq *dst_cfs_rq;
> >>> +
> >>> +#ifdef CONFIG_FAIR_GROUP_SCHED
> >>> +     dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
> >>> +#else
> >>> +     dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
> >>> +#endif
> >>> +     if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
> >>> +         !entity_eligible(task_cfs_rq(p), &p->se))
> >>> +             return 1;
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>>    /*
> >>>     * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
> >>>     */
> >>> @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> >>>        if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> >>>                return 0;
> >>>
> >>> +     /*
> >>> +      * We want to prioritize the migration of eligible tasks.
> >>> +      * For ineligible tasks we soft-limit them and only allow
> >>> +      * them to migrate when nr_balance_failed is non-zero to
> >>> +      * avoid load-balancing trying very hard to balance the load.
> >>> +      */
> >>> +     if (!env->sd->nr_balance_failed &&
> >>> +         task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
> >>> +             return 0;
> >>> +
> >>>        /* Disregard percpu kthreads; they are where they need to be. */
> >>>        if (kthread_is_per_cpu(p))
> >>>                return 0;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ