linux-kernel - Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce interference from IDLE tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBcJhC4qPQuK9g4bL0sgtmqkA3JZmnGJz7DaejsUPkOeg@mail.gmail.com>
Date:   Fri, 19 Aug 2022 14:35:37 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     "zhangsong (J)" <zhangsong34@...wei.com>
Cc:     Abel Wu <wuyun.abel@...edance.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce
 interference from IDLE tasks

Hi Zhang,

On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@...wei.com> wrote:
>
>
> On 2022/8/18 16:31, Vincent Guittot wrote:
> > Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
> >> On 8/17/22 8:58 PM, Vincent Guittot Wrote:
> >>> On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@...wei.com> wrote:
> >>>>
> > ...
> >
> >>>> Yes, this is usually a corner case, but suppose that some non-idle tasks bounds to CPU 1-2
> >>>>
> >>>> and idle tasks bounds to CPU 0-1, so CPU 1 may has many idle tasks and some non-idle
> >>>>
> >>>> tasks while idle tasks on CPU 1 can not be pulled to CPU 2, when trigger load balance if
> >>>>
> >>>> CPU 2 should pull some tasks from CPU 1, the bad result is idle tasks of CPU 1 cannot be
> >>>>
> >>>> migrated and non-idle tasks also cannot be migrated in case of env->loop_max constraint.
> >>> env->loop_max adds a break but load_balance will continue with next
> >>> tasks so it also tries to pull your non idle task at the end after
> >>> several breaks.
> >> Loop will be terminated without LBF_NEED_BREAK if exceeds loop_max :)
> > Argh yes, my brain is not yet back from vacation
> > I have been confused by loop_max and loop_break being set to the same value 32
> >
> > Zhang Song, Could you try the patch below ? If it works, I will prepare a
> > clean patch with all tags
> >
> >
> >
> > sched/fair: make sure to try to detach at least one movable task
> >
> > During load balance we try at most env->loop_max time to move a task. But
> > it can happen that the LRU tasks (ie tail of the cfs_tasks list) can't
> > be moved to dst_cpu because of affinity. In this case, loop in the list
> > until we found at least one.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >   kernel/sched/fair.c | 12 +++++++++---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index da388657d5ac..02b7b808e186 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8052,8 +8052,12 @@ static int detach_tasks(struct lb_env *env)
> >               p = list_last_entry(tasks, struct task_struct, se.group_node);
> >
> >               env->loop++;
> > -             /* We've more or less seen every task there is, call it quits */
> > -             if (env->loop > env->loop_max)
> > +             /*
> > +              * We've more or less seen every task there is, call it quits
> > +              * unless we haven't found any movable task yet.
> > +              */
> > +             if (env->loop > env->loop_max &&
> > +                 !(env->flags & LBF_ALL_PINNED))
> >                       break;
> >
> >               /* take a breather every nr_migrate tasks */
> > @@ -10182,7 +10186,9 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >
> >               if (env.flags & LBF_NEED_BREAK) {
> >                       env.flags &= ~LBF_NEED_BREAK;
> > -                     goto more_balance;
> > +                     /* Stop if we tried all running tasks */
> > +                     if (env.loop < busiest->nr_running)
> > +                             goto more_balance;
> >               }
> >
> >               /*
> > --
> > 2.17.1
>
> Thanks for your reply.
> I have tried your patch and run test compared with it, it seems that the
> patch you provide makes no sense.
> The test result is below(1000 idle tasks bounds to CPU 0-1 and 10 normal
> tasks bounds to CPU 1-2):
>
> =================================================================
>
> Without patch:
>
>
>            6,777.37 msec cpu-clock                 #    1.355 CPUs utilized
>              20,812      context-switches          #    0.003 M/sec
>                   0      cpu-migrations            #    0.000 K/sec
>                   0      page-faults               #    0.000 K/sec
>      13,333,983,148      cycles                    #    1.967 GHz
>       6,457,930,305      instructions              #    0.48  insn per cycle
>       2,125,644,649      branches                  #  313.639 M/sec
>           1,690,587      branch-misses             #    0.08% of all
> branches
>        5.001931983 seconds time elapsed
>
> With your patch:
>
>
>            6,791.46 msec cpu-clock                 #    1.358 CPUs utilized
>              20,996      context-switches          #    0.003 M/sec
>                   0      cpu-migrations            #    0.000 K/sec
>                   0      page-faults               #    0.000 K/sec
>      13,467,573,052      cycles                    #    1.983 GHz
>       6,516,989,062      instructions              #    0.48  insn per cycle
>       2,145,139,220      branches                  #  315.858 M/sec
>           1,751,454      branch-misses             #    0.08% of all
> branches
>
>         5.002274267 seconds time elapsed
>
> With my patch:
>
>
>            7,495.14 msec cpu-clock                 #    1.499 CPUs utilized
>              23,176      context-switches          #    0.003 M/sec
>                 309      cpu-migrations            #    0.041 K/sec
>                   0      page-faults               #    0.000 K/sec
>      14,849,083,489      cycles                    #    1.981 GHz
>       7,180,832,268      instructions              #    0.48  insn per cycle
>       2,363,300,644      branches                  #  315.311 M/sec
>           1,964,169      branch-misses             #    0.08% of all
> branches
>
>         5.001713352 seconds time elapsed
> ===============================================================
>
> Obviously,  when your patch is applied, the cpu-migrations of normal
> tasks is still 0 and the
> CPU ulization of normal tasks have no improvement compared with no patch
> applied.
> When apply my patch, the cpu-migrations and CPU ulization of normal
> tasks can both improve.
> I cannot explain the result with your patch, you also can test it by
> yourself.

Do you have more details about the test that your are running ?

Do cpu0-2 share their cache ?
Which kingd of task are the normal and idle tasks ? always running tasks ?

I'm going to try to reproduce your problem locally

Regards,
Vincent

>
> Best,
> Zhang Song
>
> >
> >>>> This will cause non-idle  tasks cannot achieve  more CPU utilization.
> >>> Your problem is not linked to IDLE vs NORMAL tasks but to the large
> >>> number of pinned tasks that can't migrate on CPU2. You can end with
> >>> the same behavior without using IDLE tasks but only NORMAL tasks.
> >> I feel the same thing.
> >>
> >> Best,
> >> Abel
> > .