Message-ID: <CAKfTPtBTdmPQsC-RiOXbnMVsfz0P7tex7JMcHOsjtAN6uEri3A@mail.gmail.com>
Date:   Tue, 23 Aug 2022 15:19:49 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     "zhangsong (J)" <zhangsong34@...wei.com>
Cc:     Abel Wu <wuyun.abel@...edance.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce
 interference from IDLE tasks

Hi Zhang,

On Mon, 22 Aug 2022 at 08:49, zhangsong (J) <zhangsong34@...wei.com> wrote:
>
> Hi, Vincent,
>
> On 2022/8/20 0:04, Vincent Guittot wrote:
> > On Fri, 19 Aug 2022 at 14:35, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> >>
> >> Hi Zhang,
> >>
> >> On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@...wei.com> wrote:
> >>>
> >>>
> >>> On 2022/8/18 16:31, Vincent Guittot wrote:
> >>>> Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
> >>>>> On 8/17/22 8:58 PM, Vincent Guittot Wrote:
> >>>>>> On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@...wei.com> wrote:
> >>>>>>>
> >>>> ...

...

> >>>
> >>> Thanks for your reply.
> >>> I have tried your patch and run a comparison test; it seems that the
> >>> patch you provided has no effect.
> >>> The test results are below (1000 idle tasks bound to CPU 0-1 and 10
> >>> normal tasks bound to CPU 1-2):
> >>>
> >>> =================================================================
> >>>
> >>> Without patch:
> >>>
> >>>
> >>>             6,777.37 msec cpu-clock                 #    1.355 CPUs utilized
> >>>               20,812      context-switches          #    0.003 M/sec
> >>>                    0      cpu-migrations            #    0.000 K/sec
> >>>                    0      page-faults               #    0.000 K/sec
> >>>       13,333,983,148      cycles                    #    1.967 GHz
> >>>        6,457,930,305      instructions              #    0.48  insn per cycle
> >>>        2,125,644,649      branches                  #  313.639 M/sec
> >>>            1,690,587      branch-misses             #    0.08% of all branches
> >>>         5.001931983 seconds time elapsed
> >>>
> >>> With your patch:
> >>>
> >>>
> >>>             6,791.46 msec cpu-clock                 #    1.358 CPUs utilized
> >>>               20,996      context-switches          #    0.003 M/sec
> >>>                    0      cpu-migrations            #    0.000 K/sec
> >>>                    0      page-faults               #    0.000 K/sec
> >>>       13,467,573,052      cycles                    #    1.983 GHz
> >>>        6,516,989,062      instructions              #    0.48  insn per cycle
> >>>        2,145,139,220      branches                  #  315.858 M/sec
> >>>            1,751,454      branch-misses             #    0.08% of all branches
> >>>
> >>>          5.002274267 seconds time elapsed
> >>>
> >>> With my patch:
> >>>
> >>>
> >>>             7,495.14 msec cpu-clock                 #    1.499 CPUs utilized
> >>>               23,176      context-switches          #    0.003 M/sec
> >>>                  309      cpu-migrations            #    0.041 K/sec
> >>>                    0      page-faults               #    0.000 K/sec
> >>>       14,849,083,489      cycles                    #    1.981 GHz
> >>>        7,180,832,268      instructions              #    0.48  insn per cycle
> >>>        2,363,300,644      branches                  #  315.311 M/sec
> >>>            1,964,169      branch-misses             #    0.08% of all branches
> >>>
> >>>          5.001713352 seconds time elapsed
> >>> ===============================================================
> >>>
> >>> Obviously, when your patch is applied, the cpu-migrations count of the
> >>> normal tasks is still 0 and their CPU utilization shows no improvement
> >>> compared with no patch applied.
> >>> When my patch is applied, both the cpu-migrations and the CPU
> >>> utilization of the normal tasks improve.
> >>> I cannot explain the result with your patch; you can also test it
> >>> yourself.
> >>
> >> Do you have more details about the test that you are running?
> >>
> >> Do cpu0-2 share their cache?
> >> Which kind of tasks are the normal and idle ones? Always-running tasks?
> >>
> >> I'm going to try to reproduce your problem locally
> >
> > Some details of your UC are missing. I have tried to reproduce your
> > example above:
> >      1000 idle tasks bound to CPU 0-1 and 10 normal tasks bound to CPU 1-2
> >
> > Let's assume that, for any reason, the 10 normal tasks wake up on CPU1.
> > Then, the thousand idle tasks are moved to CPU0 by load balance and
> > only the normal tasks stay on CPU1. Then load balance will move some
> > normal tasks to CPU2.
> >
> > My only way to reproduce something similar to your example is to pin
> > the 1000 idle tasks on CPU1 so they can't move to CPU0. Then I can see
> > that load balance reaches the loop_max limit and has a hard time moving
> > normal tasks to CPU2. But in this latter case, my patch helps to move
> > normal tasks to CPU2. Something is missing in the description of your
> > UC.
> >
> > Sidenote: I have the same kind of problem with 1000 normal tasks with
> > low priority, so it's not a matter of idle vs. normal tasks.
> >
> > Regards,
> > Vincent
> >
>
> Sorry for my slow reply.
>
> I have found a test case which illustrates this problem more
> accurately. The test case is below.
>
> 1. an endless busy-loop process run as a normal or idle task
> $ cat test.c
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
>          int i = 0;
>          int duration = atoi(argv[1]);
>
>          while (1) {
>                  usleep(duration);
>                  for (i = 0; i < 100000; i++) {}
>          }
> }
> $ gcc -o test test.c
>
> 2. first spawn 500 idle tasks bound to CPU 0-2
> 3. then spawn 10 normal tasks also bound to CPU 0-2
> 4. finally spawn 500 idle tasks bound to CPU 0 only
> 5. perf stat the normal tasks to collect CPU utilization and cpu-migrations
>
>
> Below is the whole test script.
> $ cat test.sh
> #!/bin/bash
>
> # create normal and idle cgroup path
> mkdir /sys/fs/cgroup/cpu/normal/
> mkdir /sys/fs/cgroup/cpu/idle/

so you put "idle" tasks in one task group and normal tasks in another
one. But both groups have the default weight/share, so you lose the
idle weight for the idle tasks, and each group will get half the CPU
capacity, i.e. 1.5 CPUs per group. And this is what I get with the
current kernel, with your patch, and with my patch.

In fact, there are enough tasks (510) not pinned to CPU0 to make the
system balance correctly.

With this group hierarchy, the idle tasks will have the same weight
priority as the normal tasks. Is that really what you want?
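If the intent is to keep idle semantics at the group level, the "idle"
group itself needs a low weight. A sketch, assuming the cgroup v1 cpu
controller mounted at /sys/fs/cgroup/cpu as in your script (the cgroup
v2 path below is an assumption and depends on your mount):

```shell
# cgroup v1: give the "idle" group the minimum group weight (the floor
# is 2) so its tasks no longer compete on equal terms with "normal".
echo 2 > /sys/fs/cgroup/cpu/idle/cpu.shares

# cgroup v2 (kernel >= 5.16): the group itself can be made SCHED_IDLE.
# echo 1 > /sys/fs/cgroup/idle/cpu.idle
```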

tip kernel:
          7 673,15 msec task-clock                #    1,534 CPUs utilized
            12 662      context-switches          #    1,650 K/sec
                 0      cpu-migrations            #    0,000 /sec
       5,003493176 seconds time elapsed

your patch:
          7 488,35 msec task-clock                #    1,497 CPUs utilized
            12 338      context-switches          #    1,648 K/sec
                 3      cpu-migrations            #    0,401 /sec
       5,003406005 seconds time elapsed

my patch:
          7 569,57 msec task-clock                #    1,513 CPUs utilized
            12 460      context-switches          #    1,646 K/sec
                 0      cpu-migrations            #    0,000 /sec
       5,003437278 seconds time elapsed


>
> # spawn 500 idle tasks and bind tasks to CPU 0-2
> for ((i = 0; i < 500; i++))
> do
>                 taskset -c 0-2 ./test 200 &
>                 pid=$!
>                 # change to SCHED_IDLE policy
>                 chrt -i -p 0 $pid
>                 echo $pid > /sys/fs/cgroup/cpu/idle/tasks
> done
>
> # spawn 10 normal tasks and bind tasks to CPU 0-2
> normal_tasks=
> for ((i = 0; i < 10; i++))
> do
>                 taskset -c 0-2 ./test 500 &
>                 pid=$!
>                 normal_tasks+=$pid,
>                 echo $pid > /sys/fs/cgroup/cpu/normal/tasks
> done
>
> # spawn 500 idle tasks and bind tasks to CPU 0 only
> for ((i = 0; i < 500; i++))
> do
>                 taskset -c 0 ./test 200 &
>                 pid=$!
>                 # change to SCHED_IDLE policy
>                 chrt -i -p 0 $pid
>                 echo $pid > /sys/fs/cgroup/cpu/idle/tasks
> done
>
> # perf stat normal tasks
> perf stat -a -p $normal_tasks sleep 5
> pkill -f test
>
>
> You can try the above case and test it with your patch.
>
> Regards,
> Zhang Song
