Date:   Fri, 26 May 2017 16:04:41 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
        Chris Mason <clm@...com>, kernel-team@...com,
        mohini.narkhede@...el.com
Subject: Re: [PATCH v2 for-4.12-fixes 1/2] sched/fair: Use task_groups instead
 of leaf_cfs_rq_list to walk all cfs_rqs



On 05/25/2017 07:39 AM, Tejun Heo wrote:
> On Wed, May 24, 2017 at 04:40:34PM -0700, Tim Chen wrote:
>> We did some preliminary testing of this patchset for a well
>> known database benchmark on a 4 socket Skylake server system.
>> It provides a 3.7% throughput boost which is significant for
>> this benchmark.
> 
> That's great to hear.  Yeah, the walk can be noticeably expensive even
> with moderate number of cgroups.  Thanks for sharing the result.
> 

Yes, the walk in update_blocked_averages scales poorly, as it
iterates over *all* leaf cfs_rqs, which makes it very expensive. It
consumes 11.7% of our CPU cycles for this benchmark when CGROUP
is on. Your patchset skips unused cgroups and reduces the overhead to
10.4%. A CPU cycles profile is attached below for your reference.

The scheduler's frequent updates of cgroup load averages, and
having to iterate over all the leaf cfs_rqs for each load balance, make
update_blocked_averages one of the most expensive functions in the
system, so CGROUP is costly.  Without CGROUP, schedule costs only 3.3%
of CPU cycles vs 16.4% with CGROUP turned on. Your patchset does reduce
it to 14.9%.

This benchmark has thousands of running tasks, so it puts a good
deal of stress on the scheduler.

Tim


CPU cycles profile:

4.11 Before your patchset with CGROUP:
---------------------------------------

     16.42%     0.03%           280  [kernel.vmlinux]                                [k] schedule
             |
              --16.39%--schedule
                        |
                         --16.31%--__sched_text_start
                                   |
                                   |--12.85%--pick_next_task_fair
                                   |          |
                                   |           --11.71%--update_blocked_averages
                                   |                     |
                                   |                      --5.00%--update_load_avg
                                   |
                                   |--2.04%--finish_task_switch
                                   |          |
                                   |          |--0.85%--ret_from_intr
                                   |          |          |
                                   |          |           --0.85%--do_IRQ
                                   |          |
                                   |           --0.75%--apic_timer_interrupt
                                   |                     |
                                   |                      --0.75%--smp_apic_timer_interrupt
                                   |                                |
                                   |                                 --0.55%--irq_exit
                                   |                                           |
                                   |                                            --0.55%--__do_softirq
                                   |
                                    --0.51%--deactivate_task


4.11 After your patchset with CGROUP:
-------------------------------------

     14.90%     0.04%           337  [kernel.vmlinux]                                [k] schedule
             |
              --14.86%--schedule
                        |
                         --14.78%--__sched_text_start
                                   |
                                   |--11.51%--pick_next_task_fair
                                   |          |
                                   |           --10.37%--update_blocked_averages
                                   |                     |
                                   |                      --4.55%--update_load_avg
                                   |
                                   |--1.79%--finish_task_switch
                                   |          |
                                   |          |--0.77%--ret_from_intr
                                   |          |          |
                                   |          |           --0.77%--do_IRQ
                                   |          |
                                   |           --0.65%--apic_timer_interrupt
                                   |                     |
                                   |                      --0.65%--smp_apic_timer_interrupt
                                   |
                                    --0.53%--deactivate_task

4.11 with No CGROUP:
--------------------

      3.33%     0.04%           336  [kernel.vmlinux]                                [k] schedule
             |
              --3.29%--schedule
                        |
                         --3.19%--__sched_text_start
                                   |
                                    --1.45%--pick_next_task_fair
                                              |
                                               --1.15%--load_balance
                                                         |
                                                          --0.87%--find_busiest_group
                                                                    |
                                                                     --0.82%--update_sd_lb_stats
