Message-ID: <788c11ab-bbe9-55f1-5dc0-2ada0333648e@linux.intel.com>
Date: Fri, 26 May 2017 16:04:41 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
Chris Mason <clm@...com>, kernel-team@...com,
mohini.narkhede@...el.com
Subject: Re: [PATCH v2 for-4.12-fixes 1/2] sched/fair: Use task_groups instead
of leaf_cfs_rq_list to walk all cfs_rqs
On 05/25/2017 07:39 AM, Tejun Heo wrote:
> On Wed, May 24, 2017 at 04:40:34PM -0700, Tim Chen wrote:
>> We did some preliminary testing of this patchset for a well
>> known database benchmark on a 4 socket Skylake server system.
>> It provides a 3.7% throughput boost which is significant for
>> this benchmark.
>
> That's great to hear. Yeah, the walk can be noticeably expensive even
> with moderate number of cgroups. Thanks for sharing the result.
>
Yes, the walk in update_blocked_averages scales poorly as it
iterates over *all* the leaf cfs_rqs, making it very expensive. It
consumes 11.7% of our cpu cycles for this benchmark when CGROUP
is on. Your patchset skips unused cgroups and reduces the overhead to
10.4%. A CPU cycles profile is attached below for your reference.
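To make the scaling issue concrete, here is a rough userspace model of the
walk (this is not kernel code and not your patch; the struct and helper
names below are made up purely for illustration). It shows why visiting
every leaf cfs_rq costs O(nr_cgroups) per call, and how much work skipping
cgroups with no load can save:

/*
 * Toy model (plain C, compile with: gcc -std=c99 walk_model.c).
 * "struct cfs_rq_model", "blocked_load", etc. are invented names.
 */
#include <stdio.h>
#include <stdlib.h>

struct cfs_rq_model {
	unsigned long blocked_load;	/* decaying load of blocked tasks */
	struct cfs_rq_model *next;	/* leaf list linkage */
};

/* Walk every leaf cfs_rq unconditionally: O(nr_cgroups) per call. */
static unsigned long walk_all(struct cfs_rq_model *leaf_list)
{
	unsigned long visited = 0;

	for (struct cfs_rq_model *rq = leaf_list; rq; rq = rq->next) {
		rq->blocked_load -= rq->blocked_load / 2;	/* stand-in for the load update */
		visited++;
	}
	return visited;
}

/* Same walk, but skip cfs_rqs whose load has fully decayed. */
static unsigned long walk_skip_idle(struct cfs_rq_model *leaf_list)
{
	unsigned long visited = 0;

	for (struct cfs_rq_model *rq = leaf_list; rq; rq = rq->next) {
		if (!rq->blocked_load)
			continue;	/* nothing to decay, skip it */
		rq->blocked_load -= rq->blocked_load / 2;
		visited++;
	}
	return visited;
}

int main(void)
{
	enum { NR_CGROUPS = 1000 };
	struct cfs_rq_model *rqs = calloc(NR_CGROUPS, sizeof(*rqs));

	for (int i = 0; i < NR_CGROUPS; i++) {
		/* pretend only 1 in 10 cgroups has any load on this cpu */
		rqs[i].blocked_load = (i % 10 == 0) ? 1024 : 0;
		rqs[i].next = (i + 1 < NR_CGROUPS) ? &rqs[i + 1] : NULL;
	}

	printf("full walk visited %lu cfs_rqs\n", walk_all(rqs));
	printf("skipping idle ones visited %lu cfs_rqs\n", walk_skip_idle(rqs));
	free(rqs);
	return 0;
}

With 1000 cgroups and only 100 of them carrying load, the first loop still
touches all 1000 cfs_rqs on every invocation, while the second touches 100,
which roughly matches the kind of reduction we see in the profile.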
The scheduler's frequent updates of the cgroups' load averages, and
having to iterate over all the leaf cfs_rqs for each load balance, cause
update_blocked_averages to be one of the most expensive functions in the
system, making CGROUP costly. Without CGROUP, schedule only costs 3.3%
of cpu cycles vs 16.4% with CGROUP turned on. Your patchset does reduce
it to 14.9%.
This benchmark has thousands of running tasks, so it puts a good
deal of stress on the scheduler.
Tim
CPU cycles profile:
4.11 Before your patchset with CGROUP:
---------------------------------------
16.42% 0.03% 280 [kernel.vmlinux] [k] schedule
|
--16.39%--schedule
|
--16.31%--__sched_text_start
|
|--12.85%--pick_next_task_fair
| |
| --11.71%--update_blocked_averages
| |
| --5.00%--update_load_avg
|
|--2.04%--finish_task_switch
| |
| |--0.85%--ret_from_intr
| | |
| | --0.85%--do_IRQ
| |
| --0.75%--apic_timer_interrupt
| |
| --0.75%--smp_apic_timer_interrupt
| |
| --0.55%--irq_exit
| |
| --0.55%--__do_softirq
|
--0.51%--deactivate_task
4.11 After your patchset with CGROUP:
-------------------------------------
14.90% 0.04% 337 [kernel.vmlinux] [k] schedule
|
--14.86%--schedule
|
--14.78%--__sched_text_start
|
|--11.51%--pick_next_task_fair
| |
| --10.37%--update_blocked_averages
| |
| --4.55%--update_load_avg
|
|--1.79%--finish_task_switch
| |
| |--0.77%--ret_from_intr
| | |
| | --0.77%--do_IRQ
| |
| --0.65%--apic_timer_interrupt
| |
| --0.65%--smp_apic_timer_interrupt
|
--0.53%--deactivate_task
4.11 with No CGROUP:
--------------------
3.33% 0.04% 336 [kernel.vmlinux] [k] schedule
|
--3.29%--schedule
|
--3.19%--__sched_text_start
|
--1.45%--pick_next_task_fair
|
--1.15%--load_balance
|
--0.87%--find_busiest_group
|
--0.82%--update_sd_lb_stats