Message-ID: <CAPM31RLy4hvaRUi5iVeOyEapRtguR5rNUrn+3oGax_Mm3GxqTw@mail.gmail.com>
Date: Sat, 9 Jul 2011 00:34:21 -0700
From: Paul Turner <pjt@...gle.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
Bharata B Rao <bharata@...ux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
Pavel Emelyanov <xemul@...nvz.org>,
Hu Tao <hutao@...fujitsu.com>
Subject: Re: [patch 00/17] CFS Bandwidth Control v7.1
On Fri, Jul 8, 2011 at 3:32 AM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> On Fri, 2011-07-08 at 00:39 -0700, Paul Turner wrote:
>>
>> > Going beyond that
>> > would be using static_branch() to track if there is any bandwidth
>> > tracking required at all.
>> >
>>
>> I spent some time examining this option as well. Our toolchain
>> apparently is stuck on gcc-4.4 which left me scratching my head at the
>> supposed jump label assembly being omitted until I realized
>> CC_HAS_ASM_GOTO was missing. I will roll this up also and benchmark
>> tomorrow.
>
> Ah, does it actually make things worse if it uses the static_branch
> fallbacks? If so we should probably use some HAVE_JUMP_LABEL foo.
>
I started whittling at this today; the numbers so far on my hardware (a
12-thread i7) are as follows.
Base performance with !CONFIG_CFS_BW:
 Performance counter stats for './pipe-test-100k' (50 runs):

       893,486,206 instructions   #  1.063 IPC   ( +- 0.296% )
       840,904,951 cycles                        ( +- 0.359% )
       160,076,980 branches                      ( +- 0.305% )

       0.735022174 seconds time elapsed          ( +- 0.143% )
Original performance (v7.1):
                      instructions            cycles                  branches
------------------------------------------------------------------------------------------
base                  893,486,206             840,904,951             160,076,980
+unconstrained        929,244,021 (+4.00%)    883,923,194 (+5.12%)    167,131,228 (+4.41%)
+10000000000/1000     934,424,430 (+4.58%)    875,605,677 (+4.13%)    168,466,469 (+5.24%)
+10000000000/10000    940,048,385 (+5.21%)    883,922,489 (+5.12%)    169,512,329 (+5.89%)
+10000000000/100000   934,351,875 (+4.57%)    888,878,742 (+5.71%)    168,457,809 (+5.24%)
+10000000000/1000000  931,127,353 (+4.21%)    874,830,745 (+4.03%)    167,861,492 (+4.86%)
The first step was fixing the missing inlining on update_curr(). This was a
major improvement.
Fix inlining on update_curr:
                      instructions            cycles                  branches
------------------------------------------------------------------------------------------
base                  893,486,206             840,904,951             160,076,980
+unconstrained        909,771,488 (+1.82%)    850,091,039 (+1.09%)    164,385,813 (+2.69%)
+10000000000/1000     915,384,142 (+2.45%)    859,591,791 (+2.22%)    165,616,386 (+3.46%)
+10000000000/10000    922,657,403 (+3.26%)    865,701,436 (+2.95%)    166,996,717 (+4.32%)
+10000000000/100000   928,636,540 (+3.93%)    866,234,685 (+3.01%)    168,111,517 (+5.02%)
+10000000000/1000000  922,311,143 (+3.23%)    859,445,796 (+2.20%)    166,922,517 (+4.28%)
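I also realized that on the dequeue path we can shave a branch by reversing the
order of some of the conditionals.
In particular, reordering (!runnable || !enabled) ---> (!enabled || !runnable).
Because || short-circuits, the latter settles the !enabled case with a single
test, saving us a branch whenever the entity is still runnable (the former has
to test both conditions there). It has the same cost in the enabled && runnable
case; only the enabled && !runnable case pays an extra test.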
Speed up return_cfs_rq_runtime:
                      instructions            cycles                  branches
------------------------------------------------------------------------------------------
base                  893,486,206             840,904,951             160,076,980
+unconstrained        906,151,427 (+1.42%)    877,497,749 (+4.35%)    163,738,499 (+2.29%)
+10000000000/1000     910,284,839 (+1.88%)    885,136,630 (+5.26%)    164,804,085 (+2.95%)
+10000000000/10000    911,860,656 (+2.06%)    891,433,792 (+6.01%)    165,098,115 (+3.14%)
+10000000000/100000   913,062,037 (+2.19%)    890,918,139 (+5.95%)    165,327,113 (+3.28%)
+10000000000/1000000  920,966,554 (+3.08%)    899,250,040 (+6.94%)    166,813,750 (+4.21%)
Finally, introducing jump labels, so that the bandwidth checks are bypassed when
there are no constrained groups, claws back a good portion of the remaining
overhead in the unconstrained case.
Add jump labels:
                      instructions            cycles                  branches
------------------------------------------------------------------------------------------
base                  893,486,206             840,904,951             160,076,980
+unconstrained        900,477,543 (+0.78%)    890,310,950 (+5.88%)    161,037,844 (+0.60%)
+10000000000/1000     921,436,697 (+3.13%)    919,362,792 (+9.33%)    168,491,279 (+5.26%)
+10000000000/10000    907,214,638 (+1.54%)    894,406,875 (+6.36%)    165,743,207 (+3.54%)
+10000000000/100000   918,094,542 (+2.75%)    910,211,234 (+8.24%)    167,841,828 (+4.85%)
+10000000000/1000000  910,698,725 (+1.93%)    885,385,460 (+5.29%)    166,406,742 (+3.95%)
There are some permutations of where we use jump labels that I still have to
finish evaluating (including whether we want to skip the jump labels in the
!CC_HAS_ASM_GOTO case), as well as one or two other shavings that I am
looking at. I will post v7.2 incorporating these speed-ups, as well as some
build fixes for the !CONFIG_CGROUP case, on Monday/Tuesday.
Thanks,
- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/