Message-ID: <1ea04602-041a-5b90-eba9-c20c7e98c92e@oracle.com>
Date: Wed, 2 May 2018 14:58:42 -0700
From: Subhra Mazumdar <subhra.mazumdar@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com,
daniel.lezcano@...aro.org, steven.sistare@...cle.com,
dhaval.giani@...cle.com, rohit.k.jain@...cle.com
Subject: Re: [PATCH 1/3] sched: remove select_idle_core() for scalability
On 05/01/2018 11:03 AM, Peter Zijlstra wrote:
> On Mon, Apr 30, 2018 at 04:38:42PM -0700, Subhra Mazumdar wrote:
>> I also noticed a possible bug later in the merge code. Shouldn't it be:
>>
>>     if (busy < best_busy) {
>>         best_busy = busy;
>>         best_cpu = first_idle;
>>     }
> Uhh, quite. I did say it was completely untested, but yes.. /me dons the
> brown paper bag.
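For readers without the merge patch at hand, here is a rough user-space sketch of where a comparison like the one quoted above would sit. It is not the actual merge code. It assumes that busy counts the non-idle SMT siblings of a core and that first_idle is the first idle CPU found in that core; the function name, the SMT_WIDTH constant and the cpu_idle array are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SMT_WIDTH 2	/* assumption: two hardware threads per core */

/*
 * Hypothetical helper, not kernel code: return the first idle CPU of the
 * core with the fewest busy siblings, or -1 if no CPU is idle.
 * cpu_idle[] holds one flag per CPU, laid out core by core.
 */
static int pick_least_busy_idle_cpu(const bool *cpu_idle, int nr_cores)
{
	int best_busy = INT_MAX;
	int best_cpu = -1;

	assert(cpu_idle && nr_cores >= 0);

	for (int core = 0; core < nr_cores; core++) {
		int busy = 0;
		int first_idle = -1;

		for (int t = 0; t < SMT_WIDTH; t++) {
			int cpu = core * SMT_WIDTH + t;

			if (!cpu_idle[cpu])
				busy++;
			else if (first_idle < 0)
				first_idle = cpu;
		}

		if (first_idle < 0)
			continue;	/* no idle thread on this core */

		/* The comparison from the quoted fix: keep the least busy core. */
		if (busy < best_busy) {
			best_busy = busy;
			best_cpu = first_idle;
		}
	}

	return best_cpu;
}
```

With three cores where core 0 is half busy, core 1 is fully idle and core 2 is fully busy, this returns CPU 2, the first thread of the fully idle core.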
I re-ran the tests after fixing that bug but still get similar regressions
for hackbench and similar improvements on Uperf. I didn't re-run the
Oracle DB tests, but my guess is that they will show a similar improvement.
merge:
Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups  baseline  %stdev  patch              %stdev
1       0.5742    21.13   0.5131 (10.64%)    4.11
2       0.5776    7.87    0.5387 (6.73%)     2.39
4       0.9578    1.12    1.0549 (-10.14%)   0.85
8       1.7018    1.35    1.8516 (-8.8%)     1.56
16      2.9955    1.36    3.2466 (-8.38%)    0.42
32      5.4354    0.59    5.7738 (-6.23%)    0.38
Uperf pingpong on 2 socket, 44 core and 88 threads Intel x86 machine with
message size = 8k (higher is better):
threads  baseline  %stdev  patch              %stdev
8        49.47     0.35    51.1 (3.29%)       0.13
16       95.28     0.77    98.45 (3.33%)      0.61
32       156.77    1.17    170.97 (9.06%)     5.62
48       193.24    0.22    245.89 (27.25%)    7.26
64       216.21    9.33    316.43 (46.35%)    0.37
128      379.62    10.29   337.85 (-11%)      3.68
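A note on reading the percentages in parentheses: they appear to be relative to baseline, with the sign chosen so that a positive value always means the patch is better. The sketch below reproduces that arithmetic as I understand it; the function names are hypothetical and it only shows the sign convention, not how the benchmarks were run.

```c
#include <assert.h>

/* Gain in percent when lower is better (e.g. hackbench time). */
double gain_lower_is_better(double baseline, double patch)
{
	assert(baseline > 0.0);
	return (baseline - patch) / baseline * 100.0;
}

/* Gain in percent when higher is better (e.g. Uperf throughput). */
double gain_higher_is_better(double baseline, double patch)
{
	assert(baseline > 0.0);
	return (patch - baseline) / baseline * 100.0;
}
```

For example, hackbench with 1 group gives (0.5742 - 0.5131) / 0.5742, which is about 10.64%, and Uperf with 48 threads gives (245.89 - 193.24) / 193.24, which is about 27.25%.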
I tried using the next_cpu technique with the merge, but it didn't help. I am
open to suggestions.
merge + next_cpu:
Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups  baseline  %stdev  patch              %stdev
1       0.5742    21.13   0.5107 (11.06%)    6.35
2       0.5776    7.87    0.5917 (-2.44%)    11.16
4       0.9578    1.12    1.0761 (-12.35%)   1.1
8       1.7018    1.35    1.8748 (-10.17%)   0.8
16      2.9955    1.36    3.2419 (-8.23%)    0.43
32      5.4354    0.59    5.6958 (-4.79%)    0.58
Uperf pingpong on 2 socket, 44 core and 88 threads Intel x86 machine with
message size = 8k (higher is better):
threads  baseline  %stdev  patch              %stdev
8        49.47     0.35    51.65 (4.41%)      0.26
16       95.28     0.77    99.8 (4.75%)       1.1
32       156.77    1.17    168.37 (7.4%)      0.6
48       193.24    0.22    228.8 (18.4%)      1.75
64       216.21    9.33    287.11 (32.79%)    10.82
128      379.62    10.29   346.22 (-8.8%)     4.7
Finally, Peter had earlier suggested transposing the cpu offset in
select_task_rq_fair. I had tried that before, but it also regressed on
hackbench. I just wanted to mention it so we have closure on that.
transpose cpu offset in select_task_rq_fair:
Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups  baseline  %stdev  patch              %stdev
1       0.5742    21.13   0.5251 (8.55%)     2.57
2       0.5776    7.87    0.5471 (5.28%)     11
4       0.9578    1.12    1.0148 (-5.95%)    1.97
8       1.7018    1.35    1.798 (-5.65%)     0.97
16      2.9955    1.36    3.088 (-3.09%)     2.7
32      5.4354    0.59    5.2815 (2.8%)      1.26