Message-ID: <CAKfTPtDxTH62HGrze+rSrw9+kZc6xHSfJemhWqxhyhLZzM0qDg@mail.gmail.com>
Date: Thu, 1 Sep 2016 10:09:22 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Mike Galbraith <umgwanakikbuti@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Rik van Riel <riel@...hat.com>
Subject: Re: [patch v3.18+ regression fix] sched: Further improve spurious
CPU_IDLE active migrations
On 1 September 2016 at 06:11, Mike Galbraith <umgwanakikbuti@...il.com> wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith <umgwanakikbuti@...il.com> wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
>> > > > active migration") 's +1 made sense in that its a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so, you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together.. bouncing cow
>> > > syndrome.
>> >
>>
>> replacing +1 by +2 fixes this use case that involves 2 threads but
>> similar behavior can happen with 3 tasks on system with 4 cores per MC
>> as an example
>>
>> IIUC, you have on
>> - one side, periodic load balance that spreads the 2 tasks in the system
>> - on the other side, wake up path that moves the task back in the same MC.
>
> Yup.
>
>> Isn't your regression more linked to spurious migration than where the
>> task is scheduled ? I don't see any direct relation between the client
>> and the server in this netperf test, isn't it ?
>
> netperf 4360 [004] 1207.865265: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf 4360 [004] 1207.865274: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf 4360 [004] 1207.865280: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865313: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf 4360 [004] 1207.865340: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf 4360 [004] 1207.865345: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf 4360 [004] 1207.865355: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf 4360 [004] 1207.865357: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf 4360 [004] 1207.865369: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865377: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf 4360 [004] 1207.865476: sched:sched_wakeup: perf:4359 [120] success=1 CPU:003
I would have expected a net_rx softirq in the middle.
Never mind, I agree that we can find a lot of use cases with communicating tasks.
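
For anyone who wants to check the waker/wakee pattern on their own box, here is a rough sketch that counts wakeup pairs in output formatted like the trace quoted above. The regular expression and the helper name wakeup_pairs are only illustrative assumptions based on those lines, not an existing tool, and SAMPLE is just a few lines copied from that trace.

```python
import re
from collections import Counter

# Assumed layout, taken from the sched:sched_wakeup lines quoted above:
#   <waker> <pid> [<cpu>] <time>: sched:sched_wakeup: <wakee>:<pid> [<prio>] success=<n> CPU:<cpu>
WAKEUP = re.compile(
    r"^(?P<waker>\S+)\s+\d+\s+\[(?P<waker_cpu>\d+)\]\s+[\d.]+:\s+"
    r"sched:sched_wakeup:\s+(?P<wakee>.+):\d+\s+\[\d+\]\s+"
    r"success=\d+\s+CPU:(?P<wakee_cpu>\d+)$"
)


def wakeup_pairs(lines):
    """Count (waker, waker_cpu, wakee, wakee_cpu) tuples; hypothetical helper."""
    pairs = Counter()
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        m = WAKEUP.match(line.lstrip("> ").strip())
        if m:
            key = (m["waker"], int(m["waker_cpu"]), m["wakee"], int(m["wakee_cpu"]))
            pairs[key] += 1
    return pairs


# A few lines copied from the quoted trace.
SAMPLE = [
    "netperf 4360 [004] 1207.865265: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002",
    "netserver 4361 [002] 1207.865313: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004",
    "netperf 4360 [004] 1207.865340: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000",
    "netperf 4360 [004] 1207.865369: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002",
]

if __name__ == "__main__":
    for pair, count in wakeup_pairs(SAMPLE).most_common():
        print(count, pair)
```

A pair that keeps waking each other from CPUs in different groups, like netperf on CPU 4 and netserver on CPU 2 here, is the kind of communicating couple under discussion.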
>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand undo what our right hand did to crank
> up throughput. For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3 equipped CPUs, but perhaps is
> for some even modern processors, dunno (the boxen where regression was
> detected are far from new).
>
>> we could either remove the condition which tries to keep an even
>> number of tasks in each group until busiest group becomes overloaded
>> but it means that unrelated tasks may have to share same resources
>> or we could try to prevent the migration at wake up. I was looking at
>> wake_affine which seems to choose local cpu when both prev and local
>> cpu are idle. I wonder if local cpu is really a better choice when
>> both are idle
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs. Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?
Memory-intensive tasks will probably be hurt by sharing a cache, though.
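
To make the +1 versus +2 discussion above concrete, here is a toy model of the CPU_IDLE condition. It assumes the check has the form "local idle CPUs <= busiest idle CPUs + slack", skipped once the busiest group is overloaded. The function name and parameters are made up for illustration and this is not the kernel code.

```python
def looks_balanced(local_idle_cpus, busiest_idle_cpus, slack=1,
                   busiest_overloaded=False):
    """Toy model: True if idle balancing would leave the groups alone.

    Assumption: the real check compares idle CPU counts of the local and
    busiest groups with a small slack; everything else is ignored here.
    """
    if busiest_overloaded:
        # In this model an overloaded busiest group is never called balanced.
        return False
    return local_idle_cpus <= busiest_idle_cpus + slack


if __name__ == "__main__":
    # 3 tasks on two 2-CPU groups, split 2/1: the +1 acts as a tie breaker.
    print(looks_balanced(local_idle_cpus=1, busiest_idle_cpus=0, slack=1))
    # 2 buddies in one 4-CPU group, other group idle: +1 pulls them apart.
    print(looks_balanced(local_idle_cpus=4, busiest_idle_cpus=2, slack=1))
    # Same case with +2: the pair is left together.
    print(looks_balanced(local_idle_cpus=4, busiest_idle_cpus=2, slack=2))
    # 3 tasks in one 4-CPU group: even +2 still migrates one of them.
    print(looks_balanced(local_idle_cpus=4, busiest_idle_cpus=1, slack=2))
```

Under these assumptions, a fixed slack only moves the point at which periodic load balance starts undoing the wakeup placement, which is why removing the condition or preventing the wakeup migration are the two options on the table.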
>
> -Mike