Message-ID: <CAKfTPtDxTH62HGrze+rSrw9+kZc6xHSfJemhWqxhyhLZzM0qDg@mail.gmail.com>
Date:   Thu, 1 Sep 2016 10:09:22 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mike Galbraith <umgwanakikbuti@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Rik van Riel <riel@...hat.com>
Subject: Re: [patch v3.18+ regression fix] sched: Further improve spurious
 CPU_IDLE active migrations

On 1 September 2016 at 06:11, Mike Galbraith <umgwanakikbuti@...il.com> wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith <umgwanakikbuti@...il.com> wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
>> > > > active migration") 's +1 made sense in that it's a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so, you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together.. bouncing cow
>> > > syndrome.
>> >
>>
>> Replacing +1 with +2 fixes this use case, which involves 2 threads, but
>> similar behavior can happen with, for example, 3 tasks on a system with
>> 4 cores per MC.
>>
>> IIUC, you have:
>> - on one side, periodic load balance that spreads the 2 tasks across the system
>> - on the other side, the wakeup path that moves the task back into the same MC.
>
> Yup.
>
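
The +1 versus +2 point above can be illustrated with a toy model. This is only a sketch: the Group class, the idle_pull_wanted() helper and the exact form of the comparison are assumptions made for illustration, not the kernel's load-balance code. It assumes an idle CPU pulls a task only when its group has more than `slack` extra idle CPUs compared with the busiest group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Toy stand-in for one scheduling group (e.g. one MC); not kernel data."""
    cores: int
    tasks: int

    @property
    def idle_cpus(self) -> int:
        # Number of cores with nothing to run, assuming one task per core.
        return max(self.cores - self.tasks, 0)


def idle_pull_wanted(local: Group, busiest: Group, slack: int) -> bool:
    """Assumed check: pull only if local has more than `slack` extra idle CPUs."""
    return local.idle_cpus > busiest.idle_cpus + slack


# (local, busiest) pairs for the cases mentioned in the thread.
# Two communicating tasks packed into one 4-core MC, the other MC idle.
pair = (Group(cores=4, tasks=0), Group(cores=4, tasks=2))
# Three tasks packed into one 4-core MC, the other MC idle.
trio = (Group(cores=4, tasks=0), Group(cores=4, tasks=3))
# Three tasks over two 2-core groups: one group must hold two of them.
tie = (Group(cores=2, tasks=1), Group(cores=2, tasks=2))

for slack in (1, 2):
    print(
        f"slack=+{slack}:",
        f"pair pulled={idle_pull_wanted(*pair, slack)}",
        f"trio pulled={idle_pull_wanted(*trio, slack)}",
        f"tie pulled={idle_pull_wanted(*tie, slack)}",
    )
```

Under these assumptions, +2 leaves the two buddies alone but still splits the three-task case, and +1 is enough to stop the 3-tasks-on-2-groups bounce.
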
>> Isn't your regression linked more to spurious migration than to where the
>> task is scheduled? I don't see any direct relation between the client
>> and the server in this netperf test, is there one?
>
>          netperf  4360 [004]  1207.865265:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865274:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865280:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865313:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865340:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865345:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865355:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865357:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865369:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865377:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865476:       sched:sched_wakeup: perf:4359 [120] success=1 CPU:003

I would have expected a net_rx softirq in the middle.
Never mind, I agree that we can find a lot of use cases with communicating tasks.
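
For reference, the waker/wakee relation in a trace like the one quoted above can be tallied with a few lines of Python. This is a rough sketch: the regular expression is an assumption fitted to the line layout shown in this thread, and count_wakeups() is a hypothetical helper, not part of perf.

```python
import re
from collections import Counter

# Lines copied from the quoted trace (whitespace condensed).
TRACE = """\
netperf  4360 [004]  1207.865265: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
netperf  4360 [004]  1207.865274: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
netperf  4360 [004]  1207.865280: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
netserver  4361 [002]  1207.865313: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
netperf  4360 [004]  1207.865340: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
netperf  4360 [004]  1207.865345: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
netperf  4360 [004]  1207.865355: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
netperf  4360 [004]  1207.865357: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
netperf  4360 [004]  1207.865369: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
netserver  4361 [002]  1207.865377: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
netperf  4360 [004]  1207.865476: sched:sched_wakeup: perf:4359 [120] success=1 CPU:003
"""

# Assumed layout: waker comm, pid, [cpu], timestamp, event, wakee comm:pid, prio, target CPU.
# Leading "> " mail quoting is tolerated so quoted traces can be pasted directly.
WAKEUP = re.compile(
    r"^[>\s]*(?P<waker>\S+)\s+\d+\s+\[(?P<waker_cpu>\d+)\]\s+[\d.]+:\s+"
    r"sched:sched_wakeup:\s+(?P<wakee>\S+):\d+\s+\[\d+\]\s+"
    r"success=\d+\s+CPU:(?P<target_cpu>\d+)"
)


def count_wakeups(text: str) -> Counter:
    """Count (waker, wakee) pairs among the sched_wakeup lines in `text`."""
    pairs = Counter()
    for line in text.splitlines():
        match = WAKEUP.match(line)
        if match:
            pairs[(match["waker"], match["wakee"])] += 1
    return pairs


if __name__ == "__main__":
    for (waker, wakee), n in count_wakeups(TRACE).most_common():
        print(f"{waker} -> {wakee}: {n}")
```
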

>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand to undo what our right hand did to crank
> up throughput.  For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3 equipped CPUs, but perhaps is
> for some even modern processors, dunno (the boxen where regression was
> detected are far from new).
>
>> We could either remove the condition that tries to keep an even
>> number of tasks in each group until the busiest group becomes overloaded,
>> but that means unrelated tasks may have to share the same resources,
>> or we could try to prevent the migration at wakeup. I was looking at
>> wake_affine, which seems to choose the local cpu when both the prev and
>> local cpus are idle. I wonder whether the local cpu is really a better
>> choice when both are idle.
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs.  Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?

Memory-intensive tasks will probably be hurt.

>
>         -Mike