linux-kernel - Re: [lkp-robot] [sched/fair] 6d46bd3d97: netperf.Throughput

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1505666869.15333.110.camel@gmx.de>
Date:   Sun, 17 Sep 2017 18:47:49 +0200
From:   Mike Galbraith <efault@....de>
To:     Joel Fernandes <joelaf@...gle.com>, Rik van Riel <riel@...hat.com>
Cc:     kernel test robot <xiaolong.ye@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Josef Bacik <jbacik@...com>, Juri Lelli <Juri.Lelli@....com>,
        Brendan Jackman <brendan.jackman@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Ingo Molnar <mingo@...hat.com>, lkp@...org
Subject: Re: [lkp-robot] [sched/fair] 6d46bd3d97: netperf.Throughput_tps
 -11.3% regression

On Sat, 2017-09-16 at 23:42 -0700, Joel Fernandes wrote:
> 
> Yes I understand. However with my 'strong sync' patch, such a
> balancing check could be useful which is what I was trying to do in a
> different way in my patch - but it could be that my way is not good
> enough and potentially the old wake_affine check could help here

Help how?  The old wake_affine() check contained zero concurrency
information, it served to exclude excessive stacking, defeating the
purpose of SMP.  A truly synchronous wakeup has absolutely nothing to
do with load balance in the general case: you can neither generate nor
cure an imbalance by replacing one (nice zero) task with another.  The
mere existence of a load based braking mechanism speaks volumes.

> On systems with SMT, it may make more sense for
> > sync wakeups to look for idle threads of the same
> > core, than to have the woken task end up on the
> > same thread, and wait for the current task to stop
> > running.
> 
> I am ok with additionally doing an select_idle_smt for the SMT cases.
> However Mike shows that it doesn't necessarily cause a performance
> improvement. But if there is consensus on checking for idle SMT
> threads, then I'm Ok with doing that.

select_idle_sibling() used to check thread first, that was changed to
core first for performance reasons.

> > "Strong sync" wakeups like you propose would also
> > change the semantics of wake_wide() and potentially
> > other bits of code...
> >
> 
> I understand, I am not very confident that wake_wide does the right
> thing anyway. Atleast for Android, wake_wide doesn't seem to mirror
> the most common usecase of display pipeline well. It seems that we
> have cases where the 'flip count' is really high and causes wake_wide
> all the time and sends us straight to the wake up slow path causing
> regressions in Android benchmarks.

Hm.  It didn't pull those counts out of the vacuum, it measured them.
 It definitely does not force Android into the full balance path, that
is being done by Android developers, as SD_BALANCE_WAKE is off by
default.  It was briefly on by default, but was quickly turned back off
because it... induced performance regressions.

In any case, if you have cause to believe that wake_wide() is causing
you grief, why the heck are you bending up the sync hint?

> Atleast with the sync flag, the caller provides a meaningful
> indication and I think making that flag stronger / more preferred than
> wake_wide makes sense from that perspective since its not a signal
> that's guessed, but is rather an input request.

Sort of, if you disregard the history I mentioned...

https://www.youtube.com/watch?v=Yho1Eydh1mM :)

Lacking solid concurrency information to base your decision on, you'll
end up robbing Peter to pay Paul forever, you absolutely will stack
non-synchronous tasks, inducing needless latency hits and injuring
scalability.  We've been down that road.  $subject was a very small
sample of what lies down this path <bang bang bang>.

	-Mike