linux-kernel - Re: wake_wide mechanism clarification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170731164227.GA7922@li70-116.members.linode.com>
Date:   Mon, 31 Jul 2017 16:42:28 +0000
From:   Josef Bacik <josef@...icpanda.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     Josef Bacik <josef@...icpanda.com>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Juri Lelli <Juri.Lelli@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Brendan Jackman <brendan.jackman@....com>,
        Chris Redpath <Chris.Redpath@....com>,
        Michael Wang <wangyun@...ux.vnet.ibm.com>,
        Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: wake_wide mechanism clarification

On Mon, Jul 31, 2017 at 09:21:46AM -0700, Joel Fernandes wrote:
> Hi Josef,
> 
> On Mon, Jul 31, 2017 at 5:21 AM, Josef Bacik <josef@...icpanda.com> wrote:
> > On Sat, Jul 29, 2017 at 03:41:56PM -0700, Joel Fernandes wrote:
> >> On Sat, Jul 29, 2017 at 3:28 PM, Joel Fernandes <joelaf@...gle.com> wrote:
> >> <snip>
> >> >>>> Again I didn't follow why the second condition couldn't just be:
> >> >>>> waker->nr_wakee_switch > factor, or, (waker->nr_wakee_switch +
> >> >>>> wakee->nr_wakee_switch) > factor, based on the above explanation from
> >> >>>> Micheal Wang that I quoted.
> >> >>>> and why he's instead doing the whole multiplication thing there that I
> >> >>>> was talking about earlier: "factor * wakee->nr_wakee_switch".
> >> >>>>
> >> >>>> Rephrasing my question in another way, why are we talking the ratio of
> >> >>>> master/slave instead of the sum when comparing if its > factor? I am
> >> >>>> surely missing something here.
> >> >>>
> >> >>> Because the heuristic tries to not demolish 1:1 buddies.  Big partner
> >> >>> flip delta means the pair are unlikely to be a communicating pair,
> >> >>> perhaps at high frequency where misses hurt like hell.
> >> >>
> >> >> But it does seem to me to demolish the N:N communicating pairs from a
> >> >> latency/load balancing standpoint. For he case of N readers and N
> >> >> writers, the ratio (master/slave) comes down to 1:1 and we wake
> >> >> affine. Hopefully I didn't miss something too obvious about that.
> >> >
> >> > I think wake_affine() should correctly handle the case (of
> >> > overloading) I bring up here where wake_wide() is too conservative and
> >> > does affine a lot, (I don't have any data for this though, this just
> >> > from code reading), so I take this comment back for this reason.
> >>
> >> aargh, nope :( it still runs select_idle_sibling although on the
> >> previous CPU even if want_affine is 0 (and doesn't do the wider
> >> wakeup..), so the comment still applies.. its easy to get lost into
> >> the code with so many if statements :-\  sorry about the noise :)
> >>
> >
> > I've been working in this area recently because of a cpu imbalance problem.
> > Wake_wide() definitely makes it so we're waking affine way too often, but I
> > think messing with wake_waide to solve that problem is the wrong solution.  This
> > is just a heuristic to see if we should wake affine, the simpler the better.  I
> > solved the problem of waking affine too often like this
> >
> > https://marc.info/?l=linux-kernel&m=150003849602535&w=2
> 
> Thanks! Cool!
> 
> >
> > So why do you care about wake_wide() anyway?  Are you observing some problem
> > that you suspect is affected by the affine wakeup stuff?  Or are you just trying
> 
> I am dealing with an affine wake up issue, yes.
> 
> > to understand what is going on for fun?  Cause if you are just doing this for
> > fun you are a very strange person, thanks,
> 
> Its not just for fun :) Let me give you some background about me, I
> work in the Android team and one of the things I want to do is to take
> an out of tree patch that's been carried for some time and post a more
> upstreamable solution - this is related to wake ups from the binder
> driver which does sync wake ups (WF_SYNC). I can't find the exact out
> of tree patch publicly since it wasn't posted to a list, but the code
> is here [1]. What's worse is I have recently found really bad issues
> with this patch itself where runnable times are increased. I should
> have provided this background earlier (sorry that I didn't, my plan
> was to trigger a separate discussion about the binder sync wake up
> thing as a part of a patch/proposal I want to post - which I plan to
> do so). Anyway, as a part of this effort, I want to understand
> wake_wide() better and "respect" it since it sits in the wake up path
> and I wanted to my proposal to work well with it, especially since I
> want to solve this problem in an upstream-friendly way.
> 
> The other reason to trigger the discussion, is, I have seen
> wake_wide() enough number of times and asked enough number of folks
> how it works that it seems sensible to ask about it here (I was also
> suggested to ask about wake_wide on LKML because since few people
> seemingly understand how it works) and hopefully now its a bit better
> understood.
> 
> I agree with you that instead of spending insane amounts of time on
> wake_wide itself, its better to directly work on a problem and collect
> some data - which is also what I'm doing, but I still thought its
> worth doing some digging into wake_wide() during some free time I had,
> thanks.
> 

Ok so your usecase is to _always_ wake affine if we're doing a sync wakeup.  I
_think_ for your case it's best to make wake_affine() make this decision, and
you don't want wake_wide() to filter out your wakeup as not-affine?  So perhaps
just throw a test in there to not wake wide if WF_SYNC is set.  This makes
logical sense to me as synchronous wakeups are probably going to want to be
affine wakeups, and then we can rely on wake_affine() to do the load checks to
make sure it really makes sense.  How does that sound?  Thanks,

Josef