Message-ID: <20190718113758.GN3402@hirez.programming.kicks-ass.net>
Date: Thu, 18 Jul 2019 13:37:58 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Subhra Mazumdar <subhra.mazumdar@...cle.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, tglx@...utronix.de,
prakash.sangappa@...cle.com, dhaval.giani@...cle.com,
daniel.lezcano@...aro.org, vincent.guittot@...aro.org,
viresh.kumar@...aro.org, tim.c.chen@...ux.intel.com,
mgorman@...hsingularity.net, Paul Turner <pjt@...gle.com>
Subject: Re: [RFC PATCH 2/3] sched: change scheduler to give preference to
soft affinity CPUs

On Wed, Jul 17, 2019 at 08:31:25AM +0530, Subhra Mazumdar wrote:
>
> On 7/2/19 10:58 PM, Peter Zijlstra wrote:
> > On Wed, Jun 26, 2019 at 03:47:17PM -0700, subhra mazumdar wrote:
> > > The soft affinity CPUs present in the cpumask cpus_preferred are used by
> > > the scheduler in two levels of search. First is in determining wake
> > > affinity, which chooses the LLC domain, and second while searching for
> > > idle CPUs in the
> > > LLC domain. In the first level it uses cpus_preferred to prune out the
> > > search space. In the second level it first searches the cpus_preferred and
> > > then cpus_allowed. Using the affinity_unequal flag it breaks early to avoid
> > > any overhead in the scheduler fast path when soft affinity is not used.
> > > This only changes the wake up path of the scheduler, the idle balancing
> > > is unchanged; together they achieve the "softness" of scheduling.
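(For reference, a stand-alone user-space model of that two-pass search;
illustrative only, not the patch -- only the names cpus_preferred,
cpus_allowed and affinity_unequal come from the description above.)

/*
 * Stand-alone model of the two-pass search described above; illustrative
 * only, not the patch itself.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct task {
	unsigned long cpus_preferred;	/* soft affinity: where we'd like to run */
	unsigned long cpus_allowed;	/* hard affinity: where we may run */
	bool affinity_unequal;		/* preferred != allowed */
};

static int find_idle_in(unsigned long mask, const bool *idle)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((mask & (1UL << cpu)) && idle[cpu])
			return cpu;
	return -1;
}

/* Second-level search: preferred CPUs first, then the rest of allowed. */
static int select_idle_cpu(const struct task *p, const bool *idle)
{
	int cpu = find_idle_in(p->cpus_preferred, idle);

	/* The early break keeps the fast path cheap when soft affinity
	 * is unused (preferred == allowed). */
	if (cpu >= 0 || !p->affinity_unequal)
		return cpu;

	return find_idle_in(p->cpus_allowed & ~p->cpus_preferred, idle);
}

int main(void)
{
	bool idle[NR_CPUS] = { [5] = true };	/* only CPU 5 is idle */
	struct task p = {
		.cpus_preferred   = 0x0f,	/* CPUs 0-3 */
		.cpus_allowed     = 0xff,	/* CPUs 0-7 */
		.affinity_unequal = true,
	};

	printf("picked CPU %d\n", select_idle_cpu(&p, idle));	/* -> 5 */
	return 0;
}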
> > I really dislike this implementation.
> >
> > I thought the idea was to remain work conserving (in so far as that
> > we're that anyway), so changing select_idle_sibling() doesn't make sense
> > to me. If there is idle, we use it.
> >
> > Same for newidle; which you already retained.
> The scheduler is already not work conserving in many ways. Soft affinity is
> only for those who want to use it and has no side effects when not used.
> Also, given the way the scheduler is implemented, it may not be possible to
> make the first level of search work conserving; I am open to ideas.
I really don't understand the premise of this soft affinity stuff then.
I understood it was to allow spreading when under-utilized but to group
when over-utilized; you're arguing for the exact opposite, which doesn't
make sense.
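Concretely, something like this work-conserving shape (again illustrative
only, reusing struct task, NR_CPUS and find_idle_in() from the model above):

/* Illustrative only: any idle allowed CPU wins; the preferred mask
 * merely biases placement once nothing is idle. */
static int select_cpu_work_conserving(const struct task *p, const bool *idle)
{
	int cpu = find_idle_in(p->cpus_allowed, idle);

	if (cpu >= 0)
		return cpu;	/* under-utilized: spread, use the idle CPU */

	/* over-utilized: pack onto the preferred set; a real
	 * implementation would pick the least-loaded CPU here */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (p->cpus_preferred & (1UL << cpu))
			return cpu;
	return -1;
}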
> > And I also really don't want a second utilization tipping point; we
> > already have the overloaded thing.
> The numbers in the cover letter show that a static tipping point will not
> work for all workloads. What soft affinity is doing is essentially trading
> off cache coherence for more CPU. The optimum tradeoff point varies from
> workload to workload and with system characteristics such as the coherence
> overhead. If we just use the domain overloaded state, that becomes a static
> definition of the tipping point; we need something tunable that captures
> this tradeoff. The ratio of CPU utilization seemed to work well and capture
> that.
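For concreteness, one plausible shape for such a knob (a sketch with
made-up names, building on the includes above; the series' actual tunable
and ratio definition may differ):

/* Sketch of a ratio-based tipping point; sysctl_soft_affinity_ratio
 * is a made-up name, the real tunable in the series may differ. */
static unsigned int sysctl_soft_affinity_ratio = 75;	/* percent */

/* Spill out of cpus_preferred once its utilization exceeds the
 * configured fraction of its capacity. */
static bool should_spill(unsigned long util_preferred,
			 unsigned long cap_preferred)
{
	return util_preferred * 100 >
	       cap_preferred * sysctl_soft_affinity_ratio;
}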
And then you run two workloads with different characteristics on the
same box.
Global knobs are buggered.