Message-ID: <20170928123758.robe5ggsjf4voj7h@hirez.programming.kicks-ass.net>
Date: Thu, 28 Sep 2017 14:37:58 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Rik van Riel <riel@...hat.com>
Cc: Eric Farman <farman@...ux.vnet.ibm.com>,
????????? <jinpuwang@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
"KVM-ML (kvm@...r.kernel.org)" <kvm@...r.kernel.org>,
vcaputo@...garu.com, Matthew Rosato <mjrosato@...ux.vnet.ibm.com>
Subject: Re: sysbench throughput degradation in 4.13+
On Wed, Sep 27, 2017 at 01:58:20PM -0400, Rik van Riel wrote:
> @@ -5359,10 +5378,14 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
> unsigned long current_load = task_h_load(current);
>
> /* in this case load hits 0 and this LLC is considered 'idle' */
> - if (current_load > this_stats.load)
> + if (current_load > this_stats.max_load)
> + return true;
> +
> + /* allow if the CPU would go idle, regardless of LLC load */
> + if (current_load >= target_load(this_cpu, sd->wake_idx))
> return true;
>
> - this_stats.load -= current_load;
> + this_stats.max_load -= current_load;
> }
>
> /*
> @@ -5375,10 +5398,6 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
> if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
> return false;
>
> - /* if this cache has capacity, come here */
> - if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> - return true;
> -
> /*
> * Check to see if we can move the load without causing too much
> * imbalance.
> @@ -5391,8 +5410,8 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
> prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
> prev_eff_load *= this_stats.capacity;
>
> - this_eff_load *= this_stats.load + task_load;
> - prev_eff_load *= prev_stats.load - task_load;
> + this_eff_load *= this_stats.max_load + task_load;
> + prev_eff_load *= prev_stats.min_load - task_load;
>
> return this_eff_load <= prev_eff_load;
> }
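
The imbalance check at the end of the quoted hunk can be read in isolation roughly as below. This is only a sketch, not the kernel function: `struct llc_stats` and `affine_by_eff_load()` are hypothetical stand-ins, the initial `this_eff_load = 100 * prev capacity` step is assumed because the hunk does not show it, and the clamp against unsigned underflow is an addition for the standalone example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-LLC statistics used in the quoted patch. */
struct llc_stats {
	uint64_t min_load;	/* lowest CPU load seen in this LLC */
	uint64_t max_load;	/* highest CPU load seen in this LLC */
	uint64_t capacity;	/* summed CPU capacity of this LLC */
};

/*
 * Return true if moving a task of weight task_load to "this" LLC does not
 * create too much imbalance relative to the "prev" LLC.
 */
static bool affine_by_eff_load(const struct llc_stats *this_stats,
			       const struct llc_stats *prev_stats,
			       uint64_t task_load,
			       unsigned int imbalance_pct)
{
	uint64_t this_eff_load, prev_eff_load, prev_load;

	/* Assumption: this initialisation is outside the quoted hunk. */
	this_eff_load = 100;
	this_eff_load *= prev_stats->capacity;

	/* Half of the domain's imbalance margin is granted to prev. */
	prev_eff_load = 100 + (imbalance_pct - 100) / 2;
	prev_eff_load *= this_stats->capacity;

	/* Pessimistic for this LLC (max_load), optimistic for prev (min_load). */
	this_eff_load *= this_stats->max_load + task_load;

	/* Clamp added for this sketch only, to avoid unsigned wrap-around. */
	prev_load = prev_stats->min_load > task_load ?
		    prev_stats->min_load - task_load : 0;
	prev_eff_load *= prev_load;

	return this_eff_load <= prev_eff_load;
}
```

With equal capacities and imbalance_pct = 117, the margin factor is 108, so the move is allowed only while 100 * (this max_load + task_load) <= 108 * (prev min_load - task_load).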
So I would really like to see a workload that needs this LLC/NUMA stuff,
because I much prefer the simpler 'on which of these two CPUs can I run
soonest' approach.