Date:	Mon, 4 Apr 2016 10:01:08 +0100
From:	Morten Rasmussen <morten.rasmussen@....com>
To:	Leo Yan <leo.yan@...aro.org>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Steve Muckle <steve.muckle@...aro.org>,
	linux-kernel@...r.kernel.org, eas-dev@...ts.linaro.org
Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task
 migration

On Sat, Apr 02, 2016 at 12:38:37AM +0800, Leo Yan wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0fe30e6..10ca1a9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2825,12 +2825,24 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
>  	struct sched_avg *sa = &cfs_rq->avg;
> +	struct sched_avg *cpu_sa = NULL;
>  	int decayed, removed = 0;
> +	int cpu = cpu_of(rq_of(cfs_rq));
> +
> +	if (&cpu_rq(cpu)->cfs != cfs_rq)
> +		cpu_sa = &cpu_rq(cpu)->cfs.avg;
>  
>  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>  		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
>  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> +
> +		if (cpu_sa) {
> +			cpu_sa->load_avg = max_t(long, cpu_sa->load_avg - r, 0);
> +			cpu_sa->load_sum = max_t(s64,
> +					cpu_sa->load_sum - r * LOAD_AVG_MAX, 0);

I think this is not quite right. 'r' is the load contribution removed
from a group cfs_rq. Unless the task is the only task in the group and
the group shares are default, this value is different from the load
contribution of the task at root level (task_h_load()).
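
For reference, the task's root-level contribution is roughly what
task_h_load() computes: the task's share of its group cfs_rq's load,
scaled by that group's load as seen at the root (h_load). Simplified
sketch of the idea only (task_root_contrib() is a made-up name; the
helpers it calls are the ones already in fair.c), not the exact code:

	static unsigned long task_root_contrib(struct task_struct *p)
	{
		struct cfs_rq *cfs_rq = task_cfs_rq(p);

		/* refresh cfs_rq->h_load: this group's load as seen at the root */
		update_cfs_rq_h_load(cfs_rq);

		/* scale the task's load_avg by its share of the group's load */
		return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
				cfs_rq_load_avg(cfs_rq) + 1);
	}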

And what about nested groups? If the task sits in a nested group, I
think you may end up removing its contribution from the root cfs_rq
multiple times.

You will need to either redesign group scheduling so that the load
trickles down to the root automatically and with the right contribution,
or maintain a new sum at the root that bypasses the existing design. I'm
not sure the latter makes sense for load, though; it would make more
sense for utilization only.
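
As a rough sketch of what the latter could look like for utilization
(root_util_update() and the cpu_util_avg field are made up here, neither
exists): the task's own util_avg would be added/removed at the root
directly, regardless of its depth in the group hierarchy, which is
plausible for utilization since it is not scaled by group shares on the
way up.

	/*
	 * Sketch only: assumes a new cpu_util_avg field in the root
	 * cfs_rq, tracked beside the existing PELT signals.
	 */
	static void root_util_update(struct rq *rq, struct sched_entity *se,
				     int add)
	{
		if (add)
			rq->cfs.cpu_util_avg += se->avg.util_avg;
		else
			rq->cfs.cpu_util_avg = max_t(long,
				rq->cfs.cpu_util_avg - se->avg.util_avg, 0);
	}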

> @@ -2919,6 +2939,13 @@ skip_aging:
>  	cfs_rq->avg.load_sum += se->avg.load_sum;
>  	cfs_rq->avg.util_avg += se->avg.util_avg;
>  	cfs_rq->avg.util_sum += se->avg.util_sum;
> +
> +	if (&cpu_rq(cpu)->cfs != cfs_rq) {
> +		cpu_rq(cpu)->cfs.avg.load_avg += se->avg.load_avg;
> +		cpu_rq(cpu)->cfs.avg.load_sum += se->avg.load_sum;
> +		cpu_rq(cpu)->cfs.avg.util_avg += se->avg.util_avg;
> +		cpu_rq(cpu)->cfs.avg.util_sum += se->avg.util_sum;
> +	}

Same problem as above. You should be adding the right amount of task
contribution here, something similar to task_h_load(). The problem with
using task_h_load() is that it is not updated immediately either, so
maintaining a stable signal based on it might be difficult.
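
Just to illustrate the shape (a rough sketch only, reusing the patch's
cpu variable, and with the staleness caveat above), the attach side
would add the task's root-level contribution rather than se->avg
verbatim:

	/* sketch: mirror the task's root-level contribution, not se->avg */
	if (entity_is_task(se) && &cpu_rq(cpu)->cfs != cfs_rq) {
		unsigned long h_load = task_h_load(task_of(se));

		cpu_rq(cpu)->cfs.avg.load_avg += h_load;
		cpu_rq(cpu)->cfs.avg.load_sum += (u64)h_load * LOAD_AVG_MAX;
	}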
