linux-kernel - Re: [PATCH 5/5] sched: limit sched_slice if it is more than sysctl_sched

Message-ID: <51557C89.4070201@linux.vnet.ibm.com>
Date:	Fri, 29 Mar 2013 17:05:37 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
CC:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>,
	Paul Turner <pjt@...gle.com>, Alex Shi <alex.shi@...el.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH 5/5] sched: limit sched_slice if it is more than sysctl_sched_latency

Hi Joonsoo

On 03/28/2013 01:28 PM, Joonsoo Kim wrote:
> sched_slice() computes the ideal runtime slice. If there are many tasks
> in a cfs_rq, the period for this cfs_rq is extended to guarantee that each
> task gets a time slice of at least sched_min_granularity. Each task then
> gets a portion of this period. If one task has a much larger load weight
> than the others, its portion of the period can far exceed
> sysctl_sched_latency.

Correct. But that does not matter: the length of the scheduling latency
period is determined by the return value of __sched_period(), not by the
value of sysctl_sched_latency. You would not extend the period if you
wanted all tasks to have a slice within sysctl_sched_latency, right?

Since the length of the scheduling latency period is dynamic, depending
on the number of running processes, sysctl_sched_latency (the default
latency period length) is not messed with; it is only used as a base to
determine the actual scheduling period.

> 
> For example, imagine one task with nice -20 and 9 tasks with nice 0
> on one cfs_rq. In this case, the load weight sum for this cfs_rq is
> 88761 + 9 * 1024 = 97977. So the slice for the nice -20 task is
> sysctl_sched_min_granularity * 10 * (88761 / 97977), that is,
> approximately sysctl_sched_min_granularity * 9. This factor can be
> much larger if there are more tasks with nice 0.

Yes, so __sched_period() says that within 40ms (10 *
sysctl_sched_min_granularity) all tasks need to be scheduled at least
once; the highest-priority task gets nearly 36ms of it, while the rest is
distributed among the others.

> 
> So we should limit this possible weird situation.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e232421..6ceffbc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -645,6 +645,9 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  	}
>  	slice = calc_delta_mine(slice, se->load.weight, load);
> 
> +	if (unlikely(slice > sysctl_sched_latency))
> +		slice = sysctl_sched_latency;

Then in this case the highest-priority thread would get 20ms
(sysctl_sched_latency), and each of the rest would get
sysctl_sched_min_granularity * 10 * (1024/97977), which is about 0.42ms.
Then all tasks would get scheduled at least once within
20ms + (0.42 * 9)ms ≈ 23.8ms, even though your scheduling latency period
was extended to 40ms precisely so that these tasks' sched_slices would
not shrink due to the large number of tasks.

> +
>  	return slice;
>  }
> 

Regards
Preeti U Murthy
