Message-ID: <20130401050926.GB12015@lge.com>
Date:	Mon, 1 Apr 2013 14:09:26 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>,
	Paul Turner <pjt@...gle.com>, Alex Shi <alex.shi@...el.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH 5/5] sched: limit sched_slice if it is more than
 sysctl_sched_latency

Hello Preeti.

On Fri, Mar 29, 2013 at 05:05:37PM +0530, Preeti U Murthy wrote:
> Hi Joonsoo
> 
> On 03/28/2013 01:28 PM, Joonsoo Kim wrote:
> > sched_slice() computes the ideal runtime slice. If there are many tasks
> > in the cfs_rq, the period for this cfs_rq is extended to guarantee that
> > each task gets a time slice of at least sched_min_granularity. Each task
> > then gets a portion of this period. If one task has a much larger load
> > weight than the others, its portion of the period can exceed
> > sysctl_sched_latency by far.
> 
> Correct. But that does not matter: the length of the scheduling latency
> period is determined by the return value of __sched_period(), not by the
> value of sysctl_sched_latency. You would not extend the period if you
> wanted all tasks to have a slice within sysctl_sched_latency, right?
> 
> So since the length of the scheduling latency period is dynamic,
> depending on the number of running processes, sysctl_sched_latency,
> which is the default latency period length, is not messed with; it is
> only used as a base to determine the actual scheduling period.
> 
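For reference, __sched_period() in kernel/sched/fair.c is roughly the
following (paraphrased, so the exact form in the tree may differ slightly):

static u64 __sched_period(unsigned long nr_running)
{
	/* start from the default latency target */
	u64 period = sysctl_sched_latency;
	unsigned long nr_latency = sched_nr_latency;

	/*
	 * With more runnable tasks than sched_nr_latency, grow the period
	 * so every task can still get at least sysctl_sched_min_granularity.
	 */
	if (unlikely(nr_running > nr_latency)) {
		period = sysctl_sched_min_granularity;
		period *= nr_running;
	}

	return period;
}

sched_slice() then scales this period by the entity's weight relative to
the total cfs_rq load, which is how the task with nice -20 in the example
below ends up with most of it.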
> > 
> > For example, imagine one task with nice -20 and 9 tasks with nice 0 on
> > one cfs_rq. In this case, the load weight sum for this cfs_rq is
> > 88761 + 9 * 1024 = 97977. So the slice for the task with nice -20 is
> > sysctl_sched_min_granularity * 10 * (88761 / 97977), that is,
> > approximately sysctl_sched_min_granularity * 9. This can grow much
> > larger if there are more tasks with nice 0.
> 
> Yeah, so __sched_period() says that within 40ms all tasks need to be
> scheduled at least once, and the highest priority task gets nearly 36ms
> of it, while the rest is distributed among the others.
> 
> > 
> > So we should limit this possible weird situation.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e232421..6ceffbc 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -645,6 +645,9 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  	}
> >  	slice = calc_delta_mine(slice, se->load.weight, load);
> > 
> > +	if (unlikely(slice > sysctl_sched_latency))
> > +		slice = sysctl_sched_latency;
> 
> Then in this case the highest priority thread would get 20ms
> (sysctl_sched_latency), and the rest would each get
> sysctl_sched_min_granularity * 10 * (1024/97977), which is about 0.4ms.
> Then all tasks would get scheduled at least once within
> 20ms + (0.4 * 9) ms, about 23.7ms, while your scheduling latency period
> was extended to 40ms just so that each of these tasks doesn't have its
> sched_slice shrunk due to the large number of tasks.

I am not sure I understand your question correctly, but I will do my best
to answer your comment. :)

With this patch, I just limit the maximum slice a task can get at one time.
Scheduling is still controlled through vruntime. So, in this case, the task
with nice -20 will be scheduled twice.

20 + (0.42 * 9) + 20 = roughly 43.8 ms

And after roughly 43.8 ms, this process is repeated.
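Just to make the arithmetic concrete, here is a throwaway user-space
sketch (plain C, nothing kernel-specific; the 20ms latency, 4ms minimum
granularity, and the 88761/1024 weights are just the example values from
this thread):

#include <stdio.h>

int main(void)
{
	/* example values from this discussion, not read from a live system */
	double latency_ms  = 20.0;	/* sysctl_sched_latency */
	double min_gran_ms = 4.0;	/* sysctl_sched_min_granularity */
	double w_heavy = 88761.0;	/* one task at nice -20 */
	double w_nice0 = 1024.0;	/* each of nine tasks at nice 0 */
	int nr_nice0 = 9;

	double load = w_heavy + nr_nice0 * w_nice0;	/* 97977 */
	double period = (1 + nr_nice0) * min_gran_ms;	/* 10 tasks -> 40 ms */

	double slice_heavy = period * w_heavy / load;	/* ~36.2 ms unclamped */
	double slice_nice0 = period * w_nice0 / load;	/* ~0.42 ms each */

	/* the patch clamps a single slice to sysctl_sched_latency */
	double clamped = slice_heavy > latency_ms ? latency_ms : slice_heavy;

	/* in this model the heavy task runs twice per cycle:
	 * 20 + 9 * 0.42 + 20 ms */
	double cycle = clamped + nr_nice0 * slice_nice0 + clamped;

	printf("period      = %.1f ms\n", period);
	printf("heavy slice = %.1f ms, clamped to %.1f ms\n",
	       slice_heavy, clamped);
	printf("nice0 slice = %.2f ms\n", slice_nice0);
	printf("full cycle  = %.1f ms\n", cycle);
	return 0;
}

It prints a full cycle of about 43.8 ms, matching the calculation above.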

So I can tell you that the scheduling period is preserved as before.

If we give a task a long slice in one go, it can cause a latency problem
for the other tasks. So IMHO, limiting it is meaningful.

Thanks.

> 
> > +
> >  	return slice;
> >  }
> > 
> 
> Regards
> Preeti U Murthy
> 
