[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130125120256.52733b51@redhat.com>
Date:	Fri, 25 Jan 2013 12:02:56 -0600
From:	Clark Williams <williams@...hat.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2] sched: add a tuning knob to allow changing RR
 tmeslice
On Fri, 25 Jan 2013 09:19:16 +0100
Ingo Molnar <mingo@...nel.org> wrote:
> 
> * Clark Williams <williams@...hat.com> wrote:
> 
> > On Thu, 24 Jan 2013 17:59:55 +0100
> > Ingo Molnar <mingo@...nel.org> wrote:
> > 
> > > 
> > > * Clark Williams <williams@...hat.com> wrote:
> > > 
> > > > This version stores the user-input value in a separate 
> > > > location from the jiffies values used by the scheduler, to 
> > > > prevent a race condition.
> > > > 
> > > > Subject: [PATCH v2] sched: add a tuning knob to allow changing 
> > > > RR timeslice
> > > 
> > > looks useful.
> > > 
> > > > @@ -2010,7 +2010,7 @@ static void task_tick_rt(struct rq *rq, struct
> > > > task_struct *p, int queued) if (--p->rt.time_slice)
> > > >  		return;
> > > >  
> > > > -	p->rt.time_slice = RR_TIMESLICE;
> > > > +	p->rt.time_slice = sched_rr_timeslice;
> > > >  
> > > >  	/*
> > > >  	 * Requeue to the end of queue if we (and all of our
> > > > ancestors) are the @@ -2041,7 +2041,7 @@ static unsigned int
> > > > get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> > > >  	 * Time slice is 0 for SCHED_FIFO tasks
> > > 
> > > Patch wont apply due to patch corruption, alas.
> > > 
> > > 
> > 
> > Easily fixed. Modified for 3.8-rc4:
> 
> Thanks. Some more substantial review feedback this time around:
> 
> > 
> > commit 0e2d40c5c84d06670f85cc212591f27f69f59c62
> > Author: Clark Williams <williams@...hat.com>
> > Date:   Thu Jan 24 13:51:01 2013 -0600
> > 
> >     [kernel] sched: add a tuning knob to allow changing RR timeslice
> >     
> >     Add a /proc/sys/kernel scheduler knob named sched_rr_timeslice_ms
> 
> s/Add a /proc/sys/kernel/sched_rr_timeslice_ms scheduler knob
> 
> >     that allows global changing of the SCHED_RR timeslice value. User
> >     visable value is in milliseconds but is stored as jiffies.  Setting
> 
> s/visible
> 
> >     to 0 (zero) resets to the default (currently 100ms).
> >     
> >     Signed-off-by: Clark Williams <williams@...hat.com>
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 6fc8f45..d803690 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2082,6 +2082,11 @@ int sched_rt_handler(struct ctl_table *table, int write,
> >  		void __user *buffer, size_t *lenp,
> >  		loff_t *ppos);
> >  
> > +extern int sched_rr_timeslice;
> > +extern int sched_rr_handler(struct ctl_table *table, int write,
> > +		void __user *buffer, size_t *lenp,
> > +		loff_t *ppos);
> > +
> 
> Shouldn't this be in kernel/sched/sched.h instead of the 
> (already too large) linux/sched.h?
> 
I don't think that will work be cause kernel/sysctl.c needs the
externs and I doubt we want to include kernel/sched/sched.h from there. 
> >  #ifdef CONFIG_SCHED_AUTOGROUP
> >  extern unsigned int sysctl_sched_autogroup_enabled;
> >  
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 257002c..5675074 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7507,6 +7507,24 @@ static int sched_rt_global_constraints(void)
> >  }
> >  #endif /* CONFIG_RT_GROUP_SCHED */
> >  
> > +int sched_rr_handler(struct ctl_table *table, int write,
> > +		void __user *buffer, size_t *lenp,
> > +		loff_t *ppos)
> > +{
> > +	int ret;
> > +	static DEFINE_MUTEX(mutex);
> 
> This mutex should be outside the function (not in local scope), 
> and named like this, with an explanation:
> 
> +/*
> + * Since it's an int the CPU will always read a full word
> + * of the RR timeslice interval - no need for locking.
> + *
> + * But in the RR handler we read the value multiple times
> + * before setting it, which should be protected - hence
> + * this mutex:
> + */
> +static DEFINE_MUTEX(rr_timeslice_mutex);
> 
Done.
> 
> > +
> > +	mutex_lock(&mutex);
> > +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> > +	/* make sure that internally we keep jiffies */
> > +	/* also, writing zero resets timeslice to default */
> > +	if (!ret && write) 
> > +		sched_rr_timeslice = sched_rr_timeslice <= 0 ? 
> > +			RR_TIMESLICE : msecs_to_jiffies(sched_rr_timeslice);
> > +	mutex_unlock(&mutex);
> > +	return ret;
> 
> A couple of stray spaces at end of lines. Also, please put curly 
> braces around multi-line statements.
> 
I didn't put curly braces there because "technically" that's a single
line statement. But since in fact it is multi-line, I've added them :).
> Multi-line comments should be like this:
> 
>   /*
>    * Comment .....
>    * ...... goes here.
>    */
> 
Done.
> 
> > +}
> > +
> >  int sched_rt_handler(struct ctl_table *table, int write,
> >  		void __user *buffer, size_t *lenp,
> >  		loff_t *ppos)
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 418feb0..6c54e83 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -11,6 +11,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
> >  
> >  struct rt_bandwidth def_rt_bandwidth;
> >  
> > +int sched_rr_timeslice = RR_TIMESLICE;
> > +
> 
> I think this could go into core.c as well, together with the 
> mutex. That way it's easier to see why the mutex is needed as 
> well.
> 
> It should also be __read_mostly.
Done and done. 
> 
> >  static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
> >  {
> >  	struct rt_bandwidth *rt_b =
> > @@ -2010,7 +2012,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
> >  	if (--p->rt.time_slice)
> >  		return;
> >  
> > -	p->rt.time_slice = RR_TIMESLICE;
> > +	p->rt.time_slice = sched_rr_timeslice;
> >  
> >  	/*
> >  	 * Requeue to the end of queue if we (and all of our ancestors) are the
> > @@ -2041,7 +2043,7 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> >  	 * Time slice is 0 for SCHED_FIFO tasks
> >  	 */
> >  	if (task->policy == SCHED_RR)
> > -		return RR_TIMESLICE;
> > +		return sched_rr_timeslice;
> >  	else
> >  		return 0;
> >  }
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index c88878d..1eabf86 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -403,6 +403,14 @@ static struct ctl_table kern_table[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= sched_rt_handler,
> >  	},
> > +	{
> > +		.procname	= "sched_rr_timeslice_ms",
> > +		.data		= &sched_rr_timeslice,
> > +		.maxlen		= sizeof(int),
> > +		.mode		= 0644,
> > +		.proc_handler	= sched_rr_handler,
> 
> Does this allow negative values? Wouldn't it be better to make 
> it unsigned int all around?
Good point. Changed to unsigned int. 
Updated patch:
commit 93dfbf6326cc4ba85c917f9440203f9fc19e9bcc
Author: Clark Williams <williams@...hat.com>
Date:   Thu Jan 24 13:51:01 2013 -0600
    [kernel] sched: add a tuning knob to allow changing RR timeslice
    
    Add a /proc/sys/kernel/sched_rr_timeslice_ms tuning knob
    that allows global changing of the SCHED_RR timeslice value. User
    visible value is in milliseconds but is stored as jiffies.  Setting
    to 0 (zero) resets to the default (currently 100ms).
    
    Signed-off-by: Clark Williams <williams@...hat.com>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6fc8f45..5d0a7cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1226,6 +1226,11 @@ struct sched_rt_entity {
  */
 #define RR_TIMESLICE		(100 * HZ / 1000)
 
+extern __read_mostly unsigned int sched_rr_timeslice;
+
+extern int sched_rr_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
 struct rcu_node;
 
 enum perf_event_task_context {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..0f7e6a2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7507,6 +7507,43 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+/*
+ * Since it's an int the CPU will always read a full word
+ * of the RR timeslice interval - no need for locking.
+ *
+ * But in the RR handler we read the value multiple times
+ * before setting it, which should be protected - hence
+ * this mutex:
+ */
+static DEFINE_MUTEX(rr_timeslice_mutex);
+
+unsigned int __read_mostly sched_rr_timeslice = RR_TIMESLICE;
+
+/*
+ * manage /proc/sys/kernel/sched_rr_timeslice_ms entry
+ * for changing SCHED_RR quantum interval
+ */
+
+int sched_rr_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+
+	mutex_lock(&rr_timeslice_mutex);
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	/*
+	 * make sure that internally we keep jiffies
+	 * also, writing zero resets timeslice to default
+	 */
+	if (!ret && write)  {
+		sched_rr_timeslice = sched_rr_timeslice == 0 ?
+			RR_TIMESLICE : msecs_to_jiffies(sched_rr_timeslice);
+	}
+	mutex_unlock(&rr_timeslice_mutex);
+	return ret;
+}
+
 int sched_rt_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 418feb0..71aa6d0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2010,7 +2010,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	if (--p->rt.time_slice)
 		return;
 
-	p->rt.time_slice = RR_TIMESLICE;
+	p->rt.time_slice = sched_rr_timeslice;
 
 	/*
 	 * Requeue to the end of queue if we (and all of our ancestors) are the
@@ -2041,7 +2041,7 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 	 * Time slice is 0 for SCHED_FIFO tasks
 	 */
 	if (task->policy == SCHED_RR)
-		return RR_TIMESLICE;
+		return sched_rr_timeslice;
 	else
 		return 0;
 }
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..6770fc8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -403,6 +403,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sched_rt_handler,
 	},
+	{
+		.procname	= "sched_rr_timeslice_ms",
+		.data		= &sched_rr_timeslice,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_rr_handler,
+	},
+
 #ifdef CONFIG_SCHED_AUTOGROUP
 	{
 		.procname	= "sched_autogroup_enabled",
Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)
Powered by blists - more mailing lists
 
