lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1193410770.3959.8.camel@ghaskins-t60p.haskins.net>
Date:	Fri, 26 Oct 2007 10:59:30 -0400
From:	Gregory Haskins <ghaskins@...ell.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	Dmitry Adamushko <dmitry.adamushko@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>, Darren Hart <dvhltc@...ibm.com>
Subject: Re: [PATCH] RT: Cache cpus_allowed weight for optimizing migration

On Fri, 2007-10-26 at 10:47 -0400, Steven Rostedt wrote:
> --
> On Fri, 26 Oct 2007, Gregory Haskins wrote:
> 
> > This version has the feedback from Steve's review incorporated
> >
> > -----------------------------------------
> >
> > RT: Cache cpus_allowed weight for optimizing migration
> >
> > Some RT tasks (particularly kthreads) are bound to one specific CPU.
> > It is fairly common for one or more bound tasks to get queued up at the
> > same time.  Consider, for instance, softirq_timer and softirq_sched.  A
> > timer goes off in an ISR which schedules softirq_thread to run at RT50.
> > Then during the handling of the timer, the system determines that it's
> > time to smp-rebalance the system so it schedules softirq_sched to run
> > from within the softirq_timer kthread context. So we are in a situation
> > where we have two RT50 tasks queued, and the system will go into
> > rt-overload condition to request other CPUs for help.
> >
> > The problem is that these tasks cannot ever be pulled away since they
> > are already running on their one and only valid RQ.  However, the other
> > CPUs cannot determine that the tasks are unpullable without going
> > through expensive checks/locking.  Therefore the helping CPUS
> 
> A little exageration there ;-)  We don't need to go through locking (at
> least my code doesn't), and the checks are not that expensive (could cause
> some unneeded cache bouncing though).

Well, its all relative.  Not going into overload to begin with, and
therefore not hitting the pull_rt_tasks() logic outright should be
faster than lockless scanning tricks ;)

But I digress.  I don't think the original code had your optimization,
so you are right:  It's perhaps a bit of an exaggeration in HEAD.  I'll
be less dramatic for the next drop ;)

> 
> > experience unecessary overhead/latencies regardless as they
> > ineffectively try to process the overload condition.
> >
> > This patch tries to optimize the situation by utilizing the hamming
> > weight of the task->cpus_allowed mask.  A weight of 1 indicates that
> > the task cannot be migrated, which may be utilized by the overload
> 
> s/may/will/

ack

> 
> > handling code to eliminate uncessary rebalance attempts.  We also
> > introduce a per-rq variable to count the number of migratable tasks
> > that are currently running.  We only go into overload if we have more
> > than one rt task, AND at least one of them is migratable.
> >
> > Calculating the weight is probably relatively expensive, so it is only
> > done when the cpus_allowed mask is updated (which should be relatively
> > infrequent, especially compared to scheduling frequency) and cached in
> > the task_struct.
> >
> >
> > Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
> > ---
> >
> >  include/linux/sched.h |    2 ++
> >  kernel/fork.c         |    1 +
> >  kernel/sched.c        |    9 +++++++-
> >  kernel/sched_rt.c     |   58 +++++++++++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 64 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 7a3829f..829de6f 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1048,6 +1048,7 @@ struct sched_class {
> >  	void (*set_curr_task) (struct rq *rq);
> >  	void (*task_tick) (struct rq *rq, struct task_struct *p);
> >  	void (*task_new) (struct rq *rq, struct task_struct *p);
> > +	void (*set_cpus_allowed)(struct task_struct *p, cpumask_t newmask);
> >  };
> >
> >  struct load_weight {
> > @@ -1144,6 +1145,7 @@ struct task_struct {
> >
> >  	unsigned int policy;
> >  	cpumask_t cpus_allowed;
> > +	int nr_cpus_allowed;
> >  	unsigned int time_slice;
> >
> >  #ifdef CONFIG_PREEMPT_RCU
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 5f11f23..f808e18 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1257,6 +1257,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >  	 */
> >  	preempt_disable();
> >  	p->cpus_allowed = current->cpus_allowed;
> > +	p->nr_cpus_allowed = current->nr_cpus_allowed;
> >  	if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
> >  			!cpu_online(task_cpu(p))))
> >  		set_task_cpu(p, smp_processor_id());
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index 30fa531..6c90093 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -262,6 +262,7 @@ struct rt_rq {
> >  	int rt_load_balance_idx;
> >  	struct list_head *rt_load_balance_head, *rt_load_balance_curr;
> >  	unsigned long rt_nr_running;
> > +	unsigned long rt_nr_migratory;
> >  	unsigned long rt_nr_uninterruptible;
> >  	/* highest queued rt task prio */
> >  	int highest_prio;
> > @@ -5371,7 +5372,13 @@ int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
> >  		goto out;
> >  	}
> >
> > -	p->cpus_allowed = new_mask;
> > +	if (p->sched_class->set_cpus_allowed)
> > +		p->sched_class->set_cpus_allowed(p, new_mask);
> > +	else {
> > +		p->cpus_allowed    = new_mask;
> > +		p->nr_cpus_allowed = cpus_weight(new_mask);
> > +	}
> > +
> >  	/* Can the task run on the task's current CPU? If so, we're done */
> >  	if (cpu_isset(task_cpu(p), new_mask))
> >  		goto out;
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index b59dc20..6d293bd 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -50,6 +50,24 @@ static inline void update_curr_rt(struct rq *rq)
> >  	curr->se.sum_exec_runtime += delta_exec;
> >  	curr->se.exec_start = rq->clock;
> >  }
> > +#ifdef CONFIG_SMP
> > +static void inc_rt_migration(struct task_struct *p, struct rq *rq)
> > +{
> > +	rq->rt.rt_nr_migratory++;
> > +
> > +	if (rq->rt.rt_nr_running > 1)
> > +		rt_set_overload(p, rq->cpu);
> 
> I think we can have a race here. If we schedule a task that can migrate,
> and it's the only RT task, but before that task actually makes it to the
> scheduler, we schedule a higher priority task that can't migrate, we miss
> setting the overload on that CPU.

Ah, yes.  Good eye.  I will devise a fix.

Regards,
-Greg



Download attachment "signature.asc" of type "application/pgp-signature" (190 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ