linux-kernel - Re: [PATCH RESEND 1/2] sched/rt: Check to push the task when changing its affinity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150203221419.4428225e@grimm.local.home>
Date:	Tue, 3 Feb 2015 22:14:19 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Xunlei Pang <xlpang@....com>
Cc:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...il.com>,
	Xunlei Pang <pang.xunlei@...aro.org>
Subject: Re: [PATCH RESEND 1/2] sched/rt: Check to push the task when
 changing its affinity

On Wed,  4 Feb 2015 09:12:20 +0800
Xunlei Pang <xlpang@....com> wrote:

> From: Xunlei Pang <pang.xunlei@...aro.org>
> 
> We may suffer from extra rt overload rq due to the affinity,
> so when the affinity of any runnable rt task is changed, we
> should check to trigger balancing, otherwise it will cause
> some unnecessary delayed real-time response. Unfortunately,
> current RT global scheduler doesn't trigger anything.
> 
> For example: a 2-cpu system with two runnable FIFO tasks(same
> rt_priority) bound on CPU0, let's name them rt1(running) and
> rt2(runnable) respectively; CPU1 has no RTs. Then, someone sets
> the affinity of rt2 to 0x3(i.e. CPU0 and CPU1), but after this,
> rt2 still can't be scheduled until rt1 enters schedule(), this
> definitely causes some/big response latency for rt2.
> 

I understand the issue you point out, but I have to be honest and say
that I really do not like this approach.

> So, when doing set_cpus_allowed_rt(), if detecting such cases,
> check to trigger a push behaviour.
> 
> Signed-off-by: Xunlei Pang <pang.xunlei@...aro.org>
> ---
>  kernel/sched/rt.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 59 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index f4d4b07..4dacb6e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1428,7 +1428,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
>  	return next;
>  }
>  
> -static struct task_struct *_pick_next_task_rt(struct rq *rq)
> +static struct task_struct *_pick_next_task_rt(struct rq *rq, int peek_only)
>  {

peek_only should be bool, but don't worry about it, I think this isn't
needed.

>  	struct sched_rt_entity *rt_se;
>  	struct task_struct *p;
> @@ -1441,7 +1441,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
>  	} while (rt_rq);
>  
>  	p = rt_task_of(rt_se);
> -	p->se.exec_start = rq_clock_task(rq);
> +	if (!peek_only)
> +		p->se.exec_start = rq_clock_task(rq);
>  
>  	return p;
>  }
> @@ -1476,7 +1477,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
>  
>  	put_prev_task(rq, prev);
>  
> -	p = _pick_next_task_rt(rq);
> +	p = _pick_next_task_rt(rq, 0);
>  
>  	/* The running task is never eligible for pushing */
>  	dequeue_pushable_task(rq, p);
> @@ -1886,28 +1887,69 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>  				const struct cpumask *new_mask)
>  {
>  	struct rq *rq;
> -	int weight;
> +	int old_weight, new_weight;
> +	int preempt_push = 0, direct_push = 0;
>  
>  	BUG_ON(!rt_task(p));
>  
>  	if (!task_on_rq_queued(p))
>  		return;
>  
> -	weight = cpumask_weight(new_mask);
> +	old_weight = p->nr_cpus_allowed;
> +	new_weight = cpumask_weight(new_mask);
> +
> +	rq = task_rq(p);
> +
> +	if (new_weight > 1 &&
> +	    rt_task(rq->curr) &&
> +	    !test_tsk_need_resched(rq->curr)) {
> +		/*
> +		 * Set new mask information to prepare pushing.
> +		 * It's safe to do this here.

Please explain why it is safe.

> +		 */
> +		cpumask_copy(&p->cpus_allowed, new_mask);
> +		p->nr_cpus_allowed = new_weight;
> +
> +		if (task_running(rq, p) &&
> +		    cpumask_test_cpu(task_cpu(p), new_mask) &&
> +		    cpupri_find(&rq->rd->cpupri, p, NULL)) {

Hmm, You called cpupri_find() which should also return a mask of the
CPUs with the lowest priorities. I wonder if we could have utilize this
information instead of doing it twice? Of course things could change by
the time the task migrates.

> +			/*
> +			 * At this point, current task gets migratable most
> +			 * likely due to the change of its affinity, let's
> +			 * figure out if we can migrate it.
> +			 *
> +			 * Is there any task with the same priority as that
> +			 * of current task? If found one, we should resched.
> +			 * NOTE: The target may be unpushable.
> +			 */
> +			if (p->prio == rq->rt.highest_prio.next) {
> +				/* One target just in pushable_tasks list. */
> +				requeue_task_rt(rq, p, 0);
> +				preempt_push = 1;
> +			} else if (rq->rt.rt_nr_total > 1) {
> +				struct task_struct *next;
> +
> +				requeue_task_rt(rq, p, 0);
> +				/* peek only */
> +				next = _pick_next_task_rt(rq, 1);
> +				if (next != p && next->prio == p->prio)
> +					preempt_push = 1;
> +			}

I'm thinking it would be better just to send an IPI to the CPU that
figures this out and pushes a task off of itself.

> +		} else if (!task_running(rq, p))
> +			direct_push = 1;
> +	}
>  
>  	/*
>  	 * Only update if the process changes its state from whether it
>  	 * can migrate or not.
>  	 */
> -	if ((p->nr_cpus_allowed > 1) == (weight > 1))
> -		return;
> -
> -	rq = task_rq(p);
> +	if ((old_weight > 1) == (new_weight > 1))
> +		goto out;
>  
>  	/*
>  	 * The process used to be able to migrate OR it can now migrate
>  	 */
> -	if (weight <= 1) {
> +	if (new_weight <= 1) {
>  		if (!task_current(rq, p))
>  			dequeue_pushable_task(rq, p);
>  		BUG_ON(!rq->rt.rt_nr_migratory);
> @@ -1919,6 +1961,13 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>  	}
>  
>  	update_rt_migration(&rq->rt);
> +
> +out:
> +	if (direct_push)
> +		push_rt_tasks(rq);
> +
> +	if (preempt_push)
> +		resched_curr(rq);

I don't know. This just doesn't seem clean.

-- Steve

>  }
>  
>  /* Assumes rq->lock is held */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/