Date:	Tue, 22 Jul 2014 13:45:42 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Kirill Tkhai <ktkhai@...allels.com>
Cc:	linux-kernel@...r.kernel.org,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Nicolas Pitre <nicolas.pitre@...aro.org>,
	Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
	tkhai@...dex.ru, Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH 2/5] sched: Teach scheduler to understand ONRQ_MIGRATING
 state

On Tue, Jul 22, 2014 at 03:30:16PM +0400, Kirill Tkhai wrote:
> 
> This is a new on_rq state for the case when a task is migrating
> from one src_rq to another dst_rq while the locks of both RQs
> are dropped.
> 
> We will use the state this way:
> 
> 	raw_spin_lock(&src_rq->lock);
> 	dequeue_task(src_rq, p, 0);
> 	p->on_rq = ONRQ_MIGRATING;
> 	set_task_cpu(p, dst_cpu);
> 	raw_spin_unlock(&src_rq->lock);
> 
> 	raw_spin_lock(&dst_rq->lock);
> 	p->on_rq = ONRQ_QUEUED;
> 	enqueue_task(dst_rq, p, 0);
> 	raw_spin_unlock(&dst_rq->lock);
> 
> The benefit is that double_rq_lock() is no longer needed,
> which may reduce latencies in some situations.
> 
> The logic of try_to_wake_up() remains largely the same as
> before; its behaviour changes only in a small subset of cases
> (when a preempted task in a ~TASK_RUNNING state is queued
>  on a rq and we are migrating it to another).
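
The tri-state that the description above implies could look roughly like the following; the ONRQ_DEQUEUED name, the numeric values and the task_migrating() helper are assumptions for illustration, not taken from the patch (task_queued() does appear in it):

```c
#include <assert.h>

/* Sketch of the tri-state on_rq values the patch implies; the
 * ONRQ_DEQUEUED name and the numeric values are assumptions. */
#define ONRQ_DEQUEUED  0  /* not on any runqueue */
#define ONRQ_QUEUED    1  /* queued on a runqueue */
#define ONRQ_MIGRATING 2  /* off src_rq, not yet on dst_rq, no rq->lock held */

struct task_struct { int on_rq; };   /* toy stand-in for illustration */

static int task_queued(struct task_struct *p)
{
	return p->on_rq == ONRQ_QUEUED;
}

static int task_migrating(struct task_struct *p)
{
	return p->on_rq == ONRQ_MIGRATING;
}
```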

More details are better ;-) Also, I think Oleg enjoys these kinds of
things, so I've added him to the CC.

A few questions; I haven't really thought things through yet.

> @@ -1491,10 +1491,14 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
>  static void
>  ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
>  {
> -	check_preempt_curr(rq, p, wake_flags);
>  	trace_sched_wakeup(p, true);
>  
>  	p->state = TASK_RUNNING;
> +
> +	if (!task_queued(p))
> +		return;

How can this happen? We're in the middle of a wakeup: we've just added
the task to the rq and are still holding the appropriate rq->lock.
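
In other words, under a single rq->lock the activate and wakeup steps are atomic with respect to on_rq. A toy userspace sketch of that invariant (the names mirror the kernel's, but this is illustration only, not the kernel code paths):

```c
#include <assert.h>

#define ONRQ_QUEUED  1
#define TASK_RUNNING 0

struct rq { int lock_held; };                 /* toy stand-in for rq->lock */
struct task_struct { int on_rq; int state; };

/* Toy sketch of the path in question: activate and wakeup run back to
 * back under the same rq->lock, so on_rq cannot change between them. */
static void ttwu_activate(struct rq *rq, struct task_struct *p)
{
	assert(rq->lock_held);
	p->on_rq = ONRQ_QUEUED;               /* enqueue_task() side effect */
}

static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p)
{
	assert(rq->lock_held);
	p->state = TASK_RUNNING;
	/* The lock has been held since ttwu_activate(), so the task must
	 * still be queued here -- task_queued(p) cannot be false. */
	assert(p->on_rq == ONRQ_QUEUED);
}
```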

> @@ -4623,9 +4629,14 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  	struct rq *rq;
>  	unsigned int dest_cpu;
>  	int ret = 0;
> -
> +again:
>  	rq = task_rq_lock(p, &flags);
>  
> +	if (unlikely(p->on_rq == ONRQ_MIGRATING)) {
> +		task_rq_unlock(rq, p, &flags);
> +		goto again;
> +	}
> +
>  	if (cpumask_equal(&p->cpus_allowed, new_mask))
>  		goto out;
>  

That looks like a non-deterministic spin loop, 'waiting' for the
migration to finish. Not particularly nice, and something I think we
should avoid, for it has bad (TM) worst-case behaviour.

Also, why only this site and not all task_rq_lock() sites?
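
One way to cover every task_rq_lock() site uniformly would be to fold the wait into the lock helper itself. A toy userspace sketch of that idea, with a hypothetical helper name and sched_yield() standing in for the kernel's unlock/cpu_relax()/retry dance (this is a suggestion sketch, not code from the patch):

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() */

#define ONRQ_QUEUED    1
#define ONRQ_MIGRATING 2

struct task_struct { volatile int on_rq; };

/* Hypothetical sketch: a task_rq_lock()-style helper could spin until
 * the migration window closes, so no caller ever observes
 * ONRQ_MIGRATING.  The real thing would retake rq->lock around each
 * check; sched_yield() stands in for the unlock/retry cycle here. */
static void wait_task_migrating(struct task_struct *p)
{
	while (p->on_rq == ONRQ_MIGRATING)
		sched_yield();
}
```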