linux-kernel - Re: Another SCHED_DEADLINE bug (with bisection and possible fix)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54CB5E56.9080506@arm.com>
Date:	Fri, 30 Jan 2015 10:35:02 +0000
From:	Juri Lelli <juri.lelli@....com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Luca Abeni <luca.abeni@...tn.it>
CC:	Kirill Tkhai <tkhai@...dex.ru>, Juri Lelli <juri.lelli@....com>,
	Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Another SCHED_DEADLINE bug (with bisection and possible fix)

Hi Peter,

On 28/01/2015 14:08, Peter Zijlstra wrote:
> On Thu, Jan 15, 2015 at 02:35:46PM +0100, Luca Abeni wrote:
> 
>>> >From what I understand we should either modify the tasks run/sleep stats
>>> when we change its parameters or we should schedule a delayed release of
>>> the bandwidth delta (when it reaches its 0-lag point, if thats in the
>>> future).
> 
>> I suspect the correct behaviour can be difficult to implement:
>> - When a SCHED_DEADLINE task ends (or changes its scheduling policy to
>>   something different), its bandwidth cannot be released immediately,
>>   but should be released at the "0-lag time" (which reminds me about the
>>   GRUB patches... I had to implement a similar behaviour in those patches :)
>> - The same applies when the task changes its scheduling parameters decreasing
>>   its bandwidth. In this case, we also need to update the current runtime (if
>>   it is larger than the new runtime, set it to the new maximum value - I think
>>   this is the safest thing to do)
>> - When a task changes its parameters to increase its bandwidth, be do not
>>   have such problems.
>>
>> As far as I can see, if we apply the runtime / deadline changes starting from
>> the next reservation period we are safe (because the "0-lag time" is always
>> smaller than the current scheduling deadline).
>> This might cause some transient overloads (because if I change the parameters
>> of a task at time t, the update takes action a little bit later - at the next
>> scheduling deadline), but guarantees that a task never consumes more than
>> expected (for example: if a task continuously changes its bandwidth between
>> 0.4 and 0.3, it will never consume more than 0.4. I suspect that if we
>> immediately update dl_se->deadline and dl_se->runtime a task can arrive to
>> consume much more CPU time).
> 
> 
> OK, how about something like this; it seems to survive your Bug-Test for
> at least 50 cycles.
> 

Thanks for the patch!

> ---
>  kernel/sched/core.c     | 33 ++++++++++++++++++++++++++++-----
>  kernel/sched/deadline.c |  3 ++-
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ade2958a9197..d787d6553d72 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1816,6 +1816,10 @@ void __dl_clear_params(struct task_struct *p)
>  	dl_se->dl_period = 0;
>  	dl_se->flags = 0;
>  	dl_se->dl_bw = 0;
> +
> +	dl_se->dl_throttled = 0;
> +	dl_se->dl_new = 1;
> +	dl_se->dl_yielded = 0;
>  }
>  
>  /*
> @@ -1844,7 +1848,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>  #endif
>  
>  	RB_CLEAR_NODE(&p->dl.rb_node);
> -	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	init_dl_task_timer(&p->dl);
>  	__dl_clear_params(p);
>  
>  	INIT_LIST_HEAD(&p->rt.run_list);
> @@ -2054,6 +2058,9 @@ static inline int dl_bw_cpus(int i)
>   * allocated bandwidth to reflect the new situation.
>   *
>   * This function is called while holding p's rq->lock.
> + *
> + * XXX we should delay bw change until the task's 0-lag point, see
> + * __setparam_dl().
>   */
>  static int dl_overflow(struct task_struct *p, int policy,
>  		       const struct sched_attr *attr)
> @@ -3258,15 +3265,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
>  {
>  	struct sched_dl_entity *dl_se = &p->dl;
>  
> -	init_dl_task_timer(dl_se);
>  	dl_se->dl_runtime = attr->sched_runtime;
>  	dl_se->dl_deadline = attr->sched_deadline;
>  	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
>  	dl_se->flags = attr->sched_flags;
>  	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> -	dl_se->dl_throttled = 0;
> -	dl_se->dl_new = 1;
> -	dl_se->dl_yielded = 0;
> +
> +	/*
> +	 * Changing the parameters of a task is 'tricky' and we're not doing
> +	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
> +	 *
> +	 * What we SHOULD do is delay the bandwidth release until the 0-lag
> +	 * point. This would include retaining the task_struct until that time
> +	 * and change dl_overflow() to not immediately decrement the current
> +	 * amount.
> +	 *
> +	 * Instead we retain the current runtime/deadline and let the new
> +	 * parameters take effect after the current reservation period lapses.
> +	 * This is safe (albeit pessimistic) because the 0-lag point is always
> +	 * before the current scheduling deadline.
> +	 *
> +	 * We can still have temporary overloads because we do not delay the
> +	 * change in bandwidth until that time; so admission control is
> +	 * not on the safe side. It does however guarantee tasks will never
> +	 * consume more than promised.

So, we do the safe thing only in case of throttling. I guess is more than
ok for now, while we hopefully find some spare cycle to implement a
complete solution :/.

Thanks a lot!

Best,

- Juri

> +	 */
>  }
>  
>  /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b52092f2636d..726470d47f87 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
>  	 * Since we are TASK_DEAD we won't slip out of the domain!
>  	 */
>  	raw_spin_lock_irq(&dl_b->lock);
> +	/* XXX we should retain the bw until 0-lag */
>  	dl_b->total_bw -= p->dl.dl_bw;
>  	raw_spin_unlock_irq(&dl_b->lock);
>  
> @@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>  
>  static void switched_from_dl(struct rq *rq, struct task_struct *p)
>  {
> +	/* XXX we should retain the bw until 0-lag */
>  	cancel_dl_timer(rq, p);
> -
>  	__dl_clear_params(p);
>  
>  	/*
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/