Message-ID: <20200902060024.GK16601@localhost.localdomain>
Date:   Wed, 2 Sep 2020 08:00:24 +0200
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Lucas Stach <l.stach@...gutronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel@...r.kernel.org, kernel@...gutronix.de,
        patchwork-lst@...gutronix.de
Subject: Re: [PATCH] sched/deadline: Fix stale throttling on de-/boosted tasks

Hi,

On 31/08/20 13:07, Lucas Stach wrote:
> When a boosted task gets throttled, what normally happens is that it's
> immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
> runtime and clears the dl_throttled flag. There is a special case however:
> if the throttling happened on sched-out and the task has been deboosted in
> the meantime, the replenish is skipped as the task will return to its
> normal scheduling class. This leaves the task with the dl_throttled flag
> set.
> 
> Now if the task gets boosted up to the deadline scheduling class again
> while it is sleeping, it's still in the throttled state. The normal wakeup
> however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
> actually place it on the rq. Thus we end up with a task that is runnable,
> but not actually on the rq and neither an immediate replenishment happens,
> nor is the replenishment timer set up, so the task is stuck in
> forever-throttled limbo.
> 
> Clear the dl_throttled flag before dropping back to the normal scheduling
> class to fix this issue.
> 
> Signed-off-by: Lucas Stach <l.stach@...gutronix.de>
> ---
> This is the root cause and fix of the issue described at [1]. After working
> on other stuff for the last few months, I finally was able to circle back
> to this issue and gather the required data to pinpoint the failure mode.
> 
> [1] https://lkml.org/lkml/2020/3/20/765
> ---
>  kernel/sched/deadline.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 3862a28cd05d..c19c1883d695 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1527,12 +1527,15 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  		pi_se = &pi_task->dl;
>  	} else if (!dl_prio(p->normal_prio)) {
>  		/*
> -		 * Special case in which we have a !SCHED_DEADLINE task
> -		 * that is going to be deboosted, but exceeds its
> -		 * runtime while doing so. No point in replenishing
> -		 * it, as it's going to return back to its original
> -		 * scheduling class after this.
> +		 * Special case in which we have a !SCHED_DEADLINE task that is going
> +		 * to be deboosted, but exceeds its runtime while doing so. No point in
> +		 * replenishing it, as it's going to return back to its original
> +		 * scheduling class after this. If it has been throttled, we need to
> +		 * clear the flag, otherwise the task may wake up as throttled after
> +		 * being boosted again with no means to replenish the runtime and clear
> +		 * the throttle.
>  		 */
> +		p->dl.dl_throttled = 0;
>  		BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
>  		return;
>  	}
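
For readers following the thread, the failure mode in the quoted commit message can be modeled with a minimal user-space sketch. It is not kernel code: `toy_task`, `toy_enqueue`, `toy_stuck_after_reboost`, the `has_dl_donor` field and the flag values are all hypothetical stand-ins, and the logic is a heavily simplified assumption about the relevant branches of enqueue_task_dl().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this toy model; not the kernel's constants. */
enum { TOY_ENQUEUE_WAKEUP = 0x01, TOY_ENQUEUE_REPLENISH = 0x02 };

/* Hypothetical, heavily simplified stand-in for a non-deadline task. */
struct toy_task {
	bool dl_throttled;  /* models p->dl.dl_throttled */
	bool has_dl_donor;  /* assumption: a deadline-class PI donor is boosting us */
	bool on_dl_rq;      /* set when the toy enqueue really queues the task */
};

/*
 * Toy model of the enqueue path discussed in the patch.
 * apply_fix selects whether the throttle flag is cleared on deboost.
 */
static void toy_enqueue(struct toy_task *p, int flags, bool apply_fix)
{
	if (!p->has_dl_donor) {
		/*
		 * Deboost special case: the task is about to return to its
		 * normal class, so the replenishment is skipped.
		 */
		assert(flags == TOY_ENQUEUE_REPLENISH); /* models part of the BUG_ON */
		if (apply_fix)
			p->dl_throttled = false;
		return;
	}

	/* A throttled task is only queued by a replenishing enqueue. */
	if (p->dl_throttled && !(flags & TOY_ENQUEUE_REPLENISH))
		return;

	if (flags & TOY_ENQUEUE_REPLENISH)
		p->dl_throttled = false;
	p->on_dl_rq = true;
}

/* Replays the sequence from the commit message; true means the task is stuck. */
static bool toy_stuck_after_reboost(bool apply_fix)
{
	/* Throttled on sched-out, and the donor is already gone (deboosted). */
	struct toy_task p = { .dl_throttled = true, .has_dl_donor = false,
			      .on_dl_rq = false };

	/* 1. The replenishing enqueue hits the deboost special case. */
	toy_enqueue(&p, TOY_ENQUEUE_REPLENISH, apply_fix);

	/* 2. The task is boosted into the deadline class again while asleep. */
	p.has_dl_donor = true;

	/* 3. A normal wakeup enqueues it without the replenish flag. */
	toy_enqueue(&p, TOY_ENQUEUE_WAKEUP, apply_fix);

	return !p.on_dl_rq;
}
```

In this model, toy_stuck_after_reboost(false) replays the unpatched sequence and toy_stuck_after_reboost(true) replays it with the flag cleared on deboost, as the patch proposes.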

Ah, right, thanks for looking into this issue!

I wonder whether we should call __dl_clear_params() instead of just
clearing dl_throttled, but what you propose makes sense to me.

Acked-by: Juri Lelli <juri.lelli@...hat.com>

Best,

Juri
