linux-kernel - Re: [PATCH 4/5] sched/deadline: Cleanup on_dl

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190731222046.5ff83259@sweethome>
Date:   Wed, 31 Jul 2019 22:20:46 +0200
From:   luca abeni <luca.abeni@...tannapisa.it>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <Valentin.Schneider@....com>,
        Qais Yousef <Qais.Yousef@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On Wed, 31 Jul 2019 18:32:47 +0100
Dietmar Eggemann <dietmar.eggemann@....com> wrote:
[...]
> >>>>  static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
> >>>>  {
> >>>> +	if (!on_dl_rq(dl_se))
> >>>> +		return;  
> >>>
> >>> Why allow double dequeue instead of WARN?  
> >>
> >> As I was saying to Valentin, it can currently happen that a task
> >> could have already been dequeued by update_curr_dl()->throttle
> >> called by dequeue_task_dl() before calling __dequeue_task_dl(). Do
> >> you think we should check for this condition before calling into
> >> dequeue_dl_entity()?  
> > 
> > Yes, that's what ->dl_throttled is for, right? And !->dl_throttled
> > && !on_dl_rq() is a BUG.  
> 
> OK, I will add the following snippet to the patch.
> Although it's easy to provoke a situation in which DL tasks are
> throttled, I haven't seen a throttling happening when the task is
> being dequeued.

This is a not-so-common situation, that can happen with periodic tasks
(a-la rt-app) blocking on clock_nanosleep() (or similar) after
executing for an amount of time comparable with the SCHED_DEADLINE
runtime.

It might happen that the task consumed a little bit more than the
remaining runtime (but has not been throttled yet, because the
accounting happens at every tick)... So, when dequeue_task_dl() invokes
update_task_dl() the runtime becomes negative and the task is throttled.

This happens infrequently, but if you try rt-app tasksets with multiple
tasks and execution times near to the runtime you will see it
happening, sooner or later.

[...]
> @@ -1592,6 +1591,10 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p) static void dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
>         update_curr_dl(rq);
> +
> +       if (p->dl.dl_throttled)
> +               return;

Sorry, I missed part of the previous discussion, so maybe I am missing
something... But I suspect this "return" might be wrong (you risk to
miss a call to task_non_contending(), coming later in this function).

Maybe you cound use
	if (!p->dl_throttled)
		__dequeue_task_dl(rq, p)

Or did I misunderstand something?

			Thanks,
				Luca