linux-kernel - Re: [PATCH 4/5] sched/deadline: Cleanup on_dl


Message-ID: <e69422ca-26d3-2c36-854d-1e1369925b41@arm.com>
Date:   Thu, 1 Aug 2019 17:01:48 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     luca abeni <luca.abeni@...tannapisa.it>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <Valentin.Schneider@....com>,
        Qais Yousef <Qais.Yousef@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 7/31/19 9:20 PM, luca abeni wrote:
> On Wed, 31 Jul 2019 18:32:47 +0100
> Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> [...]
>>>>>>  static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
>>>>>>  {
>>>>>> +	if (!on_dl_rq(dl_se))
>>>>>> +		return;  
>>>>>
>>>>> Why allow double dequeue instead of WARN?  
>>>>
>>>> As I was saying to Valentin, it can currently happen that a task
>>>> has already been dequeued by update_curr_dl()->throttle, called by
>>>> dequeue_task_dl(), before __dequeue_task_dl() is called. Do you
>>>> think we should check for this condition before calling into
>>>> dequeue_dl_entity()?
>>>
>>> Yes, that's what ->dl_throttled is for, right? And !->dl_throttled
>>> && !on_dl_rq() is a BUG.  
>>
>> OK, I will add the following snippet to the patch.
>> Although it's easy to provoke a situation in which DL tasks are
>> throttled, I haven't seen throttling happen while a task is
>> being dequeued.
> 
> This is a fairly uncommon situation that can happen with periodic tasks
> (à la rt-app) blocking on clock_nanosleep() (or similar) after
> executing for an amount of time comparable to the SCHED_DEADLINE
> runtime.
> 
> The task might have consumed a little more than its remaining
> runtime without having been throttled yet, because the accounting
> is only done at each tick... So, when dequeue_task_dl() invokes
> update_curr_dl(), the runtime becomes negative and the task is throttled.
> 
> This happens infrequently, but if you try rt-app tasksets with multiple
> tasks and execution times close to the runtime, you will see it
> happen sooner or later.
> 
> 
> [...]
>> @@ -1592,6 +1591,10 @@ static void __dequeue_task_dl(struct rq *rq, struct task_struct *p)
>>  static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>>  {
>>         update_curr_dl(rq);
>> +
>> +       if (p->dl.dl_throttled)
>> +               return;
> 
> Sorry, I missed part of the previous discussion, so maybe I am missing
> something... But I suspect this "return" might be wrong (you risk
> missing the call to task_non_contending() that comes later in this function).
> 
> Maybe you could use
> 	if (!p->dl.dl_throttled)
> 		__dequeue_task_dl(rq, p);
> 

I see. With the following rt-app file on h960 (8 CPUs), I'm able to
reproduce the situation relatively frequently.

...
"tasks" : {
 "thread0" : {
  "instance" : 12,
  "run" : 11950,
  "timer" : { "ref" : "unique", "period" : 100000, "mode" : "absolute"},
  "dl-runtime" : 12000,
  "dl-period" : 100000,
  "dl-deadline" : 100000
 }
}

...
[ 1912.086664] CPU1: p=[thread0-9 3070] throttled p->on_rq=0 flags=0x9
[ 1912.086726] CPU2: p=[thread0-10 3071] throttled p->on_rq=0 flags=0x9
[ 1924.738912] CPU6: p=[thread0-10 3149] throttled p->on_rq=0 flags=0x9
...

And the DEQUEUE_SLEEP flag is set, so, as you said,
task_non_contending(p) should be called.

I'm going to use your proposal. Thank you for the help!