Date:	Wed, 19 Feb 2014 14:14:54 +0100
From:	Juri Lelli <juri.lelli@...il.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>
CC:	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v3] sched/deadline: Fix bad accounting of nr_running

On 02/19/2014 11:32 AM, Juri Lelli wrote:
> On 02/19/2014 09:46 AM, Peter Zijlstra wrote:
>> On Tue, Feb 18, 2014 at 09:50:12PM -0500, Steven Rostedt wrote:
>>>  
>>>> The rationale for this odd behavior is that, when a task is
>>>> throttled, it is removed only from the dl_rq, but we keep it on_rq
>>>> (this is not a "full dequeue", i.e. the task is not actually
>>>> sleeping). But it is also true that, while throttled, a task
>>>> behaves as if it were sleeping (e.g., its timer will fire on a new
>>>> CPU if the old one is dead). So Steven's fix also sounds
>>>> semantically correct.
>>>
>>> Actually, it seems that I was hitting it again, but this time
>>> getting a negative number. OK, after looking at the code a bit
>>> more, I think we should update the runqueue's nr_running only when
>>> the task is officially enqueued and dequeued; none of the
>>> accounting in between should touch that number.
> 
> This is a different way to get the same result (mildly tested on my box):
> 
> ---
>  kernel/sched/deadline.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..675dad3 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -837,7 +837,8 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>         if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>                 enqueue_pushable_dl_task(rq, p);
> 
> -       inc_nr_running(rq);
> +       if (!(flags & ENQUEUE_REPLENISH))
> +               inc_nr_running(rq);
>  }
> 
>  static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> --
> 
> This way we touch nr_running only when the enqueue is not a
> consequence of a replenishment.
> 
>>
>> But if the task is throttled it should still very much decrement the
>> number. There are places that very much rely on nr_running being
>> exactly the number of runnable tasks.
>>
> 
> This is a different thing, and V2 seemed to implement this behavior
> (that's why I said it looked semantically correct).
> 

So, both my last approach and Steven's V2 caused nr_running to become
negative, as they double-decrement it when dequeuing a task that has
also exceeded its budget.

What follows seems to solve the issue and correctly accounts for
throttled tasks as !nr_running (a small illustrative model is included
after the patch).

---
 kernel/sched/deadline.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..b819577 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -717,6 +717,7 @@ void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)

        WARN_ON(!dl_prio(prio));
        dl_rq->dl_nr_running++;
+       inc_nr_running(rq_of_dl_rq(dl_rq));

        inc_dl_deadline(dl_rq, deadline);
        inc_dl_migration(dl_se, dl_rq);
@@ -730,6 +731,7 @@ void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
        WARN_ON(!dl_prio(prio));
        WARN_ON(!dl_rq->dl_nr_running);
        dl_rq->dl_nr_running--;
+       dec_nr_running(rq_of_dl_rq(dl_rq));

        dec_dl_deadline(dl_rq, dl_se->deadline);
        dec_dl_migration(dl_se, dl_rq);
@@ -836,8 +838,6 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)

        if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
                enqueue_pushable_dl_task(rq, p);
-
-       inc_nr_running(rq);
 }

 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
@@ -850,8 +850,6 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
        update_curr_dl(rq);
        __dequeue_task_dl(rq, p, flags);
-
-       dec_nr_running(rq);
 }

 /*
-- 
1.7.9.5
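
To see why this pairing avoids the double decrement, here is a minimal
userspace model of the accounting. It is purely illustrative (not
kernel code); the names simply mirror the kernel's:

#include <assert.h>
#include <stdio.h>

static int nr_running;		/* models rq->nr_running       */
static int dl_nr_running;	/* models dl_rq->dl_nr_running */

/* After the patch, rq->nr_running is adjusted in exactly one place,
 * always paired with dl_nr_running. */
static void inc_dl_tasks(void) { dl_nr_running++; nr_running++; }
static void dec_dl_tasks(void) { dl_nr_running--; nr_running--; }

int main(void)
{
	inc_dl_tasks();			/* enqueue_task_dl(): task runnable */
	dec_dl_tasks();			/* budget exhausted: task throttled */
	assert(nr_running == 0);	/* throttled counts as !nr_running  */
	inc_dl_tasks();			/* replenishment timer fires        */
	dec_dl_tasks();			/* dequeue_task_dl(): task sleeps   */
	assert(nr_running == 0);	/* no double decrement              */
	printf("final nr_running = %d\n", nr_running);
	return 0;
}

In the earlier attempts both the throttle path and dequeue_task_dl()
ended up decrementing rq->nr_running, which is where the negative count
came from; funneling every change through inc_dl_tasks()/dec_dl_tasks()
makes the increments and decrements pair up by construction.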

Steven, could you test it?

Thanks,

- Juri