linux-kernel - Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGWmfYqL_351aRkjhr4Se=gK8iPycMofNGi6COOnXsoOTOD99A@mail.gmail.com>
Date:   Mon, 27 Mar 2017 10:54:02 +0200
From:   Claudio Scordino <claudio@...dence.eu.com>
To:     Luca Abeni <luca.abeni@...tannapisa.it>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@....com>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Mathieu Poirier <mathieu.poirier@...aro.org>
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

Hi guys,

2017-03-27 10:20 GMT+02:00 Luca Abeni <luca.abeni@...tannapisa.it>:
> On Fri, 24 Mar 2017 22:31:46 -0400
> Steven Rostedt <rostedt@...dmis.org> wrote:
>
>> On Fri, 24 Mar 2017 22:47:15 +0100
>> luca abeni <luca.abeni@...tannapisa.it> wrote:
>>
>> > Ok... Since I am not good at ascii art, would it be ok to add a
>> > textual description? If yes, I'll add a comment like:
>> > "
>> > The utilization of a task is added to the runqueue's active
>> > utilization when the task becomes active (is enqueued in the
>> > runqueue), and is removed when the task becomes inactive. A task
>> > does not become immediately inactive when it blocks, but becomes
>> > inactive at the so called "0 lag time"; so, we setup the "inactive
>> > timer" to fire at the "0 lag time". When the "inactive timer"
>> > fires, the task utilization is removed from the runqueue's active
>> > utilization. If the task wakes up again on the same runqueue before
>> > the "0 lag time", the active utilization must not be changed and
>> > the "inactive timer" must be cancelled. If the task wakes up again
>> > on a different runqueue before the "0 lag time", then the task's
>> > utilization must be removed from the previous runqueue's active
>> > utilization and must be added to the new runqueue's active
>> > utilization. In order to avoid races between a task waking up on a
>> > runqueue while the "inactive timer" is running on a different CPU,
>> > the "dl_non_contending" flag is used to indicate that a task is not
>> > on a runqueue but is active (so, the flag is set when the task
>> > blocks and is cleared when the "inactive timer" fires or when the
>> > task  wakes up).
>>
>> Sure, the above is great if you never want anyone to read it ;)
>>
>> Can you please break it up a little. My head starts to spin by the
>> third line down.
>
> Ok... Maybe finding a clean and understandable way to explain the
> above sentence is something that can be done at the OSPM summit?

What about adding the following text to Documentation/ ?
Does it make things clearer ?

Cheers,

             Claudio




The following figure illustrates the state names of the GRUB algorithm:

                          ------------
          (d)        |   Active   |
           ------------->|            |
           |             | Contending |
           |              ------------
           |                A      |
   ----------           |      |
  |          |          |      |
  | Inactive |          |(b)   | (a)
  |          |          |      |
   ----------           |      |
           A                |      V
           |              ------------
           |             |   Active   |
           --------------|     Non    |
          (c)        | Contending |
                          ------------

A task can be in one of the following states:

 - ActiveContending: if it is ready for execution (or executing);

 - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
   time;

 - Inactive: if it is blocked and has surpassed the 0-lag time.


For each runqueue, the algorithm GRUB keeps track of two different bandwidths:

 - Active bandwidth (running_bw): this is the sum of the bandwidths of all
   tasks in active state (i.e., ActiveContending or ActiveNonContending);

 - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
   runqueue, including the tasks in Inactive state.


State transitions:

 (a) When a task blocks, it does not become immediately inactive since its
     bandwidth cannot be immediately reclaimed without breaking the
     real-time guarantees. It therefore enters a transitional state called
     ActiveNonContending. The scheduler arms the "inactive timer" to fire at
     the 0-lag time, when the task's bandwidth can be reclaimed without
     breaking the real-time guarantees.

 (b) If the task wakes up before the inactive timer fires, the task re-enters
     the ActiveContending state and the "inactive timer" is canceled.
     In addition, if the task wakes up on a different runqueue, then
     the task's utilization must be removed from the previous runqueue's active
     utilization and must be added to the new runqueue's active utilization.
     In order to avoid races between a task waking up on a runqueue while the
      "inactive timer" is running on a different CPU, the "dl_non_contending"
     flag is used to indicate that a task is not on a runqueue but is active
     (so, the flag is set when the task blocks and is cleared when the
     "inactive timer" fires or when the task  wakes up).

 (c) When the "inactive timer" fires, the task enters the Inactive state and its
     utilization is removed from the runqueue's active utilization.

 (d) When an inactive task wakes up, it enters the ActiveContending state and
     its utilization is added to the active utilization of the runqueue where
     it has been enqueued.


The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

    (1 - Uinact)*dt

where Uinact is the inactive utilization, computed as (this_bq - running_bw).

The GRUB algorithm is described in the following papers:

[1] G. Lipari, S. Baruah, Greedy reclamation of unused bandwidth in
constant-bandwidth servers, 12th IEEE Euromicro Conference on Real-Time Systems,
2000.
[2] L. Abeni, J. Lelli, C. Scordino, L. Palopoli, Greedy CPU reclaiming for
SCHED DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS),
Dusseldorf, Germany, 2014.
[3] L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel or
sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
Computing, 2016.