linux-kernel - Re: [RFD] sched/deadline: Support single CPU affinity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161110125635.GB30974@sisyphus.home.austad.us>
Date:   Thu, 10 Nov 2016 13:56:35 +0100
From:   Henrik Austad <henrik@...tad.us>
To:     luca abeni <luca.abeni@...tn.it>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Daniel Bistrot de Oliveira <danielbristot@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFD] sched/deadline: Support single CPU affinity

On Thu, Nov 10, 2016 at 01:38:40PM +0100, luca abeni wrote:
> Hi Henrik,

Hi Luca,

> On Thu, 10 Nov 2016 13:21:00 +0100
> Henrik Austad <henrik@...tad.us> wrote:
> > On Thu, Nov 10, 2016 at 09:08:07AM +0100, Peter Zijlstra wrote:
> [...]
> > > We define the time to fail as:
> > > 
> > >   ttf(t) := t_d - t_b; where
> > > 
> > > 	t_d is t's absolute deadline
> > > 	t_b is t's remaining budget
> > > 
> > > This is the last possible moment we must schedule this task such
> > > that it can complete its work and not miss its deadline.  
> > 
> > To elaborate a bit on this (this is a modified LLF approach if my
> > memory serves):
> > 
> > You have the dynamic time-to-failure (TtF), i.e. as the task
> > progresses (scheduled to run), the relative time-to-failure will
> > remain constant. This can be used to compare thasks to a running task
> > and should minimize the number of calculations required.
> > 
> > Then you have the static Time-of-failure (ToF, which is the absoulte
> > time when a task will no longer be able to meet its deadline. This is
> > what you use for keeping a sorted list of tasks in the runqueue. As
> > this is a fixed point in time, you do not have to dynamically update
> > or do crazy calculation when inserting/removing threads from the rq.
> 
> Sorry, I am missing something here: if ttf is defined as
> 	ttf_i = d_i - q_i

So I picked the naming somewhat independently of Peter, his approach is 
the _absolute_ time of failure, the actual time X, irrespective of the task 
running or not.

I added 2 different measures for the same thing:

* ToF: 
The absolute time of failure is the point in time when the task will no 
longer be able to meet its deadline. If a task is scheduled and is running 
on a CPU, this value will move forward at the speed of execution. i.e. when 
the task is running, this value is changing. When the task is waiting in 
the runqueue, this value is constant.

TtF:
The relative time to failure is the value that is tied to the local CPU so 
to speak. When a task is running, this value is constant as it is the 
remaining time until the task is no longer able to meet its deadline. When 
the task is enqueued, this value will steadily decrease as it draws closer 
to the time when it will fail.

So when a task is running on a CPU, you use TtF, when it is in the runqueue 
you compare ToF

> (where d_i is the deadline of thask i and q_i is its remaining budget),
> then it also is the time before which you have to schedule task i if
> you do not want to miss the deadline... No? So, I do not understand the
> difference with tof.

So you can calculate one form the other given absolute deadline and 
remaining budget (or consumed CPU-time). But it is handy to use both as it 
removes a lot of duplicity and once you get the hang of the terms, makes it 
a bit easier to reason about the system.

> > > If we then augment the regular EDF rules by, for local tasks,
> > > considering the time to fail and let this measure override the
> > > regular EDF pick when the time to fail can be overran by the EDF
> > > pick.  
> > 
> > Then, if you do this - do you need to constrict this to a local CPU?
> > I *think* you could do this in a global scheduler if you use ToF/TtF
> > for all deadline-tasks, I think you should be able to meet deadlines.
> I think the ToF/TtF scheduler will be equivalent to LLF (see previous
> emails)... Or am I misunderstanding something? (see above)
> And LLF is not optimal on multiple CPUs, so I do not think it will be
> able to meet deadlines if you use it as a global scheduler.

I think I called it Earliest Failure First (I really wanted to call it 
failure-driven scheduling but that implied a crappy scheduler ;)

LLF is prone to high task-switch count when multiple threads gets close to 
0 laxity. But as I said, it's been a while since I last worked through the 
theory, so I have some homework to do before arguing too hard about this.

> > I had a rant about this way back [1,2 Sec 11.4], I need to sit down
> > and re-read most of it, it has been a few too many years, but the
> > idea was to minimize the number of context-switches (which LLF is
> > prone to get a lot of) as well as minimize the computational overhead
> > by avoiding re-calculating time-of-failre/time-to-failre a lot.
> > 
> > > That is, when:
> > > 
> > > 	now + left_b > min(ttf)  
> > 
> > Why not just  use ttf/tof for all deadline-tasks? We have all the 
> > information available anyway and it would probably make the internal
> > logic easier?
> I think LLF causes more preemptions and migrations than EDF.

yes, it does, which is why you need to adjust LLF to minimize the number of 
task-switches.

-Henrik

Download attachment "signature.asc" of type "application/pgp-signature" (182 bytes)