linux-kernel - Re: [RFD] sched/deadline: Support single CPU affinity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161110152321.5372a582@sweethome>
Date:   Thu, 10 Nov 2016 15:23:21 +0100
From:   luca abeni <luca.abeni@...tn.it>
To:     Henrik Austad <henrik@...tad.us>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Daniel Bistrot de Oliveira <danielbristot@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFD] sched/deadline: Support single CPU affinity

Hi Henrik,

On Thu, 10 Nov 2016 13:56:35 +0100
Henrik Austad <henrik@...tad.us> wrote:

> On Thu, Nov 10, 2016 at 01:38:40PM +0100, luca abeni wrote:
> > Hi Henrik,  
> 
> Hi Luca,
> 
> > On Thu, 10 Nov 2016 13:21:00 +0100
> > Henrik Austad <henrik@...tad.us> wrote:  
> > > On Thu, Nov 10, 2016 at 09:08:07AM +0100, Peter Zijlstra wrote:  
> > [...]  
> > > > We define the time to fail as:
> > > > 
> > > >   ttf(t) := t_d - t_b; where
> > > > 
> > > > 	t_d is t's absolute deadline
> > > > 	t_b is t's remaining budget
> > > > 
> > > > This is the last possible moment we must schedule this task such
> > > > that it can complete its work and not miss its deadline.    
> > > 
> > > To elaborate a bit on this (this is a modified LLF approach if my
> > > memory serves):
> > > 
> > > You have the dynamic time-to-failure (TtF), i.e. as the task
> > > progresses (scheduled to run), the relative time-to-failure will
> > > remain constant. This can be used to compare thasks to a running
> > > task and should minimize the number of calculations required.
> > > 
> > > Then you have the static Time-of-failure (ToF, which is the
> > > absoulte time when a task will no longer be able to meet its
> > > deadline. This is what you use for keeping a sorted list of tasks
> > > in the runqueue. As this is a fixed point in time, you do not
> > > have to dynamically update or do crazy calculation when
> > > inserting/removing threads from the rq.  
> > 
> > Sorry, I am missing something here: if ttf is defined as
> > 	ttf_i = d_i - q_i  
> 
> So I picked the naming somewhat independently of Peter, his approach
> is the _absolute_ time of failure, the actual time X, irrespective of
> the task running or not.
> 
> I added 2 different measures for the same thing:
> 
> * ToF: 
> The absolute time of failure is the point in time when the task will
> no longer be able to meet its deadline. If a task is scheduled and is
> running on a CPU, this value will move forward at the speed of
> execution. i.e. when the task is running, this value is changing.
> When the task is waiting in the runqueue, this value is constant.
Ah, ok... So, if I understand well you ToF is Peter's ttf... Right?


> TtF:
> The relative time to failure is the value that is tied to the local
> CPU so to speak. When a task is running, this value is constant as it
> is the remaining time until the task is no longer able to meet its
> deadline. When the task is enqueued, this value will steadily
> decrease as it draws closer to the time when it will fail.
So, if I understand weel, TtF = ToF - current time... Right? I think
this is often called "laxity" or "slack time". No?

[...]
> > > > If we then augment the regular EDF rules by, for local tasks,
> > > > considering the time to fail and let this measure override the
> > > > regular EDF pick when the time to fail can be overran by the EDF
> > > > pick.    
> > > 
> > > Then, if you do this - do you need to constrict this to a local
> > > CPU? I *think* you could do this in a global scheduler if you use
> > > ToF/TtF for all deadline-tasks, I think you should be able to
> > > meet deadlines.  
> > I think the ToF/TtF scheduler will be equivalent to LLF (see
> > previous emails)... Or am I misunderstanding something? (see above)
> > And LLF is not optimal on multiple CPUs, so I do not think it will
> > be able to meet deadlines if you use it as a global scheduler.  
> 
> I think I called it Earliest Failure First (I really wanted to call
> it failure-driven scheduling but that implied a crappy scheduler ;)
Ok, but... How is it different from LLF?
In my understanding (but, again, maybe I am missing something) ToF and
TtF are just a way to implement LLF more efficiently (because, as you
notice, ToF does not change when a task is executing and TtF does not
change when the task is executing).


			Luca



> LLF is prone to high task-switch count when multiple threads gets
> close to 0 laxity. But as I said, it's been a while since I last
> worked through the theory, so I have some homework to do before
> arguing too hard about this.
> 
> > > I had a rant about this way back [1,2 Sec 11.4], I need to sit
> > > down and re-read most of it, it has been a few too many years,
> > > but the idea was to minimize the number of context-switches
> > > (which LLF is prone to get a lot of) as well as minimize the
> > > computational overhead by avoiding re-calculating
> > > time-of-failre/time-to-failre a lot. 
> > > > That is, when:
> > > > 
> > > > 	now + left_b > min(ttf)    
> > > 
> > > Why not just  use ttf/tof for all deadline-tasks? We have all the 
> > > information available anyway and it would probably make the
> > > internal logic easier?  
> > I think LLF causes more preemptions and migrations than EDF.  
> 
> yes, it does, which is why you need to adjust LLF to minimize the
> number of task-switches.
> 
> -Henrik