Date:   Wed, 15 Feb 2017 10:29:19 +0000
From:   Juri Lelli <juri.lelli@....com>
To:     Luca Abeni <luca.abeni@...tannapisa.it>
Cc:     Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Mike Galbraith <efault@....de>,
        Romulo Silva de Oliveira <romulo.deoliveira@...c.br>
Subject: Re: [PATCH 3/2] sched/deadline: Use deadline instead of period when
 calculating overflow

Hi,

On 15/02/17 08:40, Luca Abeni wrote:
> Hi Steven,
> 
> On Tue, 14 Feb 2017 19:14:17 -0500
> Steven Rostedt <rostedt@...dmis.org> wrote:
> [...]
> > > I am not sure about the correct fix (wouldn't
> > > "runtime / (deadline - t) > dl_runtime / dl_deadline" allow the
> > > task to use a fraction of CPU time equal to dl_runtime /
> > > dl_deadline?)
> > > 
> > > The current code is clearly wrong (as shown by Daniel), but I do not
> > > understand how the current check can allow the task to consume more
> > > than dl_runtime / dl_period... I need some more time to think about
> > > this issue. 
> > >   
> > 
> > This is in dl_entity_overflow() which is called by update_dl_entity()
> > which has this:
> > 
> > 	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
> > 	    dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
> > 		dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
> > 		dl_se->runtime = pi_se->dl_runtime;
> > 	}
> > 
> > 
> > The comments in this code state:
> > 
> >  * The policy here is that we update the deadline of the entity only
> > if:
> >  *  - the current deadline is in the past,
> >  *  - using the remaining runtime with the current deadline would make
> >  *    the entity exceed its bandwidth.
> > 
> > That second comment is saying that when this task wakes up, if the
> > percentage left to run would exceed its bandwidth with respect to the
> > rest of the system, then its deadline and runtime are reset.
> 
> Right; this is the problem. When the relative deadline is different
> from the period, the term "bandwidth" is ambiguous... We can consider
> the utilisation (maximum runtime / period), or the density (maximum
> runtime / relative deadline). In some sense, the two approaches are
> both correct (if we use density, we are more pessimistic but we try to
> respect deadlines in a hard way; if we use utilisation, we allow more
> tasks to be admitted but we can only provide bounded tardiness).
> 
> What the current code does is mix the two approaches (resulting in a
> wrong runtime/deadline assignment).
> 
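
To make the two notions concrete with made-up numbers: a task with
dl_runtime = 10ms, dl_deadline = 20ms and dl_period = 100ms has
utilisation 10/100 = 0.1 but density 10/20 = 0.5; a utilisation based
test would admit up to ten such tasks on one CPU, a density based one
only two.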
> > What happens in the current logic is that the overflow() check says,
> > when the deadline is much smaller than the period, "yeah, we're going
> > to exceed our percentage!" and gives us more, even though the task
> > won't exceed its percentage if we compare runtime with deadline.
> > 
> > The relative-runtime / relative-period is a tiny percentage, which
> > does not reflect the percentage that the task is allowed to have
> > before the deadline is hit. The task's bandwidth should be calculated
> > by the relative-runtime / relative-deadline, as runtime <= deadline
> > <= period, and the runtime should happen within the deadline.
> > 
> > When the task wakes up, it currently looks at how much time is left
> > (absolute-deadline - t) and compares it to the amount of runtime left.
> > The comparison should still be against the ratio of relative-runtime
> > to relative-deadline. The relative-period, or even absolute-period,
> > should have no influence in this decision.
> 
> Ok, thanks; I think I can now see why this can result in a task
> consuming more than the reserved utilisation. I still need some time to
> convince myself that "runtime / (deadline - t) > dl_runtime / dl_deadline"
> is the correct check to use (in this case, shouldn't we also change the
> admission test to use densities instead of utilisations?)
> 

Right, this is what I was wondering as well, as dl_overflow() currently
looks at the period. I also have some recollection of this discussion
happening in the past; unfortunately, it was not on the list.
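
As an aside, the density check quoted above,
"runtime / (deadline - t) > dl_runtime / dl_deadline", would in practice
be implemented without divisions, by cross-multiplying. A minimal
userspace sketch of that form (this is not the in-tree
dl_entity_overflow(), which IIRC also right-shifts both sides by
DL_SCALE to keep the multiplications from overflowing u64):

	#include <stdbool.h>
	#include <stdint.h>

	typedef uint64_t u64;

	/*
	 * Density based wakeup test discussed above:
	 *   runtime / (deadline - t) > dl_runtime / dl_deadline
	 * cross-multiplied to avoid divisions:
	 *   dl_deadline * runtime > (deadline - t) * dl_runtime
	 */
	bool dl_density_overflow(u64 runtime, u64 deadline, u64 t,
				 u64 dl_runtime, u64 dl_deadline)
	{
		u64 left  = dl_deadline * runtime;
		u64 right = (deadline - t) * dl_runtime;

		return right < left;
	}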

That discussion started with the following patch

--->8---
From 6cd9b6f3c2b9f144828aa09ad2a355b00a153348 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@....com>
Date: Fri, 4 Sep 2015 15:41:42 +0100
Subject: [PATCH] sched/core: fix SCHED_DEADLINE admission control

As Documentation/sched/sched-deadline.txt says, a new task can pass
through admission control if sum(WCET_i / min{D_i, P_i}) <= 1.
However, if the user specifies both sched_period and sched_deadline,
we actually check that sum(WCET_i / P_i) <= 1, which is a less
restrictive check than the former.

Fix this by always using the sched_deadline parameter to compute new_bw,
as we also impose that runtime <= deadline <= period (if period != 0)
and deadline != 0.

Fixes: 4df1638cfaf9 ("sched/deadline: Fix overflow to handle period==0 and deadline!=0")
Signed-off-by: Juri Lelli <juri.lelli@....com>
---
 kernel/sched/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 096b73b..56bc449 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2302,9 +2302,9 @@ static int dl_overflow(struct task_struct *p, int policy,
 {
 
 	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
-	u64 period = attr->sched_period ?: attr->sched_deadline;
+	u64 deadline = attr->sched_deadline;
 	u64 runtime = attr->sched_runtime;
-	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
+	u64 new_bw = dl_policy(policy) ? to_ratio(deadline, runtime) : 0;
 	int cpus, err = -1;
 
 	if (new_bw == p->dl.dl_bw)
--->8---

that we then decided not to propose since (note that these are just my
memories of the discussion, so everything is up for further discussion,
also in light of the problem highlighted by Daniel)

 - SCHED_DEADLINE, as the documentation says, does AC using utilization
 - it is however true that a sufficient (but not necessary) test on UP for
   D_i != P_i cases is the one in my patch above (see the example after
   this list)
 - we have agreed in the past that the kernel should only check that we
   don't cause "overload" in the system (which is still the case if we
   consider utilizations), not "hard schedulability"
 - also because on SMP systems "sum(WCET_i / min{D_i, P_i}) <= M"
   doesn't guarantee much more than the test based on P_i only (there do
   not seem to be many, if any, papers around considering the
   D_i != P_i case on SMP, actually)
 - basically the patch above would only matter for the UP/partitioned
   cases
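
As a concrete (made-up) UP example of the "sufficient but not necessary"
point: take T1 with runtime 5ms, deadline 5ms, period 100ms and T2 with
runtime 5ms, deadline 100ms, period 100ms. The utilization test passes
(5/100 + 5/100 = 0.1 <= 1) while the density test fails
(5/5 + 5/100 = 1.05 > 1), yet under EDF both tasks always make their
deadlines: T1 runs first and finishes by t = 5ms, T2 by t = 10ms, well
within its 100ms deadline. A density based AC would thus reject a
perfectly schedulable set.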

Thoughts?

Thanks,

- Juri
