Open Source and information security mailing list archives
 
Message-ID: <1e26ce6d-5567-477f-847b-445160b2f18c@joelfernandes.org>
Date: Tue, 19 Mar 2024 20:03:54 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Daniel Bristot de Oliveira <bristot@...nel.org>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira
 <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
 linux-kernel@...r.kernel.org, Luca Abeni <luca.abeni@...tannapisa.it>,
 Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
 Thomas Gleixner <tglx@...utronix.de>,
 Vineeth Pillai <vineeth@...byteword.org>,
 Shuah Khan <skhan@...uxfoundation.org>, Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server



On 11/4/2023 6:59 AM, Daniel Bristot de Oliveira wrote:
> Among the motivations for the DL servers is the real-time throttling
> mechanism. This mechanism works by throttling the rt_rq after
> running for a long period without leaving space for fair tasks.
> 
> The base dl server avoids this problem by boosting fair tasks instead
> of throttling the rt_rq. The point is that it boosts without waiting
> for potential starvation, causing some non-intuitive cases.
> 
> For example, an IRQ dispatches two tasks on an idle system, a fair
> and an RT. The DL server will be activated, running the fair task
> before the RT one. This problem can be avoided by deferring the
> dl server activation.
> 
> By setting the zerolax option, the dl_server will dispatch a
> SCHED_DEADLINE reservation with replenished runtime, but throttled.
> 
> The dl_timer will be set for (period - runtime) ns from start time.
> Thus boosting the fair rq on its 0-laxity time with respect to
> rt_rq.
> 
> If the fair scheduler has the opportunity to run while waiting
> for zerolax time, the dl server runtime will be consumed. If
> the runtime is completely consumed before the zerolax time, the
> server will be replenished while still in a throttled state. Then,
> the dl_timer will be reset to the new zerolax time.
> 
> If the fair server reaches the zerolax time without consuming
> its runtime, the server will be boosted, following CBS rules
> (thus without breaking SCHED_DEADLINE).
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@...nel.org>

Hi, Daniel,
We have one additional patch (other than the 15 I just sent). Since I have just
3 more working days over the next 3 weeks, I thought I would reply inline here
rather than resend all 15 patches so soon just for the one new addition below.
I am replying to this patch because the new patch is related (to 0-laxity).
Once I am back from holiday, I can resend it with the set I have, unless you've
applied it by then.

So, Vineeth and I came up with the patch below to "max cap" the DL server 0-lax
time (the max cap is off by default, keeping the regular behavior). This is
needed to guarantee bandwidth for periodic CFS runners/sleepers.

The example usecase is:

Consider DL server params 25ms / 50ms.

Consider a CFS task with a duty cycle of 25ms / 76ms (run 25ms, sleep 51ms).

         run 25ms                    run 25ms
         _______                     _______
        |       | sleep 51          |       |  sleep 51
-|------|-------|---------|---------|-------|----------|--------|------> t
 0     25      50       101        126      151       202      227
                          \ 0-lax /                    \ 0-lax /

Here, the 0-lax deferral added in the original v5 zero-lax patch reduces the
task's bandwidth.

So the task runs 50ms every 227ms, instead of 50ms every 152ms.

A simple unit test confirms the issue, and it is fixed by Vineeth's patch below:

Please take a look at the patch below (it applies only to v5.15, but Vineeth is
rebasing it on mainline as we speak), thanks.

-----8<--------
From: Vineeth Pillai (Google) <vineeth@...byteword.org>
Subject: [PATCH] sched/deadline/dlserver: sysctl for dlserver maxdefer time

In order to avoid the dlserver preempting RT tasks when it wakes up, the
dlserver is throttled (deferred) until its zero-laxity time. This is the
farthest time before the deadline at which the dlserver can still meet its
deadline.

Deferring to the zero-laxity time causes CFS tasks with certain sleep/run
patterns to not get the bandwidth promised by the dlserver. So introduce a
sysctl for limiting the defer time of the dlserver.

Suggested-by: Joel Fernandes (Google) <joel@...lfernandes.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@...byteword.org>
---
 include/linux/sched/sysctl.h | 2 ++
 kernel/sched/deadline.c      | 6 ++++++
 kernel/sysctl.c              | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 4939e6128840..a27fba6fe0ab 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -41,6 +41,8 @@ extern unsigned int sysctl_iowait_apply_ticks;
 extern unsigned int sysctl_sched_dl_period_max;
 extern unsigned int sysctl_sched_dl_period_min;
 
+extern unsigned int sysctl_sched_dlserver_maxdefer_ms;
+
 #ifdef CONFIG_UCLAMP_TASK
 extern unsigned int sysctl_sched_uclamp_util_min;
 extern unsigned int sysctl_sched_uclamp_util_max;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d638cc5b45c7..69c9fd80a67d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1071,6 +1071,11 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
 	if (dl_se->dl_defer_armed) {
 		WARN_ON_ONCE(!dl_se->dl_throttled);
 		act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
+		if (sysctl_sched_dlserver_maxdefer_ms) {
+			ktime_t dlserver_maxdefer = rq_clock(rq) + ms_to_ktime(sysctl_sched_dlserver_maxdefer_ms);
+			if (ktime_after(act, dlserver_maxdefer))
+				act = dlserver_maxdefer;
+		}
 	} else {
 		act = ns_to_ktime(dl_next_period(dl_se));
 	}
@@ -3099,6 +3104,7 @@ void __getparam_dl(struct task_struct *p, struct sched_attr *attr)
  */
 unsigned int sysctl_sched_dl_period_max = 1 << 22; /* ~4 seconds */
 unsigned int sysctl_sched_dl_period_min = 100;     /* 100 us */
+unsigned int sysctl_sched_dlserver_maxdefer_ms = 2;
 
 /*
  * This function validates the new parameters of a -deadline task.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 39f47a871fb4..027193302e7e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1842,6 +1842,13 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "sched_dlserver_maxdefer_ms",
+		.data		= &sysctl_sched_dlserver_maxdefer_ms,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{
 		.procname	= "sched_rr_timeslice_ms",
 		.data		= &sysctl_sched_rr_timeslice,
-- 
2.40.1


