linux-kernel - Re: [PATCH v4 6/7] sched/deadline: Deferrable dl server

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230905134203.GA20703@noisy.programming.kicks-ass.net>
Date:   Tue, 5 Sep 2023 15:42:03 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Daniel Bristot de Oliveira <bristot@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Thomas Gleixner <tglx@...utronix.de>,
        Joel Fernandes <joel@...lfernandes.org>,
        Vineeth Pillai <vineeth@...byteword.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v4 6/7] sched/deadline: Deferrable dl server

On Thu, Aug 31, 2023 at 10:28:57PM +0200, Daniel Bristot de Oliveira wrote:
> +void dl_server_start(struct sched_dl_entity *dl_se, int defer)
>  {
> +	if (dl_se->server_state != DL_SERVER_STOPPED) {
> +		WARN_ON_ONCE(!(on_dl_rq(dl_se) || dl_se->dl_throttled));
> +		return;
> +	}
> +
> +	if (defer) {
> +		/*
> +		 * Postpone the replenishment to the (next period - the execution time)
> +		 *
> +		 * With this in place, we have two cases:
> +		 *
> +		 * On the absence of DL tasks:
> +		 *	The server will start at the replenishment time, getting
> +		 *	its runtime before now + period. This is the expected
> +		 *	throttling behavior.
> +		 *
> +		 * In the presense of DL tasks:
> +		 *	The server will be replenished, and then it will be
> +		 *	schedule according to EDF, not breaking SCHED_DEADLINE.
> +		 *
> +		 *	In the first cycle the server will be postponed at most
> +		 *	at period + period - runtime at most. But then the
> +		 *	server will receive its runtime/period.
> +		 *
> +		 *	The server will, however, run on top of any RT task, which
> +		 *	is the expected throttling behavior.
> +		 */
> +		dl_se->deadline = rq_clock(dl_se->rq) + dl_se->dl_period - dl_se->dl_runtime;

I was confused by this, but this is an instance of
replenish_dl_new_period(), where we explicitly do not preserve the
period.

> +		/* Zero the runtime */
> +		dl_se->runtime = 0;
> +		/* throttle the server */
> +		dl_se->dl_throttled = 1;

These comments are both obvious and placed so as to make everything
unreadable :/ 

> +
> +		dl_se->server_state = DL_SERVER_DEFER;
> +		start_dl_timer(dl_se);
> +		return;
> +	}
> +
>  	if (!dl_server(dl_se)) {
>  		dl_se->dl_server = 1;
>  		setup_new_dl_entity(dl_se);
>  	}
> +
> +	dl_se->server_state = DL_SERVER_RUNNING;
>  	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
>  }


> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 580e6764a68b..b9d0f08dc8ca 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6499,9 +6499,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	 */
>  	util_est_enqueue(&rq->cfs, p);
>  
> -	if (!rq->cfs.h_nr_running)
> -		dl_server_start(&rq->fair_server);
> -
>  	/*
>  	 * If in_iowait is set, the code below may not trigger any cpufreq
>  	 * utilization updates, so do it here explicitly with the IOWAIT flag
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index e23cc67c9467..7595110a5a3e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1537,6 +1537,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>  
>  	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>  		enqueue_pushable_task(rq, p);
> +
> +	if (sched_fair_server_needed(rq))
> +		dl_server_start(&rq->fair_server, rq->fair_server_defer);
>  }
>  
>  static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
> @@ -1547,6 +1550,9 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>  	dequeue_rt_entity(rt_se, flags);
>  
>  	dequeue_pushable_task(rq, p);
> +
> +	if (!sched_fair_server_needed(rq))
> +		dl_server_stop(&rq->fair_server);
>  }
>  
>  /*

I'm thinking this is wrong -- ISTR there definite benefits to explicit
slack time scheduling, meaning the server should also run while DL tasks
are on. Additionally, running the server when there are only fair tasks
is mostly harmless, right? So why this change?

One of the benefits was -- IIRC, that we no longer need to subtract some
random margin from the reservation limit, but there were more I think.