Date:   Thu, 22 Jun 2023 09:44:01 -0400
From:   Phil Auld <pauld@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     Juri Lelli <juri.lelli@...hat.com>, Ingo Molnar <mingo@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Valentin Schneider <vschneid@...hat.com>,
        Ben Segall <bsegall@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH] Sched/fair: Block nohz tick_stop when cfs bandwidth in
 use

On Thu, Jun 22, 2023 at 09:27:51AM -0400 Phil Auld wrote:
> CFS bandwidth limits and NOHZ full don't play well together.  Tasks
> can easily run well past their quotas before a remote tick does
> accounting.  This leads to long, multi-period stalls before such
> tasks can run again. Currently, when presented with these conflicting
> requirements the scheduler is favoring nohz_full and letting the tick
> be stopped. However, nohz tick stopping is already best-effort: there
> are a number of conditions that can prevent it, whereas cfs runtime
> bandwidth is expected to be enforced.
> 
> Make the scheduler favor bandwidth over stopping the tick by setting
> TICK_DEP_BIT_SCHED when the only running task is a cfs task with
> runtime limit enabled.
> 
> Add sched_feat HZ_BW (off by default) to control this behavior.

This is instead of the previous HRTICK version. The problem addressed
is causing significant issues for containerized telco systems so I'm
trying a different approach. Maybe it will get more traction.

This leaves the sched tick running, but won't require a full
pass through schedule().  As Ben pointed out, the HRTICK version
would basically fire every 5ms, so depending on your HZ value it
might not have bought much uninterrupted runtime anyway.
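
To put rough numbers on the HZ comparison, here is a minimal sketch. It
assumes the 5ms HRTICK interval mentioned above and treats the regular
tick period as simply 1/HZ. Both helper names are hypothetical and exist
only for this illustration; they are not kernel functions.

```c
/* Hypothetical helpers for illustration only; not kernel code. */

/* Period of the regular scheduler tick, in microseconds, for a given HZ. */
unsigned int tick_period_us(unsigned int hz)
{
	return 1000000u / hz;
}

/*
 * Extra uninterrupted runtime (in microseconds) that a 5ms HRTICK would
 * give compared with the regular tick. A negative result means the
 * regular tick already fires less often than the HRTICK would.
 */
int hrtick_gain_us(unsigned int hz)
{
	return 5000 - (int)tick_period_us(hz);
}
```

With HZ=250 the regular tick is 4ms, so the gain is about 1ms per
interval. With HZ=100 the regular tick is 10ms and the result is
negative.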


Thanks for taking a look. 


Cheers,
Phil

> 
> Signed-off-by: Phil Auld <pauld@...hat.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Cc: Valentin Schneider <vschneid@...hat.com>
> Cc: Ben Segall <bsegall@...gle.com>
> ---
>  kernel/sched/fair.c     | 33 ++++++++++++++++++++++++++++++++-
>  kernel/sched/features.h |  2 ++
>  2 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 373ff5f55884..880eadfac330 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6139,6 +6139,33 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  	rcu_read_unlock();
>  }
>  
> +#ifdef CONFIG_NO_HZ_FULL
> +/* called from pick_next_task_fair() */
> +static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
> +{
> +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +	int cpu = cpu_of(rq);
> +
> +	if (!sched_feat(HZ_BW) || !cfs_bandwidth_used())
> +		return;
> +
> +	if (!tick_nohz_full_cpu(cpu))
> +		return;
> +
> +	if (rq->nr_running != 1 || !sched_can_stop_tick(rq))
> +		return;
> +
> +	/*
> +	 *  We know there is only one task runnable and we've just picked it. The
> +	 *  normal enqueue path will have cleared TICK_DEP_BIT_SCHED if we will
> +	 *  be otherwise able to stop the tick. Just need to check if we are using
> +	 *  bandwidth control.
> +	 */
> +	if (cfs_rq->runtime_enabled)
> +		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
> +}
> +#endif
> +
>  #else /* CONFIG_CFS_BANDWIDTH */
>  
>  static inline bool cfs_bandwidth_used(void)
> @@ -6181,9 +6208,12 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>  static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
>  static inline void update_runtime_enabled(struct rq *rq) {}
>  static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
> -
>  #endif /* CONFIG_CFS_BANDWIDTH */
>  
> +#if !defined(CONFIG_CFS_BANDWIDTH) || !defined(CONFIG_NO_HZ_FULL)
> +static inline void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p) {}
> +#endif
> +
>  /**************************************************
>   * CFS operations on tasks:
>   */
> @@ -8097,6 +8127,7 @@ done: __maybe_unused;
>  		hrtick_start_fair(rq, p);
>  
>  	update_misfit_status(p, rq);
> +	sched_fair_update_stop_tick(rq, p);
>  
>  	return p;
>  
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index ee7f23c76bd3..6fdf1fdf6b17 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -101,3 +101,5 @@ SCHED_FEAT(LATENCY_WARN, false)
>  
>  SCHED_FEAT(ALT_PERIOD, true)
>  SCHED_FEAT(BASE_SLICE, true)
> +
> +SCHED_FEAT(HZ_BW, false)
> -- 
> 2.31.1
> 
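
The chain of checks in sched_fair_update_stop_tick() can be read as a
single predicate. The sketch below is a userspace model of that logic
only: struct tick_model and keeps_tick_for_bandwidth() are hypothetical
names, and each field merely stands in for the kernel check named in its
comment.

```c
#include <stdbool.h>

/* Userspace model; these fields are stand-ins, not kernel structures. */
struct tick_model {
	bool feat_hz_bw;         /* sched_feat(HZ_BW) */
	bool bandwidth_used;     /* cfs_bandwidth_used() */
	bool nohz_full_cpu;      /* tick_nohz_full_cpu(cpu) */
	unsigned int nr_running; /* rq->nr_running */
	bool can_stop_tick;      /* sched_can_stop_tick(rq) */
	bool runtime_enabled;    /* cfs_rq->runtime_enabled */
};

/* Returns true in the case where the patch sets TICK_DEP_BIT_SCHED. */
bool keeps_tick_for_bandwidth(const struct tick_model *m)
{
	/* Feature off, or no bandwidth limits in use anywhere: nothing to do. */
	if (!m->feat_hz_bw || !m->bandwidth_used)
		return false;

	/* Only nohz_full CPUs would have stopped the tick in the first place. */
	if (!m->nohz_full_cpu)
		return false;

	/* More than one task, or the tick cannot stop anyway: leave as is. */
	if (m->nr_running != 1 || !m->can_stop_tick)
		return false;

	/* Single runnable task: keep the tick only if it has a runtime limit. */
	return m->runtime_enabled;
}
```

In this model the tick is kept only when every earlier check passes and
the picked task's cfs_rq has a runtime limit; clearing any one field
returns false, which corresponds to the early returns in the patch.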
