linux-kernel - Re: [PATCH] sched/fair: don't push cfs


Date:   Thu, 6 Jun 2019 22:11:29 +0800
From:   Xunlei Pang <xlpang@...ux.alibaba.com>
To:     bsegall@...gle.com, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH] sched/fair: don't push cfs_bandwith slack timers forward

On 2019/6/6 AM 4:06, bsegall@...gle.com wrote:
> When a cfs_rq sleeps and returns its quota, we delay for 5ms before
> waking any throttled cfs_rqs to coalesce with other cfs_rqs going to
> sleep, as this has has to be done outside of the rq lock we hold.

Typo: "has" appears twice.

> 
> The current code waits for 5ms without any sleeps, instead of waiting
> for 5ms from the first sleep, which can delay the unthrottle more than
> we want. Switch this around so that we can't push this forward forever.
> 
> This requires an extra flag rather than using hrtimer_active, since we
> need to start a new timer if the current one is in the process of
> finishing.
> 
> Signed-off-by: Ben Segall <bsegall@...gle.com>
> ---

We recently ran into this performance issue as well:
Reviewed-by: Xunlei Pang <xlpang@...ux.alibaba.com>

>  kernel/sched/fair.c  | 7 +++++++
>  kernel/sched/sched.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8213ff6e365d..2ead252cfa32 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4729,6 +4729,11 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
>  	if (runtime_refresh_within(cfs_b, min_left))
>  		return;
>  
> +	/* don't push forwards an existing deferred unthrottle */
> +	if (cfs_b->slack_started)
> +		return;
> +	cfs_b->slack_started = true;
> +
>  	hrtimer_start(&cfs_b->slack_timer,
>  			ns_to_ktime(cfs_bandwidth_slack_period),
>  			HRTIMER_MODE_REL);
> @@ -4782,6 +4787,7 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
>  
>  	/* confirm we're still not at a refresh boundary */
>  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
> +	cfs_b->slack_started = false;
>  	if (cfs_b->distribute_running) {
>  		raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
>  		return;
> @@ -4920,6 +4926,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>  	cfs_b->distribute_running = 0;
> +	cfs_b->slack_started = false;
>  }
>  
>  static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index efa686eeff26..60219acda94b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -356,6 +356,7 @@ struct cfs_bandwidth {
>  	u64			throttled_time;
>  
>  	bool                    distribute_running;
> +	bool                    slack_started;
>  #endif
>  };
>  
>
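
For readers following the thread, the effect described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: first_fire_ms() and SLACK_PERIOD_MS are hypothetical names, times are plain milliseconds, and the old behaviour is modelled as every sleep re-arming the timer 5ms into the future, while the patched behaviour uses a slack_started-style flag to leave an already pending timer alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: the 5ms slack delay mentioned in the commit message. */
#define SLACK_PERIOD_MS 5

/*
 * Hypothetical model, not kernel code.
 *
 * sleeps[] holds ascending times (in ms) at which some cfs_rq goes to
 * sleep and returns quota. Returns the time at which the slack timer
 * first fires, or -1 if it was never armed.
 *
 * push_forward == true  models the old behaviour: each sleep that
 *                       arrives before expiry re-arms the timer.
 * push_forward == false models the patched behaviour: a flag keeps the
 *                       first deadline in place.
 */
static long first_fire_ms(const long *sleeps, size_t n, bool push_forward)
{
	bool slack_started = false;
	long expires = -1;

	for (size_t i = 0; i < n; i++) {
		/* The timer already fired before this sleep; stop here. */
		if (expires >= 0 && sleeps[i] >= expires)
			break;

		/* Patched behaviour: don't push a pending timer forward. */
		if (!push_forward && slack_started)
			continue;

		slack_started = true;
		expires = sleeps[i] + SLACK_PERIOD_MS;
	}

	return expires;
}
```

With sleeps arriving every 3ms starting at t=0, the old model keeps postponing the deadline for as long as sleeps keep arriving, whereas the flagged model fires 5ms after the first sleep regardless of later ones.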