linux-kernel - Re: [PATCH] sched/fair: prevent cpu burst too many periods

Message-ID: <xm26mtlmpvox.fsf@google.com>
Date:   Mon, 29 Nov 2021 12:13:02 -0800
From:   Benjamin Segall <bsegall@...gle.com>
To:     Honglei Wang <wanghonglei@...ichuxing.com>,
        Huaixin Chang <changhuaixin@...ux.alibaba.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        <linux-kernel@...r.kernel.org>, <jameshongleiwang@....com>
Subject: Re: [PATCH] sched/fair: prevent cpu burst too many periods

Honglei Wang <wanghonglei@...ichuxing.com> writes:

> Tasks might get more cpu than quota in persistent periods due to the
> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
> burstable CFS controller"). For example, take a task group whose quota is
> 100ms per period, which can get 100ms of burst, and whose avg utilization
> is around 105ms per period. Once this group gets a free period which
> leaves enough runtime, it has a chance to get more computing power
> than its quota for 10 periods or more in a common bandwidth configuration
> (say, 100ms as period). This means tasks can 'steal' the burst power to
> do daily jobs, because all tasks could be scheduled out or sleep to help
> the group get free periods.
>
> I believe the purpose of cpu burst is to help handle bursty workloads.
> But if one task group can get more computing power than its quota for
> persistent periods even when there is no bursty workload, it's kind of
> broken.
>
> This patch limits the burst to one period so that it won't break the
> quota limit for long. With this, we can give a task group more cpu burst
> power to handle real bursty workloads without worrying about the
> 'stealing'.

CCing the burst patch author.

Whether burst is meant to be useful only for bursts, or also for a bit of
long-term-only fairness, is not entirely clear to me. Assuming we want it
only for bursts, cutting it off this sharply has an additional downside:
if a period refresh lands in the middle of a burst, you lose the
remaining burst runtime. Permitting only two periods in a row to make use
of burst should be doable, but it's yet another piece of state added to
cfs_b for this, and given typical ~100ms periods the odds of hitting that
case may be low enough that we don't care.
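
To make the mid-burst case concrete, here is a rough userspace sketch of
the pool size after a refill with and without the proposed change. The
function names are made up for illustration, values are in ms rather than
the kernel's ns, and the RUNTIME_INF check and all locking are left out.

```c
#include <assert.h>
#include <stdint.h>

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/*
 * Hypothetical helper: pool after a refill with the current logic.
 * Leftover runtime carries over, capped at quota + burst.
 */
static int64_t pool_after_refill_current(int64_t quota, int64_t burst,
                                         int64_t left)
{
    return min64(left + quota, quota + burst);
}

/*
 * Hypothetical helper: pool after a refill with the proposed patch.
 * snap is the pool size at the start of the period that just ended,
 * left is what remained unused at the refresh.
 */
static int64_t pool_after_refill_patched(int64_t quota, int64_t burst,
                                         int64_t snap, int64_t left)
{
    if (snap - quota - left > 0)
        return quota;           /* burst was touched: leftover is dropped */
    return min64(left + quota, quota + burst);
}
```

With quota = 100 and burst = 100, a group that starts a period with a
full pool of 200 and has run 110 by the time the refresh fires has 90
left. The current logic refills to 190, so the burst can continue; the
patched logic refills to 100, so the second half of the burst only gets
plain quota.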

>
> Signed-off-by: Honglei Wang <wanghonglei@...ichuxing.com>
> ---
>  kernel/sched/fair.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6d9435..cc2c4567fc81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  	if (unlikely(cfs_b->quota == RUNTIME_INF))
>  		return;
>  
> -	cfs_b->runtime += cfs_b->quota;
> -	runtime = cfs_b->runtime_snap - cfs_b->runtime;
> +	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
> +
>  	if (runtime > 0) {
>  		cfs_b->burst_time += runtime;
>  		cfs_b->nr_burst++;
> +		cfs_b->runtime = cfs_b->quota;
> +	} else {
> +		cfs_b->runtime += cfs_b->quota;
> +		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	}
>  
> -	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	cfs_b->runtime_snap = cfs_b->runtime;
>  }
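
For comparing the two versions of the refill, a simplified userspace
model of the example from the changelog may help. This is only a sketch:
struct bw and the function names are invented stand-ins for cfs_b and
__refill_cfs_bandwidth_runtime(), values are in ms, the RUNTIME_INF check
is omitted, and the group is assumed to consume min(demand, pool) in every
period, which ignores per-cfs_rq slices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cfs_bandwidth fields the patch touches. */
struct bw {
    int64_t quota;
    int64_t burst;
    int64_t runtime;       /* runtime left in the pool */
    int64_t runtime_snap;  /* pool size right after the previous refill */
    int64_t burst_time;
    int nr_burst;
};

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Refill as in the '-' lines of the diff plus unchanged context. */
static void refill_original(struct bw *b)
{
    int64_t runtime;

    b->runtime += b->quota;
    runtime = b->runtime_snap - b->runtime;
    if (runtime > 0) {
        b->burst_time += runtime;
        b->nr_burst++;
    }
    b->runtime = min64(b->runtime, b->quota + b->burst);
    b->runtime_snap = b->runtime;
}

/* Refill as in the '+' lines of the diff plus unchanged context. */
static void refill_patched(struct bw *b)
{
    int64_t runtime = b->runtime_snap - b->quota - b->runtime;

    if (runtime > 0) {
        b->burst_time += runtime;
        b->nr_burst++;
        b->runtime = b->quota;
    } else {
        b->runtime += b->quota;
        b->runtime = min64(b->runtime, b->quota + b->burst);
    }
    b->runtime_snap = b->runtime;
}

typedef void (*refill_fn)(struct bw *);

/* Count periods in which the group ran for more than quota,
 * starting from a full pool (quota + burst). */
static int count_over_quota_periods(refill_fn refill, int64_t quota,
                                    int64_t burst, int64_t demand,
                                    int periods)
{
    struct bw b = { quota, burst, quota + burst, quota + burst, 0, 0 };
    int over = 0;

    for (int i = 0; i < periods; i++) {
        int64_t ran = min64(demand, b.runtime);

        b.runtime -= ran;
        if (ran > quota)
            over++;
        refill(&b);
    }
    return over;
}
```

In this model, count_over_quota_periods(refill_original, 100, 100, 105, 50)
returns 20, since each period eats only 5ms of the accumulated burst, while
the same call with refill_patched returns 1.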

If we do this, it should also be mentioned in
Documentation/scheduler/sched-bwc.rst, since the straightforward
description of burst as extra max runtime is no longer enough.