Message-ID: <xm26mtlmpvox.fsf@google.com>
Date: Mon, 29 Nov 2021 12:13:02 -0800
From: Benjamin Segall <bsegall@...gle.com>
To: Honglei Wang <wanghonglei@...ichuxing.com>,
Huaixin Chang <changhuaixin@...ux.alibaba.com>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
"Mel Gorman" <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
<linux-kernel@...r.kernel.org>, <jameshongleiwang@....com>
Subject: Re: [PATCH] sched/fair: prevent cpu burst too many periods
Honglei Wang <wanghonglei@...ichuxing.com> writes:
> Tasks might get more CPU than their quota for many consecutive periods
> due to the CPU burst introduced by commit f4183717b370 ("sched/fair:
> Introduce the burstable CFS controller"). For example, take a task group
> whose quota is 100ms per period, which can get a 100ms burst, and whose
> average utilization is around 105ms per period. Once this group gets a
> free period that leaves enough runtime, it has a chance to get more
> computing power than its quota for 10 periods or more in a common
> bandwidth configuration (say, a 100ms period). This means tasks can
> 'steal' the burst capacity to do daily jobs, because all tasks could be
> scheduled out or sleep to help the group get free periods.
>
> I believe the purpose of CPU burst is to help handle bursty workloads.
> But if a task group can get more computing power than its quota for
> many consecutive periods even when there is no bursty workload, that is
> broken.
>
> This patch limits the burst to one period so that it won't break the
> quota limit for long. With this, we can give task groups more CPU burst
> capacity to handle genuinely bursty workloads without worrying about the
> 'stealing'.

CC'ing the burst patch author.

Whether burst is meant to be useful only for bursts, or also for a bit
of fairness that holds only over the long term, is not entirely clear to
me. Assuming we want it only for bursts, cutting it off this sharply has
an additional downside: if a period refresh lands in the middle of a
burst, you lose the remaining burst runtime. Permitting only two periods
in a row to make use of burst should be doable, but it's yet another
piece of state added to cfs_b for this, and given typical ~100ms periods
the odds may be low enough that we don't care.
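
To make the mid-burst concern concrete, here is a rough userspace sketch, not kernel code. It assumes a simplified model in which runtime is consumed directly from the global pool (no per-cfs_rq slices). The struct bw_model, the helper names and the 100ms quota and burst values are illustrative assumptions, not anything from fair.c. It compares the pre-patch refill with the proposed one when a burst straddles a period refresh.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the fields touched by the patch; not the kernel struct. */
struct bw_model {
	int64_t quota;        /* runtime granted per period (ms) */
	int64_t burst;        /* extra runtime that may accumulate (ms) */
	int64_t runtime;      /* runtime currently available (ms) */
	int64_t runtime_snap; /* runtime available right after the last refill */
};

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Period refill: patched == false models the pre-patch logic, true the proposed one. */
static void refill(struct bw_model *b, bool patched)
{
	/* How far the last period's usage exceeded the quota. */
	int64_t over = b->runtime_snap - b->quota - b->runtime;

	if (patched && over > 0)
		b->runtime = b->quota; /* burst was used: drop what is left of it */
	else
		b->runtime = min64(b->runtime + b->quota, b->quota + b->burst);
	b->runtime_snap = b->runtime;
}

/* Run the group against per-period demand (ms); return total unmet demand (ms). */
static int64_t throttled_ms(const int64_t *demand, int n, bool patched)
{
	/* Assumed configuration: 100ms quota, 100ms burst. */
	struct bw_model b = { .quota = 100, .burst = 100,
			      .runtime = 100, .runtime_snap = 100 };
	int64_t throttled = 0;

	for (int i = 0; i < n; i++) {
		int64_t used = min64(demand[i], b.runtime);

		throttled += demand[i] - used;
		b.runtime -= used;
		refill(&b, patched);
	}
	return throttled;
}
```

With a demand pattern of one idle period followed by a 240ms burst split evenly across two periods ({0, 120, 120, 0}), this model leaves no demand unmet with the pre-patch refill, while the patched refill leaves 20ms unmet in the second half of the burst.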
>
> Signed-off-by: Honglei Wang <wanghonglei@...ichuxing.com>
> ---
> kernel/sched/fair.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6d9435..cc2c4567fc81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
> if (unlikely(cfs_b->quota == RUNTIME_INF))
> return;
>
> - cfs_b->runtime += cfs_b->quota;
> - runtime = cfs_b->runtime_snap - cfs_b->runtime;
> + runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
> +
> if (runtime > 0) {
> cfs_b->burst_time += runtime;
> cfs_b->nr_burst++;
> + cfs_b->runtime = cfs_b->quota;
> + } else {
> + cfs_b->runtime += cfs_b->quota;
> + cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> }
>
> - cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> cfs_b->runtime_snap = cfs_b->runtime;
> }
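
As a rough illustration of the scenario in the commit message, the following userspace sketch mirrors the refill arithmetic of the diff above. It is a simplified model and not kernel code. The function name over_quota_periods, the flat variables and the single idle period at the start are assumptions made for the example. The quota and burst are fixed at 100ms each, as in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/*
 * Count the periods in which the group ran for more than its quota.
 * All values are in ms. The group idles for the first period, then
 * wants `demand` ms in every following period.
 */
static int over_quota_periods(bool patched, int64_t demand, int periods)
{
	const int64_t quota = 100, burst = 100; /* assumed configuration */
	int64_t runtime = quota, runtime_snap = quota;
	int nr_burst = 0;

	for (int i = 0; i < periods; i++) {
		int64_t want = (i == 0) ? 0 : demand;

		/* Consume as much of the demand as the pool allows. */
		runtime -= min64(want, runtime);

		/* Refill at the period boundary, mirroring the diff above. */
		int64_t over = runtime_snap - quota - runtime;
		if (over > 0)
			nr_burst++;
		if (patched && over > 0)
			runtime = quota; /* proposed: burst lasts one period */
		else
			runtime = min64(runtime + quota, quota + burst);
		runtime_snap = runtime;
	}
	return nr_burst;
}
```

In this model, a 105ms demand over 30 periods exceeds the quota in 20 periods with the pre-patch refill (the 100ms of banked burst drains 5ms at a time) and in only 1 period with the patched refill.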
If we do this, it should also be mentioned in
Documentation/scheduler/sched-bwc.rst, since the straightforward
description of burst as extra max runtime is no longer enough.