Message-ID: <aLhMvNI1Loy3_jFT@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 3 Sep 2025 16:12:12 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Yuri Andriaccio <yurand2000@...il.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Luca Abeni <luca.abeni@...tannapisa.it>,
Yuri Andriaccio <yuri.andriaccio@...tannapisa.it>
Subject: Re: [PATCH v4] sched/deadline: Remove fair-servers from real-time
task's bandwidth accounting
Hi!
On 03/09/25 13:44, Yuri Andriaccio wrote:
> Fair-servers are currently used in place of the old RT_THROTTLING mechanism to
> prevent the starvation of SCHED_OTHER (and other lower priority) tasks when
> real-time FIFO/RR processes are trying to fully utilize the CPU. To allow the
> RT_THROTTLING mechanism, the maximum allocatable bandwidth for real-time tasks
> has been limited to 95% of the CPU-time.
>
> The RT_THROTTLING mechanism is now removed in favor of fair-servers, which are
> currently set to use, as expected, 5% of the CPU-time. Still, they share the
> same bandwidth that allows running real-time tasks, and which is still set to
> 95% of the total CPU-time. This means that by removing the RT_THROTTLING
> mechanism, the remaining bandwidth for real-time SCHED_DEADLINE tasks and other
> dl-servers (FIFO/RR are not affected) is only 90%.
>
> This patch reclaims the 5% lost CPU-time, which is definitely reserved for
> SCHED_OTHER tasks, but should not be accounted together with the other real-time
> tasks. More generally, the fair-servers' bandwidth must not be accounted with
> other real-time tasks.
>
> Updates:
> - Make the fair-servers' bandwidth not be accounted into the total allocated
> bandwidth for real-time tasks.
> - Remove the admission control test when allocating a fair-server.
> - Do not account for fair-servers in the GRUB's bandwidth reclaiming mechanism.
However, it looks like running_bw and this_bw still account for
fair-servers? I just checked with tools/sched/dl_bw_dump.py and can see
their contribution showing up.
running_bw, however, also influences schedutil decisions, so keeping the
fair-servers in it might actually be required, as tasks could perhaps
still be starved if the CPU is running too slow? Not sure about this
last point.
> - Limit the max bandwidth to (BW_UNIT - max_rt_bw) when changing the parameters
> of a fair-server, preventing overcommitment.
> - Update admission tests (in sched_dl_global_validate) when changing the
> maximum allocatable bandwidth for real-time tasks, preventing overcommitment.
> - Update admission tests (in dl_bw_manage) when offlining a CPU.
>
> Since the fair-server's bandwidth can be changed through debugfs, it has not
> been enforced that a fair-server's bandwidth must be always equal to (BW_UNIT -
> max_rt_bw), rather it must be less than or equal to this value. This allows retaining
> the fair-servers' settings changed through the debugfs when changing the
> max_rt_bw.
>
> This also means that in order to increase the maximum bandwidth for real-time
> tasks, the bw of fair-servers must be first decreased through debugfs otherwise
> admission tests will fail, and vice versa, to increase the bw of fair-servers,
> the bw of real-time tasks must be reduced beforehand.
>
> This v4 version removes dl_bw_fair, as it is not needed anymore since each fair
> server's bw is now checked individually rather than globally. This is necessary
> because different fair-servers can have different runtimes. The bandwidth
> assignment is sound only if each CPU's rt-bw + fair-server-bw is less than or
> equal to 1, rather than computing the total and checking if it is less than or
> equal to the number of CPUs. The check on deadline tasks can instead be done
> globally (on a root-domain basis) as dl tasks are allowed to migrate between
> cores. This new version fixes the error reported here:
> https://lore.kernel.org/all/aLa3zdmyKuRMy3bm@jlelli-thinkpadt14gen4.remote.csb/
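For reference, a minimal sketch of why the per-CPU check matters once
fair-servers can have different runtimes. This is only an illustration and
not the kernel code: the function names are hypothetical, and bandwidths are
plain fractions of one CPU (1 standing in for BW_UNIT) rather than the
fixed-point values the scheduler uses.

```python
from fractions import Fraction

BW_UNIT = Fraction(1)  # assumption: 1 == 100% of one CPU


def per_cpu_check(max_rt_bw, fair_bw_per_cpu):
    """Sound check: on every CPU, rt-bw + fair-server-bw <= BW_UNIT."""
    return all(max_rt_bw + bw <= BW_UNIT for bw in fair_bw_per_cpu)


def global_check(max_rt_bw, fair_bw_per_cpu):
    """Summed check: total rt-bw + total fair-server-bw <= number of CPUs."""
    n = len(fair_bw_per_cpu)
    return n * max_rt_bw + sum(fair_bw_per_cpu) <= n * BW_UNIT


max_rt_bw = Fraction(95, 100)
# Hypothetical setup: CPU0's fair-server raised to 10%, CPU1's set to 0%.
fair = [Fraction(10, 100), Fraction(0)]

print(global_check(max_rt_bw, fair))   # the sum fits in 2 CPUs
print(per_cpu_check(max_rt_bw, fair))  # CPU0 alone is overcommitted
```

With these made-up numbers the summed check accepts the configuration
(1.90 + 0.10 = 2.00 on 2 CPUs) while CPU0 alone would need 105%, which is
the case the individual check rejects.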
Thanks for looking into it. It seems to be working correctly now.
Best,
Juri