[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c057bbc383ab5525172b22a4f3480acbcac12045.camel@mediatek.com>
Date: Fri, 27 Jun 2025 11:02:37 +0800
From: Kuyo Chang <kuyo.chang@...iatek.com>
To: John Stultz <jstultz@...gle.com>
CC: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, "Mel
Gorman" <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Matthias
Brugger <matthias.bgg@...il.com>, AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <linux-mediatek@...ts.infradead.org>
Subject: Re: [PATCH v3 1/1] sched/deadline: Fix dl_server runtime
calculation formula
On Wed, 2025-06-25 at 22:25 -0700, John Stultz wrote:
>
>
>
> On Wed, Jun 25, 2025 at 8:08 PM Kuyo Chang <kuyo.chang@...iatek.com>
> wrote:
> > [Symptom]
> > The calculation formula for dl_server runtime is based on
> > Frequency/capacity scale-invariance.
> > This will cause excessive RT latency (expect absolute time).
>
> I appreciate your symptom/analysis/solution markers, but they are a
> little unconventional for commit messages, so you might consider
> dropping them.
>
> Further, you're explaining the cause of the problem before you
> describe what you were seeing.
>
> I might suggest rewording as:
> "In our testing with 6.12 based kernel on a big.LITTLE system, we
> were
> seeing instances of RT tasks being blocked from running on the LITTLE
> cpus for multiple seconds of time, apparently by the dl_server. This
> far exceeds the default configured 50ms per second runtime."
>
> Then you can get into the analysis where you can state as you had
> above:
> "This is due to the fair dl_server runtime calculation being scaled
> for frequency & capacity of the cpu"
>
> > [Analysis]
> > Consider the following case under a Big.LITTLE architecture:
> >
> > Assume the runtime is: 50,000,000 ns, and Frequency/capacity
> > scale-invariance defined as below:
> >
> > Frequency scale-invariance: 100
> > Capacity scale-invariance: 50
> > First by Frequency scale-invariance,
> > the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > Then by capacity scale-invariance,
> > it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> >
> > So it will scaled to 238,418 ns.
>
> This part is all good, but you might want to make it more clear why
> the smaller scaled "runtime" value here results in longer delays for
> RT tasks.
> ie:
>
> This smaller "accounted runtime" value is what ends up being
> subtracted against the fair-server's runtime for the current period.
> Thus after 50ms of real time, we've only accounted ~238us against the
> fair servers runtime. This 209:1 ratio in this example means that on
> the smaller cpu the fair server is allowed to continue running,
> blocking RT tasks, for over 10 seconds before it exhausts its
> supposed
> 50ms of runtime. And on other hardware configurations it can be even
> worse.
>
> > [Solution]
> > The runtime for dl_server should be fixed time
> > asis RT bandwidth control.
> > Fix the runtime calculation formula for the dl_server.
>
> I think this can be stated more simply as:
> "For the fair deadline_server, to prevent realtime tasks from being
> unexpectedly delayed, we really do want to use fixed time, and not
> scaled time for smaller capacity/frequency cpus. So remove the
> scaling
> from the fair server's accounting to fix this."
>
> > Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
> > Acked-by: Juri Lelli <juri.lelli@...hat.com>
> > Suggested-by: Peter Zijlstra <peterz@...radead.org>
> > Suggested-by: John Stultz <jstultz@...gle.com>
> > Tested-by: John Stultz <jstultz@...gle.com>
> >
> > v1:
> > https://lore.kernel.org/all/20250614020524.631521-1-kuyo.chang@mediatek.com/
> > v2:
> > https://lore.kernel.org/all/20250617155355.1479777-1-kuyo.chang@mediatek.com/
>
> Please add a "---" line above the version data so it doesn't get
> added
> to the commit history.
>
> > Change-Id: Iaaa1ec78f405586c22ba56230ef1143206daa2d0
>
> Also no Change-Ids in upstream targeted commits.
> ./scripts/checkpatch.pl should warn you about this, and its a good
> idea in general to run that before submitting.
>
> > ---
> > kernel/sched/deadline.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index ad45a8fea245..96a21f38fcc3 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1504,7 +1504,9 @@ static void update_curr_dl_se(struct rq *rq,
> > struct sched_dl_entity *dl_se, s64
> > if (dl_entity_is_special(dl_se))
> > return;
> >
> > - scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se,
> > delta_exec);
> > + scaled_delta_exec = delta_exec;
> > + if (!dl_server(dl_se))
> > + scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se,
> > delta_exec);
> >
> > dl_se->runtime -= scaled_delta_exec;
> >
> > @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq
> > *rq, struct task_struct *p)
> > if (delta_exec < 0)
> > return;
> >
> > - scaled_delta_exec = dl_scaled_delta_exec(rq, &rq-
> > >fair_server, delta_exec);
> > + scaled_delta_exec = delta_exec;
> > + if (!rq->fair_server.dl_server)
> > + scaled_delta_exec = dl_scaled_delta_exec(rq, &rq-
> > >fair_server, delta_exec);
>
> I still feel like these lines don't make much sense, and it seems
> like
> they could be dropped. When is (!rq->fair_server.dl_server) true?
>
> > rq->fair_server.runtime -= scaled_delta_exec;
> >
> > --
>
> Thanks so much again for finding this issue and resubmitting this
> patch! I really do want to see this land quickly as it addresses a
> pretty big issue.
Thanks for your reminder and suggestion.
Update to v4 as below link :)
https://lore.kernel.org/all/20250627022837.3331827-1-kuyo.chang@mediatek.com/
> -john
Powered by blists - more mailing lists