Message-ID: <CANDhNCpoKUbQU590m-vBKhk96ZCKDsYWOHwP_LL2W84SPOnXww@mail.gmail.com>
Date: Fri, 13 Jun 2025 19:35:58 -0700
From: John Stultz <jstultz@...gle.com>
To: Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
On Fri, Jun 13, 2025 at 7:05 PM Kuyo Chang <kuyo.chang@...iatek.com> wrote:
> From: kuyo chang <kuyo.chang@...iatek.com>
>
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
>
> So it will scaled to 238,418 ns.
>
> [Solution]
> The runtime for fair_server should be absolute time
> asis RT bandwidth control.
> Fix the runtime calculation formula for the fair_server.
>
> Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
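
For reference, the arithmetic in the quoted example can be reproduced with a small user-space sketch. This only models the multiply-then-shift-by-10 form shown above (a factor of 1024 meaning "no scaling"); it is not the kernel code, and the helper names scale_by() and scale_runtime() are hypothetical.

```c
#include <stdint.h>

/* Fixed-point shift: a scale factor of 1024 (1 << 10) means "no scaling". */
#define CAPACITY_SHIFT 10

/* Hypothetical helper: scale a value by factor/1024, truncating the result. */
static inline uint64_t scale_by(uint64_t value, uint64_t factor)
{
	return (value * factor) >> CAPACITY_SHIFT;
}

/*
 * Hypothetical helper: apply the frequency-invariance factor (FIE) first,
 * then the CPU-invariance factor (CIE), as in the quoted example.
 */
static inline uint64_t scale_runtime(uint64_t runtime_ns, uint64_t fie,
				     uint64_t cie)
{
	return scale_by(scale_by(runtime_ns, fie), cie);
}
```

With the values from the quoted example, scale_runtime(50000000, 100, 50) gives 238418, so a nominal 50 ms of runtime is accounted as roughly 0.24 ms.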
While I've not quite gotten my head around the details in the
dl_server code, I've been able to reproduce the problem described here
with a 6.12 based kernel.
Running cyclictest (with arguments "-t -a -p99 -m") and a randomized
input test on an Android device, it's pretty easy to trip 100ms to
*multi-second* delays of the RT prio 99 threads.
Perfetto image:
https://github.com/johnstultz-work/misc/blob/main/images/2025-06-13_cyclictest-dl-server-latency.png
Link to the actual trace:
https://ui.perfetto.dev/#!/?s=9bbb9e539ac2bbbfe3cfa954409134662a9f624a
With this patch applied, so far in my testing with the same workload,
the max cyclictest latencies stay in the single-digit ms range.
The part that is a little confusing to me is that, prior to the long
stall, it doesn't appear that RT tasks are actually starving
SCHED_NORMAL tasks. So I'm conceptually surprised to see the dl_server
boosting the normal tasks, especially for so long. That said, I
admittedly haven't looked in detail at the code and have been going off
my understanding of how it was supposed to replace rt-throttling, so I
may be missing a subtlety.
thanks
-john