lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCorm58L9CqHyNORgfk=++HkCttmyCkM23Pn8rXV7cF41Q@mail.gmail.com>
Date: Thu, 26 Jun 2025 19:48:48 -0700
From: John Stultz <jstultz@...gle.com>
To: Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, Matthias Brugger <matthias.bgg@...il.com>, 
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula

On Thu, Jun 26, 2025 at 7:28 PM Kuyo Chang <kuyo.chang@...iatek.com> wrote:
> In our testing with 6.12 based kernel on a big.LITTLE system, we were
> seeing instances of RT tasks being blocked from running on the LITTLE
> cpus for multiple seconds of time, apparently by the dl_server. This
> far exceeds the default configured 50ms per second runtime.
>
> This is due to the fair dl_server runtime calculation being scaled
> for frequency & capacity of the cpu.
>
> Consider the following case under a Big.LITTLE architecture:
> Assume the runtime is: 50,000,000 ns, and Frequency/capacity
> scale-invariance defined as below:
> Frequency scale-invariance: 100
> Capacity scale-invariance: 50
> First by Frequency scale-invariance,
> the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by capacity scale-invariance,
> it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> So it will scaled to 238,418 ns.
>
> This smaller "accounted runtime" value is what ends up being
> subtracted against the fair-server's runtime for the current period.
> Thus after 50ms of real time, we've only accounted ~238us against the
> fair servers runtime. This 209:1 ratio in this example means that on
> the smaller cpu the fair server is allowed to continue running,
> blocking RT tasks, for over 10 seconds before it exhausts its supposed
> 50ms of runtime.  And on other hardware configurations it can be even
> worse.
>
> For the fair deadline_server, to prevent realtime tasks from being
> unexpectedly delayed, we really do want to use fixed time, and not
> scaled time for smaller capacity/frequency cpus. So remove the scaling
> from the fair server's accounting to fix this.
>

Thanks again for revising the commit message, this version is easier
(for me at least) to follow.


> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ad45a8fea245..96a21f38fcc3 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
>         if (delta_exec < 0)
>                 return;
>
> -       scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> +       scaled_delta_exec = delta_exec;
> +       if (!rq->fair_server.dl_server)
> +               scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
>
>         rq->fair_server.runtime -= scaled_delta_exec;

As I mentioned earlier, I still don't see this conditional as making a
lot of sense, as I don't believe there is time when this function
would be called and (!rq->fair_server.dl_server) would be true.
And even if there were, I'm not sure it makes sense to scale the time
interval based on the fair_server.dl_server flag.

>From a separate discussion, you highlighted that it might be useful
once we have multiple dl_server types, which may want scaled
accounting, but I think in that case we should use an explicit flag
instead of the dl_server bit to denote if the accounting should be
scaled or not.

So, since your patch is a fix for a pretty bad bug, I think it should
be focused on fixing the issue in the simplest and clearest way for
the existing code, and not be too worried about integrating with
future changes that haven't landed.

Then, as those future changes land, we can see how best to generalize
the decision to scale or not scale the accounting on a dl_server.

That said, the conditional is a bit of a moot point, since I don't
think we'll actually hit it, and I'm motivated to get the bug you are
fixing resolved, so I wouldn't object if this went in as-is, but it
seems like it would be much cleaner to just drop that conditional as
you did in the original version of this patch.

Thanks so much again for your iterating on this change here! I really
appreciate your efforts to both find this issue as well as to get it
fixed!
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ