linux-kernel - Re: [PATCH V4 1/1] sched/deadline: Fix dl

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aGO9KyiJNAm7DLb0@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 1 Jul 2025 11:49:15 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: John Stultz <jstultz@...gle.com>
Cc: Kuyo Chang <kuyo.chang@...iatek.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Matthias Brugger <matthias.bgg@...il.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation
 formula

On 26/06/25 19:48, John Stultz wrote:
> On Thu, Jun 26, 2025 at 7:28 PM Kuyo Chang <kuyo.chang@...iatek.com> wrote:
> > In our testing with 6.12 based kernel on a big.LITTLE system, we were
> > seeing instances of RT tasks being blocked from running on the LITTLE
> > cpus for multiple seconds of time, apparently by the dl_server. This
> > far exceeds the default configured 50ms per second runtime.
> >
> > This is due to the fair dl_server runtime calculation being scaled
> > for frequency & capacity of the cpu.
> >
> > Consider the following case under a Big.LITTLE architecture:
> > Assume the runtime is: 50,000,000 ns, and Frequency/capacity
> > scale-invariance defined as below:
> > Frequency scale-invariance: 100
> > Capacity scale-invariance: 50
> > First by Frequency scale-invariance,
> > the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > Then by capacity scale-invariance,
> > it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> > So it will scaled to 238,418 ns.
> >
> > This smaller "accounted runtime" value is what ends up being
> > subtracted against the fair-server's runtime for the current period.
> > Thus after 50ms of real time, we've only accounted ~238us against the
> > fair servers runtime. This 209:1 ratio in this example means that on
> > the smaller cpu the fair server is allowed to continue running,
> > blocking RT tasks, for over 10 seconds before it exhausts its supposed
> > 50ms of runtime.  And on other hardware configurations it can be even
> > worse.
> >
> > For the fair deadline_server, to prevent realtime tasks from being
> > unexpectedly delayed, we really do want to use fixed time, and not
> > scaled time for smaller capacity/frequency cpus. So remove the scaling
> > from the fair server's accounting to fix this.
> >
> 
> Thanks again for revising the commit message, this version is easier
> (for me at least) to follow.
> 
> 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index ad45a8fea245..96a21f38fcc3 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
> >         if (delta_exec < 0)
> >                 return;
> >
> > -       scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> > +       scaled_delta_exec = delta_exec;
> > +       if (!rq->fair_server.dl_server)
> > +               scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> >
> >         rq->fair_server.runtime -= scaled_delta_exec;
> 
> As I mentioned earlier, I still don't see this conditional as making a
> lot of sense, as I don't believe there is time when this function
> would be called and (!rq->fair_server.dl_server) would be true.
> And even if there were, I'm not sure it makes sense to scale the time
> interval based on the fair_server.dl_server flag.
> 
> From a separate discussion, you highlighted that it might be useful
> once we have multiple dl_server types, which may want scaled
> accounting, but I think in that case we should use an explicit flag
> instead of the dl_server bit to denote if the accounting should be
> scaled or not.
> 
> So, since your patch is a fix for a pretty bad bug, I think it should
> be focused on fixing the issue in the simplest and clearest way for
> the existing code, and not be too worried about integrating with
> future changes that haven't landed.
> 
> Then, as those future changes land, we can see how best to generalize
> the decision to scale or not scale the accounting on a dl_server.
> 
> That said, the conditional is a bit of a moot point, since I don't
> think we'll actually hit it, and I'm motivated to get the bug you are
> fixing resolved, so I wouldn't object if this went in as-is, but it
> seems like it would be much cleaner to just drop that conditional as
> you did in the original version of this patch.

I agree. It would be better to drop the conditional.

Thanks!
Juri